What Occurred?
Safeguard Remote Access (SRA) sessions were closing unexpectedly in the EU region.
What went wrong and why?
SRA RDP/SSH sessions in the EU region disconnected when Azure Kubernetes nodes began running out of available WebSockets. In certain scenarios, unused sockets were not closed correctly, which led to WebSocket exhaustion and the subsequent disconnections.
How are we making incidents like this less likely or less impactful?
We are implementing a more robust WebSocket management solution and increasing SRA's logging so that we can identify this condition proactively and prevent it from recurring. These improvements will increase reliability, reduce the likelihood of future occurrences, and make troubleshooting easier across the platform.
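
For illustration only, the general idea of closing unused sockets and logging pool usage before exhaustion can be sketched as follows. This is a minimal sketch under stated assumptions, not SRA's actual implementation: the `SocketRegistry` class, its method names, the per-node limit, the idle timeout, and the warning threshold are all hypothetical.

```python
import logging
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List

log = logging.getLogger("socket_registry")


@dataclass
class SocketRegistry:
    """Hypothetical tracker for open sockets on one node (illustrative only)."""

    max_sockets: int                 # assumed per-node socket limit
    idle_timeout_s: float            # assumed idle time before a socket counts as unused
    warn_ratio: float = 0.8          # assumed usage fraction that triggers a warning log
    clock: Callable[[], float] = time.monotonic
    _last_seen: Dict[str, float] = field(default_factory=dict)
    _closers: Dict[str, Callable[[], None]] = field(default_factory=dict)

    def open(self, socket_id: str, close_fn: Callable[[], None]) -> bool:
        """Register a socket; reap idle sockets first if the pool is full."""
        if len(self._last_seen) >= self.max_sockets:
            self.reap_idle()
        if len(self._last_seen) >= self.max_sockets:
            log.error("socket pool exhausted (%d in use)", len(self._last_seen))
            return False
        self._last_seen[socket_id] = self.clock()
        self._closers[socket_id] = close_fn
        if len(self._last_seen) >= self.warn_ratio * self.max_sockets:
            log.warning("socket usage high: %d/%d", len(self._last_seen), self.max_sockets)
        return True

    def touch(self, socket_id: str) -> None:
        """Record activity so an active socket is not treated as unused."""
        if socket_id in self._last_seen:
            self._last_seen[socket_id] = self.clock()

    def close(self, socket_id: str) -> None:
        """Release a socket and always drop it from the registry."""
        close_fn = self._closers.pop(socket_id, None)
        self._last_seen.pop(socket_id, None)
        if close_fn is not None:
            try:
                close_fn()
            except Exception:
                log.exception("error while closing socket %s", socket_id)

    def reap_idle(self) -> List[str]:
        """Close every socket that has been idle for at least idle_timeout_s."""
        now = self.clock()
        idle = [sid for sid, seen in self._last_seen.items()
                if now - seen >= self.idle_timeout_s]
        for sid in idle:
            self.close(sid)
        if idle:
            log.info("reaped %d idle socket(s)", len(idle))
        return idle

    def in_use(self) -> int:
        return len(self._last_seen)
```

In a sketch like this, `reap_idle` could also be run on a timer, and the warning log gives operators a signal before the limit is reached rather than after sessions begin to drop.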