What to Do About the Recent .NET Update Crashing Servers
If you run Microsoft servers that use the Azure AD Connect agent to sync your Active Directory user accounts, you may have noticed problems over the last week. A recent .NET Framework update has caused a memory leak in the Azure AD Connect sync agent. As the service's memory use grows, less and less memory is left for other services and programs, and even for logging in to the system to shut it down properly.
For some, the only way to get a physical server up and running again is to force it off and reboot. After the reboot, you need to take some steps to keep the problem from recurring until Microsoft releases another .NET update and a new Azure AD Connect agent.
What is actually happening
A published .NET Framework update caused a problem with one of the services that make up the Microsoft Azure AD Connect sync agent: the update triggered a memory leak in that service. The service keeps requesting more memory and eventually leaves too little for other necessary services to run, so over time the server can lock up and stop responding.
When a server runs out of physical memory to allocate, it falls back on a local swap file (the page file on Windows). Services then respond sluggishly, because data that normally stays in memory is written to disk and must be swapped back in as requests arrive. Eventually the server is forced to start shutting down basic services in an effort to keep itself from crashing. Once this happens, even logging in to the server to shut it down properly is often impossible.
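To tell the steady growth of a leak apart from normal ups and downs, it can help to compare a few memory readings taken some minutes apart. The snippet below is only a rough sketch of that idea: the function name, the minimum of three readings and the 100,000 KB growth threshold are all assumptions chosen for illustration, not values from Microsoft.

```python
def looks_like_leak(samples_kb, min_growth_kb=100_000):
    """Return True when memory use never drops and grows by at least min_growth_kb.

    samples_kb: memory readings in kilobytes, oldest first.
    min_growth_kb: hypothetical threshold; tune it for your own server.
    """
    if len(samples_kb) < 3:
        return False  # too few readings to call it a trend
    # A leak tends to climb without ever giving memory back.
    never_drops = all(later >= earlier
                      for earlier, later in zip(samples_kb, samples_kb[1:]))
    total_growth = samples_kb[-1] - samples_kb[0]
    return never_drops and total_growth >= min_growth_kb
```

A healthy service usually releases memory at some point, so a series of readings that only ever climbs is the pattern worth watching for.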
For troubleshooting purposes, the problematic service is called Azure AD Connect Health Sync Monitor. It is run by the executable named Microsoft.Identity.Health.AadSync.MonitoringAgent.Startup.exe.
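If you would rather check on that executable from a script than from Task Manager, one option is to read the output of the built-in Windows tasklist command. The following is a minimal sketch, not an official Microsoft tool: the helper names are made up for this example, and the parsing assumes the "1,234,567 K" memory format that tasklist prints on an English-language system.

```python
import csv
import io
import subprocess

# Executable name as given in the article.
AGENT_EXE = "Microsoft.Identity.Health.AadSync.MonitoringAgent.Startup.exe"


def parse_tasklist_csv(output, image_name=AGENT_EXE):
    """Return [(pid, mem_kb), ...] for rows whose image name matches."""
    matches = []
    for row in csv.reader(io.StringIO(output)):
        # Expected columns: Image Name, PID, Session Name, Session#, Mem Usage.
        # Informational lines such as "INFO: No tasks..." have fewer columns.
        if len(row) < 5 or row[0].lower() != image_name.lower():
            continue
        # Keep only the digits of the Mem Usage column (assumed to be in KB).
        digits = "".join(ch for ch in row[4] if ch.isdigit())
        if digits:
            matches.append((int(row[1]), int(digits)))
    return matches


def query_agent_memory(image_name=AGENT_EXE):
    """Run tasklist on the local Windows server and parse its CSV output."""
    result = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return parse_tasklist_csv(result.stdout, image_name)
```

Calling query_agent_memory() on the affected server should return one (PID, kilobytes) pair per running copy of the agent, or an empty list if the service is not running.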
Luckily, Microsoft is aware of the issue and is working on updated versions of both the .NET Framework update and the Azure AD Connect sync agent. Once these are ready, they will be released, and most servers should receive them automatically. In the meantime, however, it is important to implement a temporary fix. A server that continually locks up can lose files and suffer long-term file damage or even corrupted boot files, all of which cause bigger problems.
To work around the issue until replacement updates are released, Microsoft suggests removing the offending KB update and rebooting the server. The exact update to remove depends on the server's operating system, so first check which KB update applies to your situation. Once you have identified it, remove it and then reboot the server. NOTE: As part of your business continuity plan, always make sure you have a reliable, up-to-date backup before performing tasks like these.
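The removal itself can be done from Control Panel or with the Windows Update Standalone Installer, wusa.exe, from an elevated command prompt. The sketch below shows one way to build that command in a script. The function names are hypothetical, and the KB number in the usage comment is a placeholder only; substitute the update Microsoft lists for your operating system.

```python
import re
import subprocess


def build_uninstall_command(kb_id):
    """Build a wusa.exe command line that uninstalls one KB update."""
    # Accept "KB1234567" or "1234567"; reject anything else.
    match = re.fullmatch(r"(?:KB)?(\d+)", kb_id.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a KB identifier: {kb_id!r}")
    # /norestart leaves the reboot to the administrator, so it can be
    # scheduled after confirming the backup is in place.
    return ["wusa.exe", "/uninstall", f"/kb:{match.group(1)}", "/norestart"]


def uninstall_update(kb_id):
    """Run the uninstall on the local server and return wusa's exit code."""
    return subprocess.run(build_uninstall_command(kb_id), check=False).returncode


# Usage on the affected server (placeholder KB number, not a real one):
#     uninstall_update("KB0000000")
```

Because the command is built with /norestart, the server still has to be rebooted afterwards for the removal to take effect.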
Updates are inevitable. Sometimes they address security issues, sometimes they add features, and sometimes they have unintended results. The wide variety of server operating systems, proprietary software and previously applied updates makes it hard for every update to work seamlessly with every system, so it is important to know how to work around issues as they arise. While this fix is only temporary, it should resolve the memory leak until new updates are released.
As always, there will be unexpected obstacles when it comes to tech. Knowing how to work around them is key!