Umbraco and .NET Development - Umbraco Silver Partner

A few weeks ago, I was contacted by Umbraco support regarding an issue with a website I look after for a client that runs on Umbraco Cloud. The website is running Umbraco 7 on the Starter Plan and was regularly running over the CPU quota with the CPU spiking multiple times throughout the day with no apparent reason.

Umbraco Cloud CPU Spikes

The website itself is straight-forward with pages of content and nothing overly complicated. It runs a couple of scheduled tasks throughout the day, but they were not doing anything hugely demanding so I was sceptical these were causing the CPU spikes.

If I was running this on my own server, it would be a lot easier to try and diagnose as I would have full access to the operating system could run various tools to find the source of the problem, but my options were limited as Umbraco Cloud hides away much of the hosting implementation leaving me scratching my head to try and work out how I could investigate this further.

There are no monitoring tools on Umbraco Cloud beyond the performance and availability graphs and no way of installing any Application Performance Monitoring (APM) software such as New Relic or any other third-party tools that I am aware of, so I had to look at alternative solutions to help identify and diagnose the problem.

The performance and availability graphs on the Umbraco Cloud dashboard clearly showed regular CPU spikes throughout the day but I had no way of knowing what was causing them.

My first thought was to log all requests and responses to the website so I could see if there were any long running requests that may correlate to the CPU spikes that Umbraco Cloud was reporting. I added some code to the application request and response handlers to log each request, along with the URL, IP address and the timing which would be written to the Umbraco log files. Running this for a day showed no long running requests, apart from the occasional slow page load but nothing that was the cause of the CPU spikes.  I could see lots of 404 failed requests to files which did not exist, including some media files but according to my timings in the logs these requests were very quick so unless the volume was causing undue load, I was still none the wiser to the cause of the problem.

public void Application_BeginRequest(Object sender, EventArgs e)
{
    var httpApplication = (HttpApplication)sender;
    var request = new HttpRequestWrapper(httpApplication.Request);

    var timer = new Stopwatch();
    var guid = Guid.NewGuid();

    httpApplication.Context.Items["Timer"] = timer;
    httpApplication.Context.Items["Guid"] = guid;

    timer.Start();

    LogHelper.Debug(this.GetType(), $"[{guid}] Begin request for {request.RawUrl} from IP address {request.GetUserIPAddress()}");
}

public void Application_EndRequest(Object sender, EventArgs e)
{
    var httpApplication = (HttpApplication)sender;
    var request = new HttpRequestWrapper(httpApplication.Request);
    var response = new HttpResponseWrapper(httpApplication.Response);

    var timer = (Stopwatch)httpApplication.Context.Items["Timer"];
    var guid = (Guid?)httpApplication.Context.Items["Guid"];

    if (timer != null)
    {
        timer.Stop();

        LogHelper.Debug(this.GetType(), $"[{guid}] End request for {request.RawUrl} from IP address {request.GetUserIPAddress()} took {timer.ElapsedMilliseconds}ms ({response.StatusCode})");
    }
    else
    {
        LogHelper.Debug(this.GetType(), $"End request for {request.RawUrl} from IP address {request.GetUserIPAddress()} ({response.StatusCode})");
    }

    httpApplication.Context.Items.Remove("Timer");
    httpApplication.Context.Items.Remove("Guid");
}

Logging requests to the Umbraco log with timings

I needed to get a bigger picture of the application and its performance, not just the requests. So, the next option was to try and install Application Insights as I was hoping I would get some more detailed telemetry due to Umbraco Cloud running on Azure Wep Apps.

I followed the instructions here to install Application Insights and created an Applications Insights resource on Azure outside of Umbraco Cloud.

https://learn.microsoft.com/en-us/azure/azure-monitor/app/asp-net

(If you are using Umbraco 9+ you can follow the instructions here to add Application Insights: https://docs.umbraco.com/umbraco-cloud/set-up/application-insights)

Once I added the Application Insights connection string and re-deployed the website, I could see data starting to appear in Application Insights, so I left it running for a couple of days to allow it to collect some data.

Looking at Application insights I could see the overall performance of the website was good, most requests were almost instant due to the website caching partials and apart from the occasional slow page load for pages visited less frequently there was nothing that correlated to the CPU spikes that I was seeing on Umbraco Cloud.

I was hoping to be able to enable the profiler but as I did not have access to the underlying Web App on Azure, it would not allow me to enable to profiler. This was a brick wall, I needed to be able to see a detailed graph of the CPU and preferably be able to capture the call stack when the CPU spiked but I was at a loss at this point.

That is when I had a breakthrough, one of the graphs on Application Insights allows you to see a report by role, which is the server that was sending the telemetry and luckily as the roles were Web Apps, I could see a CPU and memory graph for both the test site and the live site.

What I could see on the graph was the same spike that matched the graphs on Umbraco Cloud but more importantly, I could also see the memory graph side-by-side. This showed that whenever the CPU spiked the memory dropped, meaning that the CPU spike was being caused by the .NET Garbage Collection (GC) which could be due to a memory leak.

Application Insights Role Graph Showing CPU Spike And Memory Drop

At least this had narrowed it down even though I did not know what the exact cause was, now for some trial-and-error investigating! I had a feeling that this could be due to the substantial number of media items on the website which were stored on the client’s own Azure account outside of Umbraco Cloud.  Some of these images were quite large and even though they had been dynamically resized and cached by Image Processor, the original images were quite large and would need to be loaded into memory to resize and cache them.

My first thought was that this could be due to some of the Image Processor packages not being on the latest version, so I updated these along with the log4net package just to make sure logging was not causing any memory leaks. I also spotted that the website was running under .NET Framework 4.5.2 which is the default for Umbraco 7, but I know there were improvements to GC in .NET 4.6.2 so I also updated the .NET Framework version to 4.7.2 just to see if that would help. I also updated the website to Umbraco 7.15.7 to 7.15.10 as for some reason it had not auto updated on Umbraco Cloud.

After deployment, the results were immediate. I could see a considerable drop in the CPU spikes and a reduction in their frequency. I was still seeing some CPU spikes, but they were much smaller and to be honest we could live with them. I do not know whether it was the package updates, the .NET Framework version or the Umbraco update that made the difference (or even a combination of all three) but at least it was not the application code or anything fundamentally wrong with the website.

I was still curious about the remaining CPU spikes, and I had noticed that Umbraco Cloud had set the trim cache option to true in the Image Processor cache.config file even though it was set to false locally. The trim cache option deletes old images from the Image Processor cache when images are requested, so in theory this could have an overhead due to the large volume of media items. Using Kudu on Umbraco Cloud, I edited the cache.config and set trimCache to false and restarted the website.

<?xml version="1.0" encoding="utf-8"?>
<caching currentCache="DiskCache">
  <caches>
    <cache trimCache="false" name="DiskCache" type="ImageProcessor.Web.Caching.DiskCache, ImageProcessor.Web" maxDays="365" memoryMaxMinutes="1" browserMaxDays="7">
      <settings>
        <setting key="VirtualCachePath" value="~/../wwwcache/IPCache" />
      </settings>
    </cache>
  </caches>
</caching>

The Image Processor cache.config showing trimCache set to false

After leaving it running like that for a day, there were no more CPU spikes, and the graphs were showing a consistent CPU load throughout the day. Turning off the trim cache does mean the cache would build up over time, so you need to consider carefully what is more important, the number of images in the cache taking up storage space or the CPU usage.

Umbraco Cloud CPU After Updates