Troubleshoot Performance Bottlenecks in .NET 6 Applications

Performance issues can creep up when you least expect them, and they have real consequences for your customers. As the user base grows, your app may start to lag because it can't keep up with demand. Fortunately, there are tools and techniques available to tackle these issues in a timely manner.

We have created this article in collaboration with Site24x7. Thank you for supporting the partners who make SitePoint possible.

In this article, I'll investigate performance bottlenecks in a .NET 6 application, focusing on a performance issue I've personally seen in production. The intent is for you to be able to reproduce the issue in your local development environment and then address it.

Feel free to download the sample code from GitHub or follow along. The solution has two APIs, unimaginatively named First.Api and Second.Api. The first API calls into the second API to get weather data. This is a common setup, because APIs can call into other APIs so that data sources remain decoupled and can scale individually.

First, make sure you have the .NET 6 SDK installed on your machine. Next, open a terminal or console window:

> dotnet new webapi --name First.Api --use-minimal-apis --no-https --no-openapi
> dotnet new webapi --name Second.Api --use-minimal-apis --no-https --no-openapi

The above can go in a solution folder named performance-bottleneck-net6. This creates two web projects with minimal APIs, no HTTPS, and no Swagger/OpenAPI. The tool scaffolds the directory structure, so please refer to the example code if you need help setting up these two new projects.
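For reference, the layout at this point should look roughly like this (only the project files are shown):

performance-bottleneck-net6/
  First.Api/
    First.Api.csproj
  Second.Api/
    Second.Api.csproj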

Create the solution file in the solution folder. This allows you to open the entire solution via an IDE such as Rider or Visual Studio:

dotnet new sln --name Performance.Bottleneck.Net6
dotnet sln add First.Api\First.Api.csproj
dotnet sln add Second.Api\Second.Api.csproj

Next, make sure to set the port numbers for each web project. In the example code, I've set them to 5060 for the first API and 5176 for the second. The specific numbers don't matter, but I'll use these to reference the APIs throughout the sample code. So either change your port numbers to match, or keep what the scaffolding generates and stay consistent.
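If you'd rather pin the ports in code than edit Properties/launchSettings.json, a minimal sketch is to pass the URL straight to Run. The port value here is the one this article assumes for the second API:

// Minimal sketch: bind the second API to port 5176 in code.
// 5176 is the port this article assumes; adjust as needed.
var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

app.Run("http://localhost:5176");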

The offending application

Open the Program.cs file in the second API and put in the code that responds with weather data:

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();
var summaries = new[]
{
 "Freezing", "Bracing", "Chilly", "Cool", "Mild", "Warm", "Balmy", "Hot", "Sweltering", "Scorching"
};

app.MapGet("/weatherForecast", async () =>
{
 await Task.Delay(10);
 return Enumerable
   .Range(0, 1000)
   .Select(index =>
     new WeatherForecast
     (
       DateTime.Now.AddDays(index),
       Random.Shared.Next(-20, 55),
       summaries[Random.Shared.Next(summaries.Length)]
     )
   )
   .ToArray()[..5];
});

app.Run();

public record WeatherForecast(
 DateTime Date,
 int TemperatureC,
 string? Summary)
{
 public int TemperatureF => 32 + (int)(TemperatureC / 0.5556);
}

The minimal API feature in .NET 6 helps keep this code small and concise. The endpoint iterates through a thousand records and performs a task delay to simulate asynchronous data processing. In a real project, this could be a call into a distributed cache or a database, which is an IO-bound operation.
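To picture what that Task.Delay stands in for, here's a rough sketch of an IO-bound lookup against a distributed cache. The in-memory cache registration and the weather-data key are illustrative assumptions, not part of the sample project:

using Microsoft.Extensions.Caching.Distributed;

var builder = WebApplication.CreateBuilder(args);
// an in-memory stand-in; a real project might register Redis here instead
builder.Services.AddDistributedMemoryCache();
var app = builder.Build();

app.MapGet("/weatherForecast", async (IDistributedCache cache) =>
{
 // this await crosses an IO boundary, much like the simulated delay
 var json = await cache.GetStringAsync("weather-data");
 return json is null
   ? Results.NotFound()
   : Results.Content(json, "application/json");
});

app.Run();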

Now go to the Program.cs file in the first API and write the code that consumes this weather data. You can simply copy and paste this, replacing what the scaffolding generates:

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddSingleton(_ => new HttpClient(
 new SocketsHttpHandler
 {
   PooledConnectionLifetime = TimeSpan.FromMinutes(5)
 })
{
 BaseAddress = new Uri("http://localhost:5176")
});

var app = builder.Build();

app.MapGet("https://www.sitepoint.com/", async (HttpClient client) =>
{
 var result = new List<List<WeatherForecast>?>();

 // call into the second API a hundred times, awaiting one response at a time
 for (var i = 0; i < 100; i++)
 {
   result.Add(
     await client.GetFromJsonAsync<List<WeatherForecast>>(
       "/weatherForecast"));
 }

 return result[Random.Shared.Next(0, 100)];
});

app.Run();

public record WeatherForecast(
 DateTime Date,
 int TemperatureC,
 string? Summary)
{
 public int TemperatureF => 32 + (int)(TemperatureC / 0.5556);
}

The HttpClient is injected as a singleton, because this makes the client scalable. In .NET, each new client creates sockets in the underlying operating system, so a good technique is to reuse those connections by reusing the class. Here, the HTTP client sets a lifetime on the connection pool, which allows the client to hang on to sockets for as long as it needs.
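If you'd rather not manage the handler by hand, IHttpClientFactory can handle the pooling and handler lifetimes for you. A minimal sketch, where the "second" client name is just an illustrative label:

// Sketch: a named client managed by IHttpClientFactory.
// "second" is an arbitrary label for this example.
builder.Services.AddHttpClient("second", client =>
{
 client.BaseAddress = new Uri("http://localhost:5176");
});

// Then, wherever IHttpClientFactory is injected:
// var client = factory.CreateClient("second");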

A base address just tells the client where to go, so make sure it points to the correct port number specified in the other API.

When a request comes in, the code loops a hundred times, calling into the other API on each iteration. This simulates, for example, a set of records that each require a call to another API. The iterations are hard-coded, but in a real project this could be a list of users that grows indefinitely as the business grows.

Now focus your attention on the looping, because it has implications in performance theory. In algorithmic analysis, a single loop has linear, or O(n), complexity. But the second API also loops, which spikes the algorithm to quadratic, or O(n^2), complexity. To boot, the loop crosses an IO boundary on every iteration, which hurts performance further.

This has a multiplicative effect: for every iteration in the first API, the second API loops a thousand times, so a single inbound request does 100 * 1000 = 100,000 iterations' worth of work. Remember that these lists are unbounded, which means performance degrades quadratically as the datasets grow.

When angry customers spam the call center and demand a better user experience, use these tools to try to figure out what’s going on.

cURL and NBomber

The first tool helps identify which API to focus on. When optimizing code, it is possible to optimize everything ad infinitum, so avoid premature optimizations. The goal is to get performance to be “just good enough” and this tends to be subjective and driven by business requirements.

First, call each API individually using cURL to get a feel for the latency:

> curl -i -o /dev/null -s -w %{time_total} http://localhost:5060
> curl -i -o /dev/null -s -w %{time_total} http://localhost:5176

Port 5060 belongs to the first API, and 5176 to the second. Verify that these are the correct ports on your machine.

The second API responds in fractions of a second, which is good enough and probably not the culprit. But the first API takes almost two seconds to respond. This is unacceptable, because web servers can time out requests that take this long. A two-second delay is also far too slow from the customer's perspective.

Next, a tool like NBomber will help benchmark the problematic API.

Go back to the console and create a test project inside the root folder:

dotnet new console -n NBomber.Tests
cd NBomber.Tests
dotnet add package NBomber
dotnet add package NBomber.Http
cd ..
dotnet sln add NBomber.Tests\NBomber.Tests.csproj

In its Program.cs file, write the benchmarks:

using NBomber.Contracts;
using NBomber.CSharp;
using NBomber.Plugins.Http.CSharp;

var step = Step.Create(
 "fetch_first_api",
 clientFactory: HttpClientFactory.Create(),
 execute: async context =>
 {
   var request = Http
     .CreateRequest("GET", "http://localhost:5060/")
     .WithHeader("Accept", "application/json");
   var response = await Http.Send(request, context);

   return response.StatusCode == 200
     ? Response.Ok(
       statusCode: response.StatusCode,
       sizeBytes: response.SizeBytes)
     : Response.Fail();
 });

var scenario = ScenarioBuilder
 .CreateScenario("first_http", step)
 .WithWarmUpDuration(TimeSpan.FromSeconds(5))
 .WithLoadSimulations(
   Simulation.InjectPerSec(rate: 1, during: TimeSpan.FromSeconds(5)),
   Simulation.InjectPerSec(rate: 2, during: TimeSpan.FromSeconds(10)),
   Simulation.InjectPerSec(rate: 3, during: TimeSpan.FromSeconds(15))
 );

NBomberRunner
 .RegisterScenarios(scenario)
 .Run();

NBomber spams the API at a rate of only one request per second for the first five seconds, then two requests per second for the next ten seconds, and finally three per second for the last 15 seconds. This prevents the local dev machine from being overloaded with too many requests. NBomber also uses network sockets, so tread carefully when both the target API and the benchmark tool run on the same machine.

The test step captures the response code and sets it in the return value, which keeps track of API errors. In .NET, when the Kestrel server gets too many requests, it rejects them with an error response.
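That threshold isn't magic: Kestrel's limits are configurable. As a rough sketch, the connection cap can be set through its options (the value below is an arbitrary example, not a recommendation):

// Sketch: capping concurrent connections in Kestrel.
// 100 is an arbitrary illustrative value.
builder.WebHost.ConfigureKestrel(options =>
{
 options.Limits.MaxConcurrentConnections = 100;
});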

Now inspect the results and check for latencies, concurrent requests and throughput.

Offending app results

The P95 latencies show 1.5 seconds, which is what most customers will experience. Throughput remains low because the tool was calibrated to go up to only three requests per second. On a local developer machine, concurrency is hard to gauge, because the same resources running the benchmark tool are also needed to service the requests.

dotTrace Analysis

Next, pick a tool that can perform an algorithmic analysis, like dotTrace. This will help further isolate where the performance issue is.

To do an analysis, run dotTrace and take a snapshot while NBomber spams the API as hard as possible. The goal is to simulate heavy load and pinpoint where the slowness comes from. The benchmarks already in place are good enough, so make sure you run dotTrace alongside NBomber.

dotTrace analysis

Based on this analysis, about 85% of the time is spent on the GetFromJsonAsync call. Poking around in the tool reveals that this comes from the HTTP client. This correlates with performance theory, because it shows that the asynchronous loop with O(n^2) complexity could be the problem.

A benchmark tool running locally will help identify bottlenecks. The next step is to use a monitoring tool that can track requests in a live production environment.

Performance investigations are about gathering information and cross-checking that the tools all tell a coherent story.

Site24x7 monitoring

A tool like Site24x7 can help tackle performance issues.

For this application, focus on the P95 latencies in the two APIs. There's a ripple effect here, because the APIs are part of a series of interconnected services in a distributed architecture: when one API begins to experience performance issues, other APIs downstream may experience them as well.

Scalability is another crucial factor. As the user base grows, the app may begin to lag. A monitoring tool helps track normal behavior and predict how the app will scale over time. The nested async loop found in this app may work well for N users, but it won't scale, because that number is unbounded.

Finally, when implementing optimizations and enhancements, tracking version dependencies is key. At each iteration, you should be able to know which version is better or worse for performance.

A proper monitoring tool is necessary because problems are not always easy to spot in a local development environment. The assumptions made locally may not be valid in production because your customers may have a different opinion. Start your 30-day free trial at Site24x7.

A more efficient solution

With the arsenal of tools so far, it’s time to explore a better approach.

cURL showed that the first API is the one with performance issues, which means any improvement to the second API is negligible. Although there's a ripple effect here, shaving a few milliseconds off the other API won't make much of a difference.

NBomber confirmed this story by showing P95s at nearly two seconds in the first API. Then dotTrace singled out the async loop, because that's where the algorithm spent most of its time. A monitoring tool like Site24x7 would have provided supporting information by showing P95 latencies, scalability over time, and version history. The specific version that introduced the nested loop would likely show a jump in latencies.

According to performance theory, quadratic complexity is a major concern, because performance degrades steeply as the input grows. A good technique is to squash the complexity by reducing the number of iterations inside the loop.

The catch is that with an await inside the loop, only one request is in flight at a time: each iteration stops and waits for the other API to return a response before continuing. That's bad news for performance.

A naive approach is to simply break the loop by sending all HTTP requests at the same time:

app.MapGet("https://www.sitepoint.com/", async (HttpClient client) =>
 (await Task.WhenAll( 
   Enumerable
     .Range(0, 100)
     .Select(_ =>
       client.GetFromJsonAsync<List<WeatherForecast>>( 
         "/weatherForecast")
     )
   )
 )
 .ToArray()[Random.Shared.Next(0, 100)]);

This nukes the await inside the loop and awaits only once. Task.WhenAll fires everything off in parallel, which flattens the loop.

This approach might work, but it risks spamming the other API with too many requests at once. The web server may reject them because it suspects a DoS attack. A far more sustainable approach is to throttle the requests by sending only a few at a time:

var sem = new SemaphoreSlim(10); // the bouncer: at most ten concurrent requests

app.MapGet("/", async (HttpClient client) =>
 (await Task.WhenAll(
   Enumerable
     .Range(0, 100)
     .Select(async _ =>
     {
       try
       {
         await sem.WaitAsync(); // wait for a slot in the pool
         return await client.GetFromJsonAsync<List<WeatherForecast>>(
           "/weatherForecast");
       }
       finally
       {
         sem.Release(); // free the slot for the next request
       }
     })
   )
 )
 .ToArray()[Random.Shared.Next(0, 100)]);

This works much like a bouncer at a club. The maximum capacity is ten: when requests enter the pool, only ten can be in flight at any one time. Concurrency is still allowed, so when one request leaves the pool, another can enter immediately, without waiting for a whole batch of ten to finish.

This cuts the number of concurrent requests by a factor of ten and eases the pressure from all the crazy looping.
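If you'd rather not manage the semaphore yourself, .NET 6 also ships Parallel.ForEachAsync, which enforces the same cap through ParallelOptions. A rough sketch of equivalent throttling inside the endpoint, where the ConcurrentBag result collection is an illustrative choice:

// Sketch: ten-at-a-time throttling via .NET 6's Parallel.ForEachAsync.
// MaxDegreeOfParallelism plays the same bouncer role as the semaphore.
var results = new System.Collections.Concurrent.ConcurrentBag<List<WeatherForecast>?>();

await Parallel.ForEachAsync(
 Enumerable.Range(0, 100),
 new ParallelOptions { MaxDegreeOfParallelism = 10 },
 async (_, token) =>
 {
   results.Add(
     await client.GetFromJsonAsync<List<WeatherForecast>>(
       "/weatherForecast", token));
 });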

With this code in place, run NBomber and check the results.

A more efficient solution

P95 latencies are now a third of what they used to be. A half-second response is far more reasonable than one that takes over a second. You can keep optimizing from here, of course, but I think your customers will be pretty happy with this.

Conclusion

Performance optimizations are a never-ending story. As the business grows, the assumptions first made in the code may become invalid over time. Therefore, you need tools to analyze, draw benchmarks, and continuously monitor the app to help mitigate performance issues.

