Azure App Service performance

Your Azure App Service web application is experiencing degraded performance. Pages that used to load quickly are now slow, and users might see timeout errors or generally sluggish responses. Key symptoms include:

  • High response times: Requests that normally took a fraction of a second now take several seconds or longer, potentially exceeding client timeouts (e.g., 30 seconds). Some requests may even fail with HTTP 502/503 because the backend does not respond in time.
  • Throughput drop and resource saturation: The app’s throughput (requests per second) is lower than usual, and monitoring shows the App Service plan’s CPU or memory usage hitting 80-100%. Despite heavy load, instances are not scaling out sufficiently, which can lead to queued requests.
  • Cold start delays: After periods of inactivity or after a deployment, the first few requests are extremely slow to respond. This suggests the app is experiencing cold starts (initialization overhead), where loading the application into memory causes noticeable latency for the first users to hit it.
  • Dependency bottlenecks: The application’s performance is constrained by slow calls to external dependencies (databases, APIs, third-party services). If one of these dependencies becomes slow or hits capacity, it drags down the overall response time of the app. This often manifests as requests spending most of their time waiting on an external call.

Possible Root Causes:

  • Insufficient scale or compute resources: The application is under-provisioned for the current load. Perhaps it’s running on too few instances or on a pricing tier that doesn’t provide enough CPU/memory. When usage spikes, the app instances max out their resources, causing request processing to slow down significantly. If auto-scale is not enabled or has reached its limit, the service can’t add more instances to handle the load, and performance degrades[33][44]. High CPU can also be caused by inefficient code, but if it correlates with increased traffic, it’s often a scaling issue.
  • Cold start overhead: If Always On is disabled or if the app is running on a consumption-based plan, the app may unload after a period of inactivity. The next user request has to “cold start” the app (load code, JIT compile, etc.), introducing delays of several seconds or more. Cold starts are particularly significant for Azure Functions on Consumption plans and apps on lower tiers where processes are not kept alive. Without Always On, even a Basic tier web app will cold start if idle, leading to initial load latency[46][34].
  • Dependency performance issues: Many app slowdowns are due to external calls taking long. Examples include database queries that are slow (due to missing indexes or high DTU utilization), calls to external APIs that might be rate-limited or under heavy load, or waiting on a file storage or Redis cache. According to Microsoft guidance, common causes of slow app performance are “network requests taking a long time” and “inefficient database queries”[48]. If your app spends a lot of time waiting on dependencies, it will appear slow to end-users. The presence of one or a few dependencies with much higher response times than normal is a red flag.
  • Application-level inefficiencies or leaks: High CPU usage could be caused by a tight loop or expensive algorithm in code. High memory usage might be due to a memory leak (objects not being freed). These issues get exacerbated under load. For instance, a bug causing 100% CPU usage will degrade performance for all users[36]. Such issues might not surface at low traffic but become evident as load increases.
  • Suboptimal configuration or limits: Sometimes the app hits platform limits – e.g., hitting the maximum number of concurrent connections or threads for the given plan. On .NET, for example, if too many synchronous calls block threads, you could exhaust the thread pool, causing queuing. If using 32-bit mode (on Windows apps) you might be limited in memory. Another example is not enabling HTTP/2 or WebSockets if your scenario could benefit, though these are minor compared to the above issues.
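
To make the thread-exhaustion example concrete, the snippet below is a toy illustration in plain Python asyncio (nothing Azure-specific): a blocking call inside a request handler stalls the event loop, so requests queue up even though CPU stays low. The .NET analogue is synchronous I/O or .Result/.Wait() calls tying up thread-pool threads.

```python
# Toy illustration of the "blocked workers" failure mode: a synchronous call
# inside an async handler freezes the event loop, so other requests queue
# even though CPU utilization is low.
import asyncio
import time


async def bad_handler() -> None:
    time.sleep(1)           # blocking I/O stand-in: stalls the whole event loop


async def good_handler() -> None:
    await asyncio.sleep(1)  # non-blocking wait: other requests keep being served


async def serve(handler, n: int = 10) -> float:
    """Simulate n concurrent requests and return the total wall-clock time."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start


async def main() -> None:
    print(f"blocking handlers:     {await serve(bad_handler):.1f}s for 10 requests")
    print(f"non-blocking handlers: {await serve(good_handler):.1f}s for 10 requests")


asyncio.run(main())
```

Running it shows ten simulated requests taking roughly ten seconds with the blocking handler versus roughly one second with the non-blocking one.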

Diagnostic Methods:

Start by observing and monitoring the app’s behavior using Azure’s tools:

  • Azure Portal Metrics: Check the App Service Metrics blade for key counters like CPU Time, Memory Working Set, Requests, Average Response Time, and HTTP Queue Length[38]. A high Avg Response Time combined with a rising Requests metric and high CPU can indicate the app is struggling to keep up with load. Specifically, if CPU is near 100% on all instances and Http Queue Length is greater than 0 (requests waiting), the server is overwhelmed and requests are backing up. Conversely, if CPU is low but response time is high, the app is likely waiting on an external resource or a lock rather than being compute-bound. (A code sketch that pulls these counters programmatically follows this list.)
  • Application Insights – Performance and Dependencies: If Application Insights (AI) is enabled, go to the Performance section to see which operations are slow. AI lists the slowest requests and their durations. You can then drill into the Detailed view or Transaction diagnostics for a specific slow request. This end-to-end trace shows each dependency call, database query, etc., that happened during the request and how long each took[35]. If the trace shows, for example, that a particular SQL query or HTTP call is consuming most of the time, you’ve identified a dependency bottleneck. Often, viewing a few samples of the slowest requests makes it “straightforward to understand if the delay was caused by slow dependency call(s) or something else”[39]. If all the slow requests point to a specific dependency (say, an external API), then that external component is likely the root cause of the slowness[39]. Also check the Failures section in AI to see if timeouts or exceptions are being thrown when calling dependencies.
  • Azure App Service Diagnostics (Diagnose and Solve Problems): In the Azure Portal, the App Service resource has a Diagnose and solve problems blade. Under categories like Availability and Performance, it provides an interactive troubleshooter. For performance issues, it might highlight high memory, high CPU, or thread exhaustion. It often provides graphs and analysis (e.g., it may detect “Your app is using 90% of CPU” or “instances are frequently restarting due to memory pressure”). This can quickly validate whether resource constraints are an issue. The tool might also suggest enabling Always On if it detects many cold starts.
  • Profiler and Application Logs: If Profiler is enabled via Application Insights, it automatically collects traces of your application periodically. Examine Profiler traces for slow operations – they show a timeline of a request’s execution down to the code level. For instance, Profiler might reveal that a certain function or line of code takes 5 seconds because it is doing heavy computation or waiting on I/O[49]. Microsoft’s documentation notes that these traces can identify “lines of code that slow down the application”, including sequential code that could be parallelized and database lock contention[41]. In addition, review application logs (if using Serilog, log4net, etc.) for any warnings/errors around the time of performance incidents. Sometimes exceptions (like a SQL timeout or an HTTP call failure) appear in the logs, corresponding to spikes in latency.
  • Resource-specific insights: Check the database’s metrics if applicable (for Azure SQL DB, look at DTU or vCore utilization, storage I/O, and deadlock counts). If using Azure Application Gateway or Front Door in front of the app, ensure they are not throttling or misconfigured (a less common cause of slowness, but worth checking if relevant).
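
If you prefer to pull these counters programmatically (for dashboards or alert tuning) rather than reading the Metrics blade, the sketch below uses the azure-monitor-query package to query the App Service plan’s CpuPercentage, MemoryPercentage, and HttpQueueLength metrics. The resource ID and time window are placeholders, and the exact client API may differ slightly between SDK versions.

```python
# A sketch of pulling App Service plan counters with azure-monitor-query.
# The plan resource ID below is a placeholder.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient, MetricAggregationType

PLAN_ID = (
    "/subscriptions/<sub-id>/resourceGroups/my-rg"
    "/providers/Microsoft.Web/serverfarms/my-plan"
)

client = MetricsQueryClient(DefaultAzureCredential())
result = client.query_resource(
    PLAN_ID,
    metric_names=["CpuPercentage", "MemoryPercentage", "HttpQueueLength"],
    timespan=timedelta(hours=24),
    granularity=timedelta(minutes=5),
    aggregations=[MetricAggregationType.AVERAGE, MetricAggregationType.MAXIMUM],
)

for metric in result.metrics:
    for series in metric.timeseries:
        peaks = [point.maximum for point in series.data if point.maximum is not None]
        if peaks:
            # Sustained CPU near 100% plus a non-zero HTTP queue suggests saturation.
            print(f"{metric.name}: peak over last 24h = {max(peaks):.1f}")
```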

By correlating these data points, you can usually identify the primary factor: CPU-bound vs IO-bound vs external dependency-bound.
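
As a concrete example of that correlation step, the sketch below queries a workspace-based Application Insights resource (via the azure-monitor-query package, assumed here; the workspace ID is a placeholder) for the dependencies with the worst 95th-percentile duration over the last day. A single target dominating this list usually means the app is dependency-bound rather than CPU-bound. The same KQL can be run interactively in the Logs blade.

```python
# A sketch of finding the slowest dependencies in workspace-based App Insights data.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "<log-analytics-workspace-id>"  # placeholder

QUERY = """
AppDependencies
| where TimeGenerated > ago(1d)
| summarize calls = count(), avgMs = avg(DurationMs), p95Ms = percentile(DurationMs, 95)
    by Target, DependencyType
| order by p95Ms desc
| take 10
"""

client = LogsQueryClient(DefaultAzureCredential())
# Assumes the query succeeds; a production script would check the result status.
response = client.query_workspace(WORKSPACE_ID, QUERY, timespan=timedelta(days=1))

for table in response.tables:
    for row in table.rows:
        print(row)  # Target, DependencyType, calls, avgMs, p95Ms
```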

Proven Solutions:

Depending on the diagnosis, apply one or more of the following solutions that have proven effective:

  • Scale Out or Up: If the app is under-provisioned, scaling is the fastest way to alleviate pressure. Scaling out means increasing the instance count. With more instances (workers), incoming requests are spread out, reducing per-instance load and response times[33]. Configure auto-scale rules so that the instance count increases before the existing instances become overwhelmed (e.g., scale out at 70% CPU), and make sure you haven’t inadvertently set a low maximum instance count. Scaling up, on the other hand, means moving to a higher pricing tier (more CPU/RAM per instance). For example, moving from an S1 to an S2 plan gives more memory and CPU per instance, which can handle more throughput per machine. Microsoft’s guidance suggests that truly CPU-bound scenarios may require scaling up to a more powerful SKU[36]. Often a combination is ideal: scale up to a point where each instance has headroom, and scale out to handle concurrent load and provide fault tolerance[33]. After scaling, monitor the improvement: CPU % should drop, throughput should increase, and the Azure Monitor chart for Response Time should come down as instances are added. (A management-SDK sketch of adding a CPU-based scale-out rule, together with the Always On setting from the next bullet, appears after this list.)
  • Enable Always On & Reduce Cold Starts: If cold starts were identified (for instance, the first request after a few idle hours was slow), turn on Always On for your Web App (available in Basic and above tiers). This setting keeps the worker process alive by periodically pinging it, preventing it from idling out[50]. As per Microsoft’s recommendation, Always On helps “keep your app warm and avoid cold starts”, significantly reducing delays after idle periods[50]. In the context of Azure Functions, consider moving from the Consumption plan to a Premium plan or using pre-warmed instances if available, since Premium plan functions keep a minimum number of instances always warm, eliminating most cold start latency[34]. If using deployment slots with auto-swap, make sure to warm up the slot (via application initialization) to avoid a cold start on swap. (Enabling Always On programmatically is shown in the sketch after this list.)
  • Optimize Dependencies (Reduce bottlenecks): When Application Insights or other profiling has pointed out specific slow dependencies, address those:
    • Database: If queries are slow, look at query plans and add indexes or rewrite queries as needed. Consider caching frequent read queries (for example with Azure SQL’s in-memory features or an Azure Cache for Redis). If the database is consistently maxing out its resources (DTUs, or CPU for vCore), scale it up to a higher tier or enable its performance features. Also ensure connection pooling is in use (most frameworks do this by default) and that you aren’t unnecessarily opening new DB connections on every request.
    • External APIs: For third-party or external HTTP calls, consider adding circuit breakers or bulkhead isolation – if that API is slow or down, time out or skip those calls so they don’t hang your entire request. Use asynchronous calls so that threads aren’t blocked (Node.js naturally does this; in .NET use async/await for I/O). If you control the service, try to improve its performance or deploy it closer (same Azure region) to reduce latency. Sometimes the solution is simply to cache the result of an expensive API call for a short period. For example, if the app fetches exchange rates from an API, caching them for even 1 minute can dramatically cut down outbound calls.
    • Increase parallelism: If your code currently calls dependencies in sequence and they are independent, call them in parallel to reduce total latency. Be mindful not to overload the dependencies, though. Azure’s guidance on profiler traces gives examples like “sequential code that can be run in parallel” as a target for optimization[41]. (The parallel-call, timeout, and caching patterns from these bullets are sketched in code after this list.)
    • Timeout and retry policies: Ensure you have appropriate timeouts for dependency calls. For instance, do not let a single external call hang for 5 minutes – set a reasonable timeout (a bit above the typical response time) so that hung calls release resources. Implement retries with exponential backoff for transient issues (but be careful not to overwhelm a struggling dependency).
    • Asynchronous processing: If possible, offload long-running operations to background jobs or queues (e.g., Azure Functions or WebJobs). For example, if a user request triggers an image-processing task that takes 10 seconds, you could enqueue it and return a response immediately (perhaps using SignalR or polling to update the client when done). This prevents web threads from being tied up and improves responsiveness. (A queue-offload sketch also appears after this list.)
  • Fix Application Code Issues: If profiling pointed to inefficient code in your app (like a method consuming too much CPU or memory), refactor that code. Examples include:
    • Remove unnecessary loops or heavy computations from the request path, or move them to background tasks.
    • Optimize algorithms (use more efficient data structures and cache results where possible).
    • Fix memory leaks: ensure you are not inadvertently holding references to large objects, and dispose of objects (database connections, file handles) promptly. High memory usage can lead to garbage collection pauses, which degrade performance. If you see memory steadily increasing over time in the metrics, that’s a clue[51].
    • Examine whether you’re doing synchronous operations that could be async. For instance, synchronous file I/O or network calls block threads – switch to the async versions to free those threads up (the dependency sketch after this list uses this pattern).
    • If thread pool starvation is observed (in .NET you might see an increasing queue length but low CPU, meaning threads are tied up waiting), ensure all I/O is truly asynchronous and consider raising the thread pool minimum via settings if needed.
    • Use Profiler’s code-level insights: the traces might show, for example, a certain function consuming 2 seconds of CPU – perhaps due to complex LINQ operations or excessive JSON serialization. Such issues can often be fixed with more efficient code or by streaming instead of buffering entire payloads.
  • Platform or Configuration Tweaks: Depending on the scenario, a few configuration changes can help. For example:
    • Turn on HTTP/2 if it is not already enabled (HTTP/2 can improve performance through multiplexing).
    • If you serve many small static files, consider enabling Azure Front Door or a CDN to offload static content delivery.
    • Ensure Health Check is configured (App Service Health Check pings a specified endpoint and can remove unresponsive instances from rotation)[52]. This doesn’t improve performance per se, but it takes instances that are in a bad state out of the load balancer.
    • Apply language-specific optimizations: e.g., in Node.js, keep CPU-bound work off the single event-loop thread (use worker threads or offload it); in Python, consider async frameworks or Azure Functions with multiple workers.
    • If the app is on an older runtime or framework, upgrade it – newer releases such as .NET 6 and .NET 7 typically have significant performance improvements over older .NET Framework versions.
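
The sketch below shows one way to apply the first two remediations from the management plane, using the azure-mgmt-web and azure-mgmt-monitor Python packages: it enables Always On and then adds a CPU-based scale-out rule on the App Service plan. The resource names, IDs, and region are placeholders, and the exact model/field names may vary between SDK versions; the Azure Portal or CLI achieves the same result.

```python
# A sketch, not a drop-in script: enable Always On and add a scale-out rule.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.mgmt.web import WebSiteManagementClient
from azure.mgmt.web.models import SiteConfigResource
from azure.mgmt.monitor import MonitorManagementClient
from azure.mgmt.monitor.models import (
    AutoscaleSettingResource, AutoscaleProfile, ScaleCapacity,
    ScaleRule, MetricTrigger, ScaleAction,
)

SUBSCRIPTION_ID = "<subscription-id>"   # placeholders throughout
RESOURCE_GROUP = "my-rg"
APP_NAME = "my-webapp"
PLAN_ID = (
    f"/subscriptions/{SUBSCRIPTION_ID}/resourceGroups/{RESOURCE_GROUP}"
    "/providers/Microsoft.Web/serverfarms/my-plan"
)

cred = DefaultAzureCredential()

# 1) Keep the worker process warm (Basic tier and above).
web = WebSiteManagementClient(cred, SUBSCRIPTION_ID)
web.web_apps.update_configuration(
    RESOURCE_GROUP, APP_NAME, SiteConfigResource(always_on=True)
)

# 2) Add one instance when average CpuPercentage > 70% over 5 minutes.
monitor = MonitorManagementClient(cred, SUBSCRIPTION_ID)
scale_out_rule = ScaleRule(
    metric_trigger=MetricTrigger(
        metric_name="CpuPercentage",
        metric_resource_uri=PLAN_ID,
        time_grain=timedelta(minutes=1),
        statistic="Average",
        time_window=timedelta(minutes=5),
        time_aggregation="Average",
        operator="GreaterThan",
        threshold=70,
    ),
    scale_action=ScaleAction(
        direction="Increase", type="ChangeCount", value="1",
        cooldown=timedelta(minutes=5),
    ),
)
monitor.autoscale_settings.create_or_update(
    RESOURCE_GROUP,
    "cpu-autoscale",
    AutoscaleSettingResource(
        location="westeurope",          # must match the plan's region (placeholder)
        target_resource_uri=PLAN_ID,
        enabled=True,
        profiles=[AutoscaleProfile(
            name="default",
            capacity=ScaleCapacity(minimum="2", maximum="5", default="2"),
            rules=[scale_out_rule],
        )],
    ),
)
```

Pair the scale-out rule with a matching scale-in rule (e.g., below 30% CPU) so the plan does not stay permanently scaled out.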
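
For the dependency-optimization bullets (parallel calls, per-call timeouts, short-lived caching) and the sync-to-async point, here is a minimal sketch using Python’s asyncio and the httpx client as a stand-in for whatever HTTP stack the app actually uses; the URLs and the 60-second TTL are placeholders.

```python
# Dependency patterns sketch: parallel independent calls, hard timeouts, and a
# tiny TTL cache so an expensive upstream call is not repeated on every request.
import asyncio
import time

import httpx

_CACHE: dict[str, tuple[float, dict]] = {}
CACHE_TTL_SECONDS = 60  # e.g., cache exchange rates for one minute


async def fetch_json(client: httpx.AsyncClient, url: str) -> dict:
    """Fetch a dependency with a hard timeout and a short-lived cache."""
    cached = _CACHE.get(url)
    if cached and time.monotonic() - cached[0] < CACHE_TTL_SECONDS:
        return cached[1]                       # serve from cache, skip the network
    resp = await client.get(url, timeout=2.0)  # don't let a hung call hold a worker
    resp.raise_for_status()
    data = resp.json()
    _CACHE[url] = (time.monotonic(), data)
    return data


async def build_page_model() -> dict:
    # The three calls are independent, so run them concurrently instead of
    # sequentially: total latency ~= the slowest call, not the sum of all three.
    async with httpx.AsyncClient() as client:
        rates, profile, catalog = await asyncio.gather(
            fetch_json(client, "https://api.example.com/rates"),       # placeholder URLs
            fetch_json(client, "https://api.example.com/profile/42"),
            fetch_json(client, "https://api.example.com/catalog"),
        )
    return {"rates": rates, "profile": profile, "catalog": catalog}


if __name__ == "__main__":
    print(asyncio.run(build_page_model()))
```

A real implementation would bound the cache and add retries with exponential backoff for transient failures (for example with the tenacity package), per the timeout-and-retry bullet above.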
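
And for the asynchronous-processing bullet, a sketch of handing a slow task to an Azure Storage queue (azure-storage-queue package assumed; the connection string, queue name, and message shape are placeholders) so the web request can return immediately while a WebJob or Function drains the queue:

```python
# Offload a slow task (e.g., image processing) to a Storage queue so the web
# request returns right away; a background worker processes the message later.
import json
import uuid

from azure.storage.queue import QueueClient

queue = QueueClient.from_connection_string(
    "<storage-connection-string>",   # placeholder
    "image-jobs",                    # placeholder queue name
)


def enqueue_image_job(blob_path: str) -> str:
    """Enqueue the work and return a job id the caller can poll (or push via SignalR)."""
    job_id = str(uuid.uuid4())
    queue.send_message(json.dumps({"job_id": job_id, "blob_path": blob_path}))
    return job_id
```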

After applying these solutions, closely monitor the effect. You should see improvement in the metrics: average and 95th percentile response times should drop. In Application Insights, you can track dependency durations – those should decrease if the bottleneck was addressed[35]. If scaling out, verify that new instances are handling traffic (App Service analytics, or even hitting different instance names in response headers to ensure load is spreading). Users should report that the app feels snappier and timeouts or errors are no longer occurring. Always compare before/after metrics to quantify the gain (for instance, “CPU went from 90% avg to 50%, and 95th percentile response time from 8s to 1.5s after adding two more instances and enabling caching”).

Sources: Azure App Service performance FAQ[46][36], Microsoft troubleshooting guides[53][40], Azure expert blog on performance root cause analysis[39][42], and Azure Monitor/App Insights documentation[35][34].
