In an Azure Virtual Network, you may encounter scenarios where virtual machines or services cannot communicate with each other or reach the internet. Common symptoms include VMs being unable to ping one another, failure to establish RDP/SSH connections, or applications timing out when accessing services over the VNet. For instance, a web app might not reach an Azure Storage account via a service endpoint, or a VM might be unable to reach an on-premises network through a VPN. These symptoms point to an underlying networking problem such as traffic filtering, incorrect routing, or name resolution failure.
Possible Root Causes:
- Network Security Group (NSG) rules blocking traffic: NSGs control inbound/outbound traffic. If a crucial port or IP is blocked by an NSG rule, connectivity will fail. For example, if the default “AllowVNetInBound” rule is overridden or a higher-priority rule denies required traffic, VMs in the same VNet won’t communicate[1]. NSG rules are evaluated in priority order (lower numbers are processed first), so a deny rule with a low priority number takes precedence over allow rules with higher numbers[19]. Misconfigured NSGs are one of the most common causes of connectivity issues (see the NSG sketch after this list).
- User-Defined Route (UDR) misconfiguration: A UDR on a subnet can direct traffic to incorrect next hops. For instance, a route forcing all traffic through an NVA or on-premises VPN might unintentionally black-hole traffic destined for Azure services or the internet. If a UDR’s next hop is unreachable or not forwarding traffic properly, VNet resources won’t reach their targets[14].
- DNS resolution issues: If VMs cannot resolve hostnames, they may fail to connect even if network paths are open. Using a custom DNS server that is unreachable from the VNet or having incorrect DNS server IPs will cause name resolution failures[3]. This often surfaces as an inability to reach Azure services by name (for example, a web app’s outbound call to an API fails because the domain name can’t be resolved).
- Service Endpoint configuration problems: Azure VNet Service Endpoints, when enabled, change the route of service traffic to use the Azure backbone directly. If not fully configured, this can break connectivity. For example, simply enabling a service endpoint on a subnet without updating the service’s firewall (to allow the VNet) will result in the service returning an HTTP 403/404 for that subnet[20][21]. Additionally, enabling a service endpoint changes the source IP of traffic from the VM’s public IP address to its private VNet address, which can lead to connectivity drops if the service was previously allowing only the public IP[4]. This is a frequent cause of sudden connectivity loss after turning on service endpoints.
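As a quick way to spot the priority problem described above, here is a minimal Azure PowerShell sketch (the resource group and NSG names are hypothetical) that lists an NSG’s custom rules in the order they are evaluated:

```powershell
# Placeholder NSG and resource group names; adjust to your environment.
# Lists the NSG's custom rules in evaluation order: a Deny with a low priority
# number (e.g., 100) wins over an Allow with a higher number (e.g., 200) and
# over the default AllowVnetInBound rule (priority 65000).
Get-AzNetworkSecurityGroup -ResourceGroupName "rg-app" -Name "nsg-web" |
    Get-AzNetworkSecurityRuleConfig |
    Sort-Object Priority |
    Format-Table Name, Priority, Direction, Access, DestinationPortRange
```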
Diagnostic Methods:
Start by leveraging Azure Network Watcher tools to pinpoint the issue. Use IP Flow Verify to test whether traffic from a source VM to a destination (IP/port) is allowed or denied by NSGs. This tool immediately tells you whether the communication is permitted and, if denied, which NSG rule (name and priority) is responsible[2]. It’s an effective way to confirm whether an NSG is at fault and to identify the exact rule blocking traffic. Next, run Connection Troubleshoot (or set up Connection Monitor for continuous checks) between the two endpoints (e.g., from a VM to another VM or to a service’s IP). This performs an end-to-end path probe and returns the network hops. In the output, look for any issues reported at a hop, such as NSG or UDR blocks. For example, the output may show an issue of type “NetworkSecurityRule” or “UserDefinedRoute” at a certain hop, indicating that an NSG or route is dropping the packets[22]. The presence of “ProbesFailed” and “ConnectionStatus : Unreachable” in the result confirms the traffic is not getting through[23][24].
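A minimal sketch of driving both checks from Azure PowerShell, assuming hypothetical resource names and an East US Network Watcher (Connection Troubleshoot also requires the Network Watcher agent extension on the source VM):

```powershell
# Hypothetical resource names; the Network Watcher region must match the VM's region.
$nw = Get-AzNetworkWatcher -Location "eastus"
$vm = Get-AzVM -ResourceGroupName "rg-app" -Name "vm-web01"

# IP Flow Verify: is outbound TCP 443 from the VM (10.0.1.4) to 10.0.2.4 allowed,
# and which NSG rule makes that decision?
Test-AzNetworkWatcherIPFlow -NetworkWatcher $nw -TargetVirtualMachineId $vm.Id `
    -Direction Outbound -Protocol TCP `
    -LocalIPAddress "10.0.1.4" -LocalPort "49152" `
    -RemoteIPAddress "10.0.2.4" -RemotePort "443"
# Output: Access (Allow/Deny) and the RuleName of the matching rule.

# Connection Troubleshoot: end-to-end probe from the VM to a destination IP/port.
Test-AzNetworkWatcherConnectivity -NetworkWatcher $nw -SourceId $vm.Id `
    -DestinationAddress "10.0.2.4" -DestinationPort 443
# Inspect ConnectionStatus, ProbesFailed, and any per-hop Issues in the output.
```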
Examine NSG flow logs if they are enabled. Flow logs (accessible via Network Watcher) show entries for allowed and denied traffic. If you see your source/destination in the logs with a “Deny” action when it should be allowed, that identifies the culprit rule (the logs include the NSG rule that matched)[25]. Additionally, use the Effective Security Rules feature in the Azure portal (on the VM’s NIC blade) to see the combined effect of subnet-level and NIC-level NSGs. This helps reveal whether a deny rule at either level (with a lower priority number, and therefore higher precedence) is overriding an allow[26].
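The same effective-rules view is available from Azure PowerShell; a sketch with a hypothetical NIC name (the VM must be running for this call to succeed):

```powershell
# Hypothetical NIC and resource group names; the VM must be running.
Get-AzEffectiveNetworkSecurityGroup -NetworkInterfaceName "vm-web01-nic" `
    -ResourceGroupName "rg-app"
# Returns the merged subnet-level and NIC-level rules actually applied to the NIC,
# including the default rules, so an overriding deny is easy to spot.
```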
For routing issues, the Network Watcher Next Hop tool is invaluable. It tells you the next hop for traffic from a VM to a given destination. If the next hop is not what you expect (for example, it shows a UDR pointing to an offline appliance, or “None” when it should be “Internet”), you’ve likely found a routing problem. Similarly, check the effective routes for the VM’s network interface (via the Azure portal or the Get-AzEffectiveRouteTable PowerShell cmdlet) to see whether any UDR is taking precedence over the system routes.
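A sketch of both checks in Azure PowerShell, again with placeholder names:

```powershell
# Placeholder names; adjust for your environment.
$nw = Get-AzNetworkWatcher -Location "eastus"
$vm = Get-AzVM -ResourceGroupName "rg-app" -Name "vm-web01"

# Where does traffic from the VM (10.0.1.4) to 8.8.8.8 go next? Expect "Internet"
# for public destinations unless a UDR ("VirtualAppliance" or "None") intervenes.
Get-AzNetworkWatcherNextHop -NetworkWatcher $nw -TargetVirtualMachineId $vm.Id `
    -SourceIPAddress "10.0.1.4" -DestinationIPAddress "8.8.8.8"

# Dump the effective routes (system + BGP + user-defined) applied to the VM's NIC.
Get-AzEffectiveRouteTable -NetworkInterfaceName "vm-web01-nic" -ResourceGroupName "rg-app" |
    Format-Table Name, Source, State, AddressPrefix, NextHopType, NextHopIpAddress
```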
To diagnose DNS, perform name resolution tests from within the VM (for example, nslookup) and confirm the DNS server itself is reachable (for example, Test-NetConnection -Port 53 against the DNS server). Verify the VNet’s DNS settings. If a custom DNS server is configured on the VNet, ensure the VM can reach it (for example, if it’s an on-premises DNS server, the VPN must be up). Microsoft’s guidance suggests verifying that the DNS server IP is correct and reachable[9]. In cases where Azure services aren’t reachable, Microsoft support often finds an incorrect DNS server address as the root cause[3].
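A few illustrative checks, using placeholder names and a hypothetical custom DNS server at 10.0.0.10 (the first two commands run inside the affected VM):

```powershell
# The first two commands run inside the affected VM; names and IPs are placeholders.
# Can the VM resolve the target name at all?
Resolve-DnsName "mystorageacct.blob.core.windows.net"

# Is the custom DNS server reachable? (DNS normally uses UDP 53; a TCP 53 probe
# is still a quick reachability check.)
Test-NetConnection -ComputerName "10.0.0.10" -Port 53

# From a management workstation: which DNS servers is the VNet handing out?
# An empty list means the Azure-provided DNS (168.63.129.16) is in use.
(Get-AzVirtualNetwork -ResourceGroupName "rg-app" -Name "vnet-prod").DhcpOptions.DnsServers
```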
For service endpoints, use Connection Troubleshoot to verify that traffic is indeed going over the service endpoint. If the service endpoint is working, the connection should succeed and the effective route for the service’s address prefixes should show the next hop type VirtualNetworkServiceEndpoint (and NSG flow logs would show traffic with the service tag as destination). If you still see traffic egressing to the internet or being dropped, double-check that:
1. The service endpoint is enabled on the correct subnet.
2. The target service (e.g., Azure SQL, Storage) has the VNet/subnet listed in its Firewall and Virtual Networks settings as allowed[5].
3. No UDR is overriding the route for the service’s address ranges (service endpoints add specific routes for the service’s public prefixes). Service endpoint routes take precedence over BGP routes[27]. If a UDR is broad (like 0.0.0.0/0 to a firewall) and doesn’t exclude the service traffic, that can prevent the endpoint from working.
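A sketch of these three checks in Azure PowerShell, with hypothetical subnet, storage account, and NIC names:

```powershell
# Hypothetical subnet, storage account, and NIC names; adapt to your subscription.
# 1) Is the service endpoint enabled on the subnet?
$vnet = Get-AzVirtualNetwork -ResourceGroupName "rg-app" -Name "vnet-prod"
(Get-AzVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name "snet-web").ServiceEndpoints

# 2) Does the target service allow the subnet? (Azure Storage shown here.)
Get-AzStorageAccountNetworkRuleSet -ResourceGroupName "rg-data" -Name "mystorageacct" |
    Format-List DefaultAction, VirtualNetworkRules

# 3) Is the service traffic actually using the endpoint route?
Get-AzEffectiveRouteTable -NetworkInterfaceName "vm-web01-nic" -ResourceGroupName "rg-app" |
    Where-Object { $_.NextHopType -eq "VirtualNetworkServiceEndpoint" }
```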
Proven Solutions:
Apply the fix corresponding to the diagnosed root cause:
- NSG Fix: Modify or remove the offending NSG rule. If IP Flow Verify indicated a specific NSG rule blocking traffic, navigate to that NSG and either delete the rule or adjust its priority, addresses, or ports to allow the needed traffic[7] (a PowerShell sketch follows this list). Make sure the default “AllowVnetInBound”/“AllowVnetOutBound” rules, which intra-VNet traffic depends on, aren’t being overridden by higher-precedence custom deny rules[28]. If you have NSGs at both the subnet and NIC levels, remember that traffic must be allowed by both. A best practice is not to duplicate NSGs at multiple levels, to avoid confusion[12]. After changes, use IP Flow Verify again – it should report traffic as Allowed if the NSG issue is resolved.
- UDR Fix: If a user-defined route was sending traffic into a dead end, correct or remove that route. For example, if internet traffic was wrongly routed to an on-premises gateway that doesn’t handle it, consider removing that 0.0.0.0/0 UDR or changing its next hop to Internet[13]. If the UDR is needed for other reasons, refine the route (e.g., use more specific prefixes) so it doesn’t catch unintended traffic. In cases where a UDR intentionally routes to an NVA or firewall, ensure that the NVA is configured to forward traffic and has connectivity to the destination; otherwise, adjust the UDR or fix the NVA. Once updated, the Next Hop tool should show the correct route (e.g., “Internet” for public addresses or the correct IP for an NVA) and connectivity should be restored.
- DNS Fix: Correct the DNS configuration. If the issue was an unreachable DNS server, one solution is to switch back to Azure-provided DNS (which requires no configuration and automatically works within the VNet). Alternatively, fix the connectivity to the custom DNS: for instance, if the DNS is on-premises, make sure the VPN is stable and the NSGs/Firewall allow UDP/TCP 53 to that server. Microsoft documentation emphasizes verifying that the DNS server’s IP is correct and that the VNet can connect to it[9]. If certain Azure services (like Azure SQL) were failing name resolution, consider using Azure Private DNS Zones linked to your VNet to resolve those service names to private endpoints.
- Service Endpoint / VNet Service Integration Fix: If the service endpoint was not fully configured, complete the setup by allowing the VNet on the service side (see the service-endpoint sketch after this list). Go to the Azure service (e.g., the Azure Storage account’s networking settings) and add the VNet/subnet under the Virtual network access section. Microsoft’s best practice is to enable the service endpoint on the subnet first, then configure the service’s VNet ACL[5][15]. Doing so in the recommended order prevents a momentary drop caused by the source IP address change[4]. If enabling the endpoint caused the service to reject traffic (because the source IP is now private), update the service’s firewall to allow the subnet, or use the IgnoreMissingVnetServiceEndpoint flag if applicable (some services allow adding the VNet to the ACL before the endpoint is enabled)[29]. In scenarios where a UDR was interfering with service endpoint traffic, modify the route: you might create exceptions for the service’s IP ranges or service tag. Alternatively, consider moving to Azure Private Link, which provides a private IP for the service and doesn’t rely on service endpoints (and is not affected by UDRs in the same way). Keep in mind that with service endpoints, ICMP (ping) cannot be used to test connectivity; only TCP traffic uses the endpoint[30], so test with appropriate tools (e.g., tcping or application-level connections).
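As a hedged illustration of the NSG and UDR fixes above (the rule, route table, and resource names are placeholders, not a prescription), the offending rule or route can be removed with Azure PowerShell and the result re-verified with IP Flow Verify and Next Hop:

```powershell
# Placeholder rule, route, and resource names; re-run IP Flow Verify / Next Hop afterwards.

# NSG fix: remove (or re-scope) the deny rule that IP Flow Verify identified.
$nsg = Get-AzNetworkSecurityGroup -ResourceGroupName "rg-app" -Name "nsg-web"
Remove-AzNetworkSecurityRuleConfig -NetworkSecurityGroup $nsg -Name "Deny-All-Outbound"
$nsg | Set-AzNetworkSecurityGroup      # commit the change

# UDR fix: drop a 0.0.0.0/0 route that black-holes internet-bound traffic.
$rt = Get-AzRouteTable -ResourceGroupName "rg-app" -Name "rt-web"
Remove-AzRouteConfig -RouteTable $rt -Name "default-to-nva"
$rt | Set-AzRouteTable                 # commit the change
```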
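And a sketch of completing a storage service endpoint in the recommended order (endpoint on the subnet first, then the service-side rule); the VNet, subnet, address prefix, and storage account names are hypothetical:

```powershell
# Hypothetical VNet, subnet, prefix, and storage account names.
# 1) Enable the Microsoft.Storage service endpoint on the subnet.
$vnet = Get-AzVirtualNetwork -ResourceGroupName "rg-app" -Name "vnet-prod"
Set-AzVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name "snet-web" `
    -AddressPrefix "10.0.1.0/24" -ServiceEndpoint "Microsoft.Storage"
$vnet = $vnet | Set-AzVirtualNetwork   # commit and refresh the VNet object

# 2) Then allow that subnet on the storage account's firewall.
$subnetId = (Get-AzVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name "snet-web").Id
Add-AzStorageAccountNetworkRule -ResourceGroupName "rg-data" -Name "mystorageacct" `
    -VirtualNetworkResourceId $subnetId
```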
After applying the changes, rerun connectivity tests. Use Network Watcher’s Connection Troubleshoot to confirm that packets now reach their destination (it should show ConnectionStatus: Reachable and no blocking issues). Also, test from the application perspective: for example, RDP into the VM, load the web app, or connect to the database as originally attempted. If the root cause was correctly addressed, the connectivity issue should be resolved, evidenced by successful communication between the components.
Sources: Troubleshooting Azure VM connectivity[6][2], Azure DNS and connectivity Q&A[3][31], Azure service endpoint FAQ[32][16], NSG diagnostics[25], Microsoft documentation and expert advice.
