Windows VMs have issues resolving DNS names, run into network timeouts or packet loss

Problem

Windows VMs (VMware vSphere) have issues when trying to resolve DNS names and run into network timeouts or packet loss on other protocols, too.

For example running a simple PowerShell script shows the issue (Change *YourFQDN* to your FQDN and '*DNS-Server-IP*' to your DNS server ip-address) :
 

1..1000 | Foreach-Object -Process {
    [pscustomobject]@{
        Try         = $_
        ElapsedTime = (Measure-Command -Expression {
                Resolve-DnsName -DnsOnly -QuickTimeout -NoHostsFile -Name '*YourFQDN*' -Server '*DNS-Server-IP*'
            }).TotalMilliseconds -as [int]
    }
} |
    Group-Object -Property 'ElapsedTime' |
    Sort-Object -Property ‚Count'

PowerShell DNS query test script

From 1000 DNS-queries 541x were answered within 2ms
From 1000 DNS-queries 243x were answered within 1ms
From 1000 DNS-queries 57x were answered within 3ms
From 153 DNS-queries were not answered, timeout >1000ms

Debug-Logs of vnetWFP show the event „DEBUG: ALEInspectInjectComplete : Packet injection status is : c000021b”.

Solution

Update your VMware Tools 11.x with Guest Introspection Driver to version 11.2.6 and reboot your VM or uninstall the Guest Introspection Driver. We first suspected it is VMware NSX-T or VMware Carbon Black EDR, but it was not. It was the NSX Guest Introspection Driver.

Root Cause: Packet drop is seen due to intermittent failure reported by the Microsoft WFP packet injection API.

https://kb.vmware.com/s/article/79185

After the update or removal of the driver the issues were gone:

PowerShell DNS query test script after vmware tools update

From 1000 DNS-queries 985x were answered within 1ms
From 1000 DNS-queries 10x were answered within 2ms
From 1000 DNS-queries 3x were answered within 3ms
From 1000 DNS-queries 1x was answered within 4ms
From 1000 DNS-queries 1x was answered within 35ms
From 1000 DNS-queries 0x timed out.

No comments:

Post a Comment

Cribl - Change values to lowerCase

Some logs (e.g. Microsoft Azure) sometimes are not fully normalized to all lowercase characters. You can use Cribl to adjust those values by...