All posts

How to Debug DNS Resolution Failures

Step-by-step guide to diagnosing DNS resolution failures including dig, nslookup, and tcpdump techniques for developers and DevOps.

How to Debug DNS Resolution Failures

When DNS fails, everything fails. Here's a systematic approach to diagnosing resolution issues.

Step 1: Verify the Failure

# Quick check
nslookup api.example.com

# More detailed
dig api.example.com A

# Check from your application's perspective
getent hosts api.example.com

If dig works but your app fails, the problem is in your application's DNS configuration, not the DNS infrastructure.

Step 2: Check DNS Configuration

# View resolver configuration
cat /etc/resolv.conf

# In containers, check if DNS is inherited from host
dig @8.8.8.8 api.example.com   # Test with Google DNS
dig @1.1.1.1 api.example.com   # Test with Cloudflare DNS

Kubernetes pods use cluster DNS (CoreDNS). If external DNS works but internal names fail, the issue is with your cluster DNS.

Step 3: Trace the Resolution Path

# Full trace shows each DNS server consulted
dig +trace api.example.com

# Check specific record types
dig api.example.com CNAME
dig api.example.com AAAA

Common Causes

  • Stale DNS cache — flush with systemd-resolve --flush-caches or restart CoreDNS
  • Missing search domainresolv.conf missing the right search entries
  • ndots configuration — Kubernetes default ndots:5 causes unnecessary lookups; set to 2 for external domains
  • TTL expired during outage — if the authoritative server was down during TTL expiry
  • Firewall blocking UDP 53 — DNS uses UDP by default

Step 4: Check from Inside Containers

# Run a DNS debug container
kubectl run dns-debug --image=busybox --rm -it -- nslookup api.example.com

# Check CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns

Monitoring DNS Health

DNS failures often cascade into application errors that look like connection timeouts. If you see a spike of timeout errors in Bugsly, check DNS resolution first — it's a common root cause that manifests as seemingly unrelated failures across your entire service.

Prevention

  • Monitor DNS resolution time as a metric
  • Set reasonable DNS TTLs (300s is a good default)
  • Use DNS caching at the application level for high-throughput services

Try Bugsly Free

AI-powered error tracking that explains your bugs. Set up in 2 minutes, free forever for small projects.

Get Started Free