PING! Not always what you think! – Meraki Wireless troubleshooting

I’m quite fond of the Meraki dashboard. I’ve seen firsthand how it can enable lean and low-skilled IT departments to manage more of their own network. The dashboard GUI makes it easy to check status and do basic troubleshooting, but it’s still important to understand what is actually going on under the hood.

Here’s an example. If you’ve ever seen the Meraki dashboard, you’ve probably seen the Ping tool on every client status page. Here, a Meraki AP is successfully Ping-ing my MacBook Pro:

[Screenshot: Meraki dashboard Ping tool reporting 0% loss to the MacBook Pro]

Pretty straightforward. Ping the client, client responds, client is online and working, right?

If you have a Meraki Security Appliance, you may stumble across this little note on the Addressing & VLANs page:

[Screenshot: note on the Addressing & VLANs page stating that Ping is based on ARP]

Wait… Ping is based on ARP? What happened to ICMP?

We may have jumped to conclusions here. As it turns out, Meraki is not using ICMP as most of us would assume. Here are a few captured frames of that same Meraki AP ARP-Ping-ing my MacBook Pro:

[PCAP: directed ARP request and reply between the Meraki AP and the MacBook Pro]

Notice that this is a directed ARP: the Meraki AP (MAC ending 13:da:90) sends the ARP request straight to the MBP (MAC ending 91:75:d8) rather than broadcasting it. In other words, the Meraki AP already knows the MBP’s MAC address. Even so, the ARP reply tells the Meraki AP that the MBP is alive and online, just like an ICMP Ping would.
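To make the “directed ARP” idea concrete, here is a minimal Python sketch (standard library only) that builds the raw bytes of an ARP request frame. The only thing that distinguishes a classic broadcast request from the directed one in the capture is the Ethernet destination address. The addresses are hypothetical placeholders: only the last three MAC octets (13:da:90 and 91:75:d8) come from the capture. The sketch only builds bytes; actually sending a frame would need a raw socket and root privileges.

```python
import ipaddress
import struct
from typing import Optional

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806

def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' notation into 6 raw bytes."""
    return bytes(int(octet, 16) for octet in mac.split(":"))

def build_arp_request(src_mac: str, src_ip: str, target_ip: str,
                      dst_mac: Optional[str] = None) -> bytes:
    """Build an Ethernet II frame carrying an ARP request (opcode 1).

    dst_mac=None gives a classic broadcast ARP request; passing the
    client's already-known MAC gives a directed (unicast) ARP request.
    """
    eth_dst = BROADCAST_MAC if dst_mac is None else mac_to_bytes(dst_mac)
    eth_header = eth_dst + mac_to_bytes(src_mac) + struct.pack("!H", ETHERTYPE_ARP)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                                        # hardware type: Ethernet
        0x0800,                                   # protocol type: IPv4
        6, 4,                                     # MAC and IPv4 address lengths
        1,                                        # opcode 1 = request
        mac_to_bytes(src_mac),                    # sender MAC
        ipaddress.IPv4Address(src_ip).packed,     # sender IP
        b"\x00" * 6,                              # target MAC (zeroed here; implementations vary)
        ipaddress.IPv4Address(target_ip).packed,  # target IP being asked about
    )
    return eth_header + arp_payload

# Hypothetical addresses; only the last three MAC octets come from the capture.
ap_mac, client_mac = "02:00:00:13:da:90", "02:00:00:91:75:d8"
directed = build_arp_request(ap_mac, "10.11.30.2", "10.11.30.50", dst_mac=client_mac)
broadcast = build_arp_request(ap_mac, "10.11.30.2", "10.11.30.50")
print(directed[:6].hex(":"))   # 02:00:00:91:75:d8 (unicast to the client)
print(broadcast[:6].hex(":"))  # ff:ff:ff:ff:ff:ff (classic broadcast)
```

Note that nothing in either frame involves ICMP or routing: a reply only proves the client’s NIC answered at layer 2.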

This raises an interesting question. We network engineers often use Ping as a way to confirm that the network is working. A successful ICMP Ping means that IP addressing, routing and the physical path are all functioning correctly up to layer 3. But if the Ping is really an ARP exchange at layer 2, could a response lead us to wrongly assume all is well, the way we rightly would with ICMP?

There is definitely potential to make incorrect assumptions here. In fact, even though the screenshot above of the Meraki AP Ping-ing my MBP shows 0% loss, at the time my MBP had an incorrect IP address and was not Ping-able by other devices on the network (via ICMP, at least). Here are the captured ARP frames from the same time as that Ping output:

[PCAP: directed ARP exchange captured while the MacBook Pro had the wrong IP address]

Almost identical, except that both the ARP request and the reply show 10.11.3.1, when the subnet is actually 10.11.30.0/24. Still, the client responds, albeit only at layer 2, and that’s good enough for the Meraki AP.
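An ARP reply proves layer 2 reachability, but says nothing about whether the client’s IP makes sense for the VLAN. Here is a minimal sketch of the extra sanity check the dashboard’s ARP “Ping” appears not to perform, using Python’s ipaddress module. The assess_arp_ping helper is hypothetical (not a Meraki API), and 10.11.30.50 is a placeholder for a correctly addressed client.

```python
import ipaddress

def assess_arp_ping(replied: bool, client_ip: str, subnet: str) -> str:
    """Interpret an ARP-based 'ping' result against the subnet the client should be in."""
    if not replied:
        return "no ARP reply: client not reachable at layer 2"
    if ipaddress.IPv4Address(client_ip) not in ipaddress.IPv4Network(subnet):
        # The client answered ARP, so layer 2 is fine, but its IP cannot be
        # right for this VLAN, so ICMP from other hosts will likely fail.
        return "ARP reply OK, but IP is outside the subnet"
    return "ARP reply OK and IP is inside the subnet"

print(assess_arp_ping(True, "10.11.3.1", "10.11.30.0/24"))    # the situation in the capture
print(assess_arp_ping(True, "10.11.30.50", "10.11.30.0/24"))  # a correctly addressed client
```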

Now, I do think this is one of those cases where the vendor made an odd decision to label this as a Ping without being clear about what is actually being done. Still, it is ultimately our responsibility as network engineers to know what we’re looking at. Traceroute is a similar example: it may use UDP or ICMP depending on the OS. And now you know that sometimes Ping is ARP instead of ICMP.

Here’s the relevant documentation:

Meraki Ping Tool

ARGGHH ARRRRRRRP!!!

I run into the following issue several times a year with a certain local DSL/Fiber service provider and ASA firewalls. Sometimes it consistently occurs after an outage; other times it seems random. Here is the relevant debug arp output (modified for confidentiality):

arp-req: generating request for 16.197.160.254 at interface Provider1
arp-req: request for 16.197.160.254 still pending
arp-req: generating request for 16.197.160.254 at interface Provider1
arp-req: request for 16.197.160.254 still pending
arp-req: generating request for 16.197.160.254 at interface Provider1
arp-req: request for 16.197.160.254 still pending

What makes this a REALLY bad thing is that 16.197.160.254 is the ASA’s default gateway. No internet access for you.
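The telltale sign in this debug output is the same target IP stuck in “still pending” over and over. Here is a small Python sketch that counts pending requests per IP in captured debug text. The regex is based only on the sanitized lines in this post, not on any Cisco specification of the format, and the threshold of 3 is an arbitrary, hypothetical cutoff.

```python
import re
from collections import Counter

# Matches 'debug arp' lines shaped like the ones in this post, e.g.
#   arp-req: request for 16.197.160.254 still pending
PENDING_RE = re.compile(r"arp-req: request for (\d{1,3}(?:\.\d{1,3}){3}) still pending")

def pending_arp_counts(debug_text: str) -> Counter:
    """Count unanswered ('still pending') ARP requests per target IP."""
    return Counter(match.group(1) for match in PENDING_RE.finditer(debug_text))

def gateway_unresolved(debug_text: str, gateway_ip: str, threshold: int = 3) -> bool:
    """Flag the gateway once it has at least `threshold` pending requests (hypothetical cutoff)."""
    return pending_arp_counts(debug_text)[gateway_ip] >= threshold

sample = """\
arp-req: generating request for 16.197.160.254 at interface Provider1
arp-req: request for 16.197.160.254 still pending
arp-req: generating request for 16.197.160.254 at interface Provider1
arp-req: request for 16.197.160.254 still pending
arp-req: generating request for 16.197.160.254 at interface Provider1
arp-req: request for 16.197.160.254 still pending
"""

print(pending_arp_counts(sample))                    # Counter({'16.197.160.254': 3})
print(gateway_unresolved(sample, "16.197.160.254"))  # True
```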

Unfortunately, I have little (no) visibility into the provider’s settings, but I’d hazard a guess that some sort of spoofing protection is in place. Sometimes a reboot of the ISP modem fixes the problem, but often we have to call the ISP and, while explaining ARP to tier 1 support should NOT be too difficult, it is typically an exercise in frustration. Not surprisingly, we are often asked to plug a laptop directly into the modem with the ASA’s static IP configured on its NIC, which works, and prompts the ISP to cheer “NOT OUR PROBLEM!” Of course, if I plug a laptop directly into the Provider1 interface on the ASA, the ASA gets an ARP reply right away and communication with the “gateway” also works just fine. Ultimately, I have not found an easy, repeatable fix on the side I control (the ASA), but sometimes the ISP clearing an ARP table solves the problem.

Something similar occurs from time to time with this ISP with NAT/PAT IP addresses that are NOT the ASA’s interface address. In this case, we can capture traffic leaving the ASA with the appropriate IP and MAC toward the ISP, but the ISP never forwards the return traffic to the ASA: the gateway doesn’t create an ARP entry for that IP.

In this case, an easy fix is to temporarily change the ASA interface IP to match the NAT IP. This causes the ASA to generate a gratuitous ARP and suddenly the return traffic gets delivered.
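A gratuitous ARP is an unsolicited, broadcast ARP in which the sender IP and target IP are the same address, which prompts neighbours to refresh their ARP caches (whether a neighbour creates a brand-new entry from one is implementation-dependent). The sketch below builds one with only the Python standard library, purely to show the frame layout. It is not how the ASA generates its GARPs; the MAC and 203.0.113.10 (a documentation address) are hypothetical stand-ins for the ASA interface MAC and the NAT IP. Sending it would need a raw socket and root privileges, which is left out.

```python
import ipaddress
import struct

def build_gratuitous_arp(src_mac: str, ip: str) -> bytes:
    """Build a broadcast gratuitous ARP request announcing `ip` at `src_mac`.

    Sender IP and target IP are both set to the announced address.
    """
    mac = bytes(int(octet, 16) for octet in src_mac.split(":"))
    ip_bytes = ipaddress.IPv4Address(ip).packed
    eth_header = b"\xff" * 6 + mac + struct.pack("!H", 0x0806)  # broadcast, EtherType ARP
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1, 0x0800, 6, 4,  # Ethernet / IPv4, address lengths
        1,                # opcode 1 = request (some stacks send GARP as a reply, opcode 2)
        mac, ip_bytes,    # sender: our MAC and the announced IP
        b"\x00" * 6,      # target MAC: unused in a GARP
        ip_bytes,         # target IP == sender IP marks it as gratuitous
    )
    return eth_header + arp_payload

garp = build_gratuitous_arp("02:00:00:aa:bb:cc", "203.0.113.10")
print(garp[:6].hex(":"))           # ff:ff:ff:ff:ff:ff
print(garp[28:32] == garp[38:42])  # True: sender IP equals target IP
```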

This is certainly different from the first case, where no amount of restarting or GARP seems to convince the ISP gateway to reply to the ASA’s ARP request for the gateway’s MAC, but I suspect the cause may be related.

I’m currently waiting for the ISP to determine whether an engineer who actually knows how to log into the gateway router and look at an ARP table does indeed exist, or whether I’m more likely to watch a unicorn run a red light on my commute home. In the meantime, if anyone can explain what this ISP is doing to cause this behaviour, I’d love to hear it.