Meraki Auto-VPN over MPLS

Here’s a quick review of a recent Meraki MX deployment I wrapped up this week. The customer has 4 sites across Canada: one HQ with both an Internet and an MPLS connection, and 3 branches with MPLS-only connections. We replaced their aging Juniper SSGs with MX65s.

Here’s the basic topology:

Initial State

The branch MXs reach the cloud controller through the HQ internet connection – an internet connection is a requirement for running the Meraki gear, but it doesn’t have to be local. The branch MX uplink ports are connected to the MPLS cloud. At HQ, the MPLS cloud connects to a LAN port – keep this in mind.

Of course, site-to-site connectivity was required. Each site has two internal subnets/VLANs: one data, one voice. Simple enough, right? Auto-VPN to the rescue!

VPN Setting

Turn on VPN at each site – we want full mesh, so every site is a hub. Then advertise the two local subnets into the VPN cloud at each site:

VPN networks

Done. Simple. Result:

Spoke to Spoke Only

Fancy green lines mean successfully established VPN connectivity! Auto-VPN is smart enough to realize that each branch/spoke appears on the internet from the same public IP, but that the locally configured uplink addresses are all reachable spoke to spoke, so those VPNs form without a problem. But this is not what we wanted. Spoke-to-spoke traffic is good for voice calls, but all of our servers (including the call manager) are at HQ… and there are no fancy green lines to HQ.

What is going on? Well, I can’t say with 100% certainty what exactly the limitation is, but I know one thing about the Meraki MX and VPNs – they won’t establish VPN SAs over non-uplink ports.

There are also some limitations with hairpinning – in this case, in order to establish an SA with the HQ uplink (Internet) port, the branch traffic would need to exit the HQ uplink port and do a 180° turn to form an ISAKMP SA. I have some intel that says Meraki has tested this feature, but I can’t say when it will be deployed to customers.

So in the meantime, we need another solution – and Meraki has just the feature.

Remember VPN concentrators?

Add Concentrator

We can put in another MX at the HQ in VPN concentrator mode. It’s a normal MX (in this case an MX64, because we only need the one port). We connect its uplink to another LAN port on the HQ MX; in this case, we gave it a separate subnet.

Only the uplink port on the MX can be used when it is in this mode, and it serves as a dedicated hairpinning interface, effectively putting that uplink inside the HQ LAN.

passthrough setting

When we do this, the VPN settings page changes. On the concentrator, we manually specify the subnets at HQ that should be reachable over the VPN:

concentrator VPN settings

No routes required – the concentrator will send everything to its own default gateway, which in this case is the HQ MX, where both of these subnets are homed.

We then turn OFF the VPN at the HQ MX, and create static routes on the HQ MX for the branch subnets, pointing to the concentrator.

The new result:

End Result

Now we have fancy blue lines showing the VPNs between the branches and HQ, resulting in a full mesh.

Rumor has it that we’ll see two improvements in these deployments sometime in the near future – the ability to terminate VPNs on the second uplink port even when it has no internet connection, as long as the primary uplink can reach the cloud controller; and the ability to hairpin out the uplink, as I mentioned earlier. Either of these would remove the need for a dedicated concentrator.

In the meantime, here’s the Meraki deployment guide for the VPN concentrator role:

https://documentation.meraki.com/MX-Z/Deployment_Guides/Configuring_VPN_Concentrator_for_the_Data_Center

Asymmetric routing is not for your LAN

UPDATED: Figured it out! Go to the bottom for the fix.

I know I’ll continue to see it everywhere. It just happens by accident. They didn’t realize what they were doing.

But they expect me to clean up the mess. Without downtime. Where’s my hammer?

Yet another unintentional asymmetric routing scenario! This one is bugging me a bit. I am in the process of fixing this client’s LAN topology and will remove the asymmetric routing in the process; however, I need to transition services onto a single firewall with as little downtime as possible, without being on site. Not uncommon so far.

Here’s the gist:

curse-you-asymmetric-routing
Asymmetric routing is ok… on the WAN, on the internet, and in some cases, if you INTEND to do it. Most of the time, it’s just EVIL.

Today, the remote access client tunnel terminates on the branch default gateway (10.1.1.1). I’m reconfiguring the newer firewall (10.1.1.2) to take over that role. The newer firewall was installed to terminate a VPN across the WAN (not internet) circuit to the remote branch. It is now destined to be the sole/main/one-ring-to-rule-them-all firewall for the single-subnet LAN.

Both firewalls are Cisco ASAs. I know that, in order to make this work until I can change the default gateway for the LAN to 10.1.1.2, I will need TCP state bypass on the default gateway (10.1.1.1), right? Right.
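For reference, it only takes a few lines of config. Here’s a rough sketch of what it would look like on the gateway ASA (10.1.1.1) for the VPN client pool (10.10.10.0/24) – the ACL, class-map, and policy-map names are placeholders I made up, and you’d attach the policy wherever it fits your existing MPF setup:

! Sketch: TCP state bypass on the default gateway ASA (10.1.1.1).
! Skips the TCP state/normalizer checks for flows this ASA only sees one side of.
access-list VPNPOOL-BYPASS extended permit tcp 10.1.1.0 255.255.255.0 10.10.10.0 255.255.255.0
access-list VPNPOOL-BYPASS extended permit tcp 10.10.10.0 255.255.255.0 10.1.1.0 255.255.255.0
!
class-map VPNPOOL-BYPASS-CLASS
 match access-list VPNPOOL-BYPASS
!
policy-map INSIDE-POLICY
 class VPNPOOL-BYPASS-CLASS
  set connection advanced-options tcp-state-bypass
!
service-policy INSIDE-POLICY interface inside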

Lo and behold – TCP state bypass is already configured, for a LOT of traffic – because the remote branch (10.2.2.0/24) is hosting some critical services, and it is also accessible over a different gateway, which is on the same subnet as the default gateway.

Me: “Were you guys aware that you were using TCP state bypass for all traffic to the branch office?”

Client: “Hm, that sounds familiar… where have I seen that before…?”

Right. So the client googled something and figured out how to make it work, but really doesn’t understand what they’re doing (turning off core firewall functionality). But it worked!

I’ve been here before, I’ll be here again.

So I configure the remote access VPN on the newer firewall (10.1.1.2), without state bypass, and can’t RDP to the DNS server. I expected that. In the diagram, steps 1-3 happen as expected, but step 4 doesn’t happen. Enable the state bypass, and the problem is solved – TCP traffic works. Steps 1-5 work as expected.

Here’s the part that is bugging me though. I’ve made it work with a kludge, knowing full well that I’m going to fix the core problem soon. But I haven’t had time to really figure out what the problem is…

VPN clients couldn’t do name resolution. UDP lookups to port 53 were not getting a response. Since I am working remotely, embedded PCAP on the ASAs is the best tool I have.
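In case you haven’t used it, the embedded capture is only a couple of commands. Something along these lines is what I’m running on each ASA’s inside interface (the capture name is arbitrary, the addresses are the ones from this scenario, and you could narrow it down to UDP/53 if you wanted):

! Sketch: embedded capture on the new ASA (10.1.1.2), inside interface,
! matching UDP between the VPN client pool and the DNS server.
capture DNSCAP interface inside match udp 10.10.10.0 255.255.255.0 host 10.1.1.80
show capture DNSCAP
! Same idea on the gateway ASA (10.1.1.1) to see what arrives there.
! Remove the capture when finished:
no capture DNSCAP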

PCAP on new ASA (10.1.1.2) inside interface indicates that the DNS query from the VPN client is transmitted on the inside interface to 10.1.1.80. No response is received from the DNS server.

PCAP on the default gateway ASA (10.1.1.1) inside interface indicates that the DNS server is, in fact, sending a response to the client at 10.10.10.X, and it arrives INBOUND on the inside interface. No corresponding outbound flow is captured, which is consistent with the PCAP on the new ASA.

So looking back at the diagram, steps 1-3 are fine. Step 4 never proceeds.

So… UDP shouldn’t have the same asymmetric routing issue as TCP, right (it is TCP state bypass, after all)? There is no “state” to check.

Hairpinning is indeed enabled on the default gateway ASA. Informational-level logging didn’t indicate anything interesting, but it would still seem that this firewall is dropping the traffic.
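For clarity, “hairpinning” on the ASA just means traffic is allowed to enter and exit the same interface; on the gateway that amounts to the usual one-liner:

! Permit traffic to enter and exit the same interface, so the gateway (10.1.1.1)
! can U-turn the server's reply on its inside interface back toward 10.1.1.2.
same-security-traffic permit intra-interface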

Fix? A static route on the DNS server (10.1.1.80) for 10.10.10.0/24 via 10.1.1.2. (FOR SHAME, BRENNAN!)

I know, I would NEVER leave this in production. But I needed to confirm quickly where the issue lies.

At the moment, the reason for the drop is not coming to me. Let me know in the comments, or on twitter (@bmroute) if you have an idea.

FIX: Credit to my mentor, CCIE Security @rwatt13. Default DNS inspection treats the UDP query/response much like TCP state, in that the ASA doesn’t like to see a DNS response without having seen the matching query first. Disabling DNS guard on the gateway ASA fixed the problem. See some explanation of DNS guard here.
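For anyone chasing the same symptom, this is roughly where the knob lives. Exactly which command applies depends on your ASA version and on whether DNS inspection is enabled, so treat this as a sketch rather than gospel:

! Sketch: disabling DNS Guard on the gateway ASA.
! With DNS inspection enabled (the default), DNS Guard is a parameter of the
! DNS inspection policy-map:
policy-map type inspect dns preset_dns_map
 parameters
  no dns-guard
!
! With DNS inspection disabled, the global command controls it instead:
no dns-guard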

YES, I REMOVED THE STATIC ROUTE FROM THE WINDOWS SERVER. I’m not an animal.