For When You Can't Have The Real Thing

Can't Reliably Ping 6224 From Directly Connected System

Created by dave. Last edited by dave, 12 years and 88 days ago. Viewed 4,493 times. #6
attachments
IAodV.jpg (15723)
(27 November 2011)

Problem

OK, here's my situation.

[Image: IAodV.jpg]

This is on the internet. The 6224 is the router in this picture and physically resides in Kanata.

Both VLAN 1697 and 3994 are provided by an internet service provider. These VLANs are provided through a single 1Gb ethernet wire.

The Kanata hosts are directly attached to the 6224; the other two sites are remote.

VLAN 3994 is a single IP address space, so theoretically it shouldn't matter physically where the hosts on that subnet are.

Here's the problem.

I have a monitoring system which is connected further into the internet, so probes from the monitor would come in to this diagram on the 1697 VLAN.

When I ping hosts at Albert or Bells Corners from the internet, there is 0 loss. The connection looks perfect.

When I ping hosts at Kanata, I lose anywhere from 10 to 40% of the pings. The loss is not predictable, but when pings are lost they go in a bunch: always at least 3 in a row, usually 4, rarely more.
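The "always in a bunch" part of that pattern is easy to check mechanically. The following is a minimal Python sketch, not the tool used here: the function names and the sample data are made up for illustration, and it assumes the ping results have already been reduced to one true/false value per probe, in send order.

```python
from itertools import groupby


def loss_bursts(replies):
    """Return the length of each run of consecutive lost pings.

    `replies` holds one boolean per probe, in send order:
    True if a reply arrived, False if the probe was lost.
    """
    return [sum(1 for _ in run) for ok, run in groupby(replies) if not ok]


def loss_summary(replies):
    """Summarise overall loss and the burst lengths for one ping run."""
    bursts = loss_bursts(replies)
    lost = sum(bursts)
    return {
        "loss_pct": 100.0 * lost / len(replies) if replies else 0.0,
        "bursts": bursts,
        "min_burst": min(bursts) if bursts else 0,
    }


# Made-up sample: 20 probes with one burst of 4 losses and one of 3.
sample = [True] * 5 + [False] * 4 + [True] * 6 + [False] * 3 + [True] * 2
print(loss_summary(sample))
```

Loss that arrives only in runs of 3 or more, never as isolated single drops, points at something that stays wrong for a few seconds at a time rather than at random corruption on the wire.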

I have attached a monitor directly to the 6224 in Kanata on 3994.

When the monitor pings the 6224 routing interface, I see exactly the same loss pattern -- but NOT at the same time as the loss from the remote system. Ping time is around 1ms.

When the monitor pings another system directly attached to the 6224, there is 0 loss. Ping time is about 0.1ms, one-tenth of the time to ping the router.

Anyone know what is going on here?

Update to make things less clear maybe

What seems to be going on is that traffic that comes in and goes out the ISP's connection is fine. Traffic that is going from the router brain to the switching brain (or back, maybe) is what is having the problem.

I can't blame the ISP because internet access to/from the two remote sites is solid. It is only hosts which are directly attached to the 6224 which are having issues.

Update 2

OK, after a lot of time staring at traces, I have a more specific symptom.

I ran tcpdump on VLAN 3994 of the ISP uplink, looking for my own address, on the theory that all I should see is broadcast traffic going to the remote sites. Instead, I saw the packets that I would have expected to see on my system's interface going down the TLS on this VLAN.

So:

For some reason, the 6224 frequently thinks that my system is at the far end of the TLS.

When I inspect the switching-table when things are working, my entry looks like this:

3994     0007.E924.F714        2/g16      Dynamic

…which makes sense since it is plugged into port 16. However, when it is broken, it looks like this:

3994     0007.E924.F714        2/g22      Dynamic
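Spotting this by eye means catching the table at the right moment. A small script can compare successive dumps instead. The sketch below is hypothetical: it assumes the switching table has been captured as text several times by some other means, and that rows look exactly like the two entries shown above (VLAN, MAC, port, type). The names `parse_table` and `find_moves` are invented for this example.

```python
import re

# One table row in the layout shown above: VLAN, MAC, port, type.
ROW = re.compile(
    r"^\s*(\d+)\s+([0-9A-Fa-f]{4}(?:\.[0-9A-Fa-f]{4}){2})\s+(\S+)\s+(\S+)\s*$"
)


def parse_table(text):
    """Map (vlan, mac) -> port for every row found in one table dump."""
    table = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            vlan, mac, port, _kind = m.groups()
            table[(int(vlan), mac.upper())] = port
    return table


def find_moves(snapshots):
    """Yield (snapshot_index, vlan, mac, old_port, new_port) each time
    an address shows up on a different port than it was last seen on."""
    last_seen = {}
    for i, text in enumerate(snapshots):
        current = parse_table(text)
        for (vlan, mac), port in current.items():
            old = last_seen.get((vlan, mac))
            if old is not None and old != port:
                yield (i, vlan, mac, old, port)
        last_seen.update(current)


good = "3994     0007.E924.F714        2/g16      Dynamic"
bad = "3994     0007.E924.F714        2/g22      Dynamic"
for move in find_moves([good, bad, good]):
    print(move)
```

An address that keeps bouncing between an access port and the uplink, as in this sample, is the signature to look for.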

Streams of misdirected packets seem to be led by a broadcast from my system. However, I see one broadcast leave my system, and two on the 3994 VLAN to the TLS. Usually it is an IGMP V2 Membership Report / Join Group 224.0.0.251, but sometimes it is the management chip on my system ARPing for itself (it does this every 2 seconds or so for reasons which are stupid).

This implies that there is a system in Bells Corners or Albert which is hearing my broadcast and echoing it back for some reason. So the 6224 goes "ah, this MAC must really be down the TLS link", and adjusts its switching-table accordingly.

Does this description of the problem ring any bells?

Solution

OK, I figured this out and I'll write it out here. This particular solution is unlikely to help anyone because it is an edge case.

Back in the ancient history of the link with this provider, we added a second VLAN to the primary one. At that time, the provider connected this VLAN as both tagged and untagged on their side of the connection. Their switch treats the tagged and untagged as separate connections.

So what happens is my system connected to the Dell emits an ARP broadcast (the management interface on this computer emits ARP packets every half second for reasons which are stupid), which the switch forwards down the link to the remote site. The switch at the provider hears the broadcast on the untagged interface -- and sends it back to me on the tagged interface. The Dell hears this and then concludes that the MAC address originating the broadcast is really reachable via the provider's link. Follow-up packets therefore get misdirected.
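The mechanism can be shown with a toy model of MAC learning. This is a minimal Python sketch, not how the 6224 is actually implemented: the `LearningSwitch` class is invented for illustration, and it assumes 2/g22 is the port facing the provider, which is inferred from the switching-table entries above rather than stated anywhere.

```python
class LearningSwitch:
    """Toy layer-2 switch that only models source-address learning."""

    def __init__(self):
        self.table = {}  # (vlan, mac) -> port the address was last seen on

    def receive(self, port, vlan, src_mac):
        """Learn (or re-learn) the source address on the ingress port."""
        self.table[(vlan, src_mac)] = port

    def lookup(self, vlan, dst_mac):
        """Return the port a unicast frame to dst_mac would be sent out of."""
        return self.table.get((vlan, dst_mac))


HOST = "0007.E924.F714"
ACCESS_PORT = "2/g16"  # where the host is really plugged in
UPLINK_PORT = "2/g22"  # assumed to be the provider-facing port

sw = LearningSwitch()

# The host's broadcast arrives on its own port and is learned there.
sw.receive(ACCESS_PORT, 3994, HOST)
before = sw.lookup(3994, HOST)

# The provider reflects the same frame back, so the switch sees the
# host's source address arriving on the uplink and re-learns it.
sw.receive(UPLINK_PORT, 3994, HOST)
after = sw.lookup(3994, HOST)

print(before, "->", after)
```

Until the host sends another frame and the entry is learned back onto the access port, every unicast frame for it goes out the uplink, which would account for losses coming a few pings at a time.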

The solution was to have the provider change their configuration so that it agreed with that on the Dell. All general connection problems have ceased.
