AYIAYA tunnel loses connection silently (aiccu 20070115 on OpenWrt)
Shadow Hawkins on Monday, 21 March 2011 10:59:12
I am running an IPv6 network via a tunnel through an OpenWrt (10.03.1-rc4) router running aiccu 20070115 (prepackaged). Every few hours (no regular schedule) the tunnel suddenly stops carrying any IPv6 traffic, with no indication on the router or in the syslog.
The connection is not completely idle, as I am monitoring connectivity with multiple Nagios probes through the tunnel every 5 minutes. Equivalent IPv4 monitoring shows no hiccups.
The connection goes through another NAT box (a ZyXEL DSL router provided and controlled by the ISP); however, neither the real nor the NATed IPv4 address changes.
I can currently get the connection back by simply restarting the aiccu initscript through the OpenWrt web interface and/or the ssh console (either will work). But this is manual and depends on me spotting the alarm and taking action.
Is there any way to make aiccu itself detect the lost connectivity and reinit itself, or is there anything else I can do to keep the connection stable?
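The detect-and-reinit idea asked about here can be sketched as an external watchdog. This is only a minimal sketch under stated assumptions, not something aiccu provides: it assumes Python and a `ping6` binary accepting `-c` and `-W` are available on the router, that the init script lives at `/etc/init.d/aiccu`, and all function names, the threshold and the interval are hypothetical. It automates the manual restart workaround and does not address whatever causes the outage.

```python
import subprocess
import time


def probe_ipv6(host, count=3, timeout_s=5):
    """Return True if ping6 reports at least one echo reply from host."""
    result = subprocess.run(
        ["ping6", "-c", str(count), "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def restart_aiccu():
    """Restart the tunnel client via its init script (path is an assumption)."""
    subprocess.run(["/etc/init.d/aiccu", "restart"], check=False)


def watchdog_step(probe_ok, failures, threshold=3):
    """Return (new_failure_count, restart_needed) after one probe result.

    A restart is requested only after `threshold` consecutive failures,
    so a single lost ping does not bounce the tunnel.
    """
    if probe_ok:
        return 0, False
    failures += 1
    if failures >= threshold:
        return 0, True
    return failures, False


def run_watchdog(host, interval_s=60, threshold=3):
    """Probe host forever and restart aiccu after repeated failures."""
    failures = 0
    while True:
        failures, restart = watchdog_step(probe_ipv6(host), failures, threshold)
        if restart:
            restart_aiccu()
        time.sleep(interval_s)
```

Calling `run_watchdog()` with the address of an external IPv6 host that should always be reachable starts the loop. Keeping the decision logic in `watchdog_step` separate from the probing makes it easy to check without touching the network.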
Jeroen Massar on Monday, 21 March 2011 11:03:13
First of all, provide a LOT more details. Packet dumps would be very useful.
As it is a WRT box, you might also want to check your clock.
Just restarting it is not the correct way to resolve this as that does not change any of the things that AICCU does, except maybe a new UDP connection.
Shadow Hawkins on Monday, 21 March 2011 11:30:26
As this is a public forum, I won't state any IP addresses, tunnel numbers or other account details.
The overall setup is like this:
SIXXS-PoP--Internet--ISP-ZyXEL-NAT--OpenWrtBox--InternalNet--NagiosServer
ISP-ZyXEL-NAT has constant real IP rr.rr.rr.rr, it is set up to forward all traffic to "NAT-ed" address 192.168.zz.zz/24, which is the static IPv4 address on the outside of the OpenWrtBox.
OpenWrtBox does a second level of IPv4 NAT to InternalNet 10.ii.ii.1/16, it also has IPv4 and IPv6 firewall rules. OpenWrtBox runs aiccu and radvd, acting as the default gateway (on both IPv4 and IPv6) for InternalNet.
NagiosServer has static IPv4 and IPv6 address on the internal network and runs probes every 5 minutes. Both internal and external servers are probed with both ping and tcp connections.
When the issue occurs, IPv4 pings and TCP probes continue to work as expected. IPv6 pings and TCP probes to servers on InternalNet and to OpenWrtBox itself also continue to work. IPv6 pings and TCP probes to all external IPv6 addresses and servers fail. Manual pings and traceroutes from other internal machines to public IPv6 addresses also fail, although I will have to wait for the next failure to note the exact failures reported.
The OpenWrt box has limited memory/disk space and I am not sure I can fit a packet dumper in there, although I might be able to set up a dedicated packet dumping machine between the OpenWrt machine and the ZyXEL box.
The graphs on sixxs.net indicate 100% packet loss during failures.
From the symptoms I have seen so far, it looks like aiccu or a closely related piece of software is getting into an inconsistent internal state, as simply restarting the aiccu initscript brings the connectivity back.
Similar, but not quite identical issues have been reported by other aiccu users in connection with the daily IPv4 address changes in Germany, but this is a different country and ISP, and the IPv4 addressing is essentially static. Those reports usually referred to fixes that might be in aiccu CVS but not available on the sixxs.net website.
Jeroen Massar on Monday, 21 March 2011 12:36:55
(whois provides most of the details you are omitting, just so you know)
Check the connection tracker FAQ; you have at least two connection trackers: the ZyXEL and the OpenWrt box. You most likely can't do much about the first; the latter you can resolve with the info from the FAQ.
The problem with the 'changing addresses' is that the socket is locally bound. When that IP address changes and the kernel does not apply the change to the socket (that is, it does not change the outgoing address), things break. Note that Linux does not have a problem with this; FreeBSD does.
Shadow Hawkins on Monday, 21 March 2011 13:22:26
Note 1: OpenWrt uses a Linux 2.6.32 long-term-support kernel.
Note 2: The IP addresses do not change.
Note 3: The only FAQ entry I could find is about proto-41 (6to4) and idle connections through a configurable NAT. This is AYIAYA not 6to4, the connection is kept busy by Nagios pinging our outside IPv6 servers, and statistics show the problem getting worse with increased traffic. Also with aiccu running on the Linux (openwrt) router, it does not pass through the NAT layer on that router.
Note 4: The ZyXEL router is set to forward all unidentified/untracked UDP traffic to the OpenWRT box and this works for other incoming UDP traffic. This hopefully minimizes any dependency on the connection tracking in the ZyXEL router.
So far it still looks like aiccu is getting internally confused about the up/down state of the tunnel, thus failing to bring up the tunnel when it goes down.
I am still waiting for the next failure so I can gather additional logging.
Jeroen Massar on Monday, 21 March 2011 14:49:45
Note 1: OpenWrt uses a Linux 2.6.32 long-term-support kernel.
What does the kernel version have to do with anything? Everything has bugs, and long-term-support just means that bug fixes might be backported from newer kernels, not that it is any better than anything else. And you still need to upgrade to get all those fixes.
Still, I don't know why you made that note.
Note 2....
Which is a good thing in this setup as it takes that uncertainty away.
Note 3: The only FAQ entry I could find is about proto-41 (6to4) and idle connections through a configurable NAT.
This mechanism could affect every other protocol being sent through the stack.
This is AYIAYA not 6to4,
Please read up, it is written AYIYA. Also although 6to4 is using proto-41, proto-41 is not just 6to4.
The connection is kept busy by Nagios pinging our outside IPv6 servers,
Probing every 5 minutes does not help when the expiration timer is much shorter than that.
Also with aiccu running on the Linux (openwrt) router, it does not pass through the NAT layer on that router.
But it does pass through the connection tracking layer. Please actually see the FAQ item which explains that.
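The point about the probe interval and the expiration timer comes down to simple arithmetic, sketched below. The timeout figures are illustrative assumptions only (the actual values on the ZyXEL and on the OpenWrt box were not stated in this thread), and the function names and the 50% safety margin are hypothetical.

```python
def mapping_survives(traffic_interval_s, conntrack_timeout_s):
    """True if packets arrive often enough to refresh the tracked UDP flow.

    A connection-tracking entry is dropped once no packet has been seen
    for longer than its timeout, so the gap between packets must stay
    below that timeout.
    """
    return traffic_interval_s < conntrack_timeout_s


def max_safe_interval(conntrack_timeout_s, margin=0.5):
    """Suggest a traffic interval that leaves headroom below the timeout."""
    return conntrack_timeout_s * margin


# Nagios probes every 5 minutes, as described in the thread.
NAGIOS_INTERVAL_S = 5 * 60

# Assumed timeouts for illustration only, not measured on these devices.
for timeout_s in (30, 180, 600):
    ok = mapping_survives(NAGIOS_INTERVAL_S, timeout_s)
    print(f"timeout {timeout_s:>3}s: 5-minute probes keep the entry alive: {ok}; "
          f"suggested interval <= {max_safe_interval(timeout_s):.0f}s")
```

With these assumed values, only a timeout longer than 300 seconds would be bridged by the 5-minute probes alone.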
This hopefully minimizes any dependency on the connection tracking in the ZyXEL router.
But are you sure it works that way?
So far it still looks like aiccu is getting internally confused about the up/down state of the tunnel, thus failing to bring up the tunnel when it goes down.
As there is no state inside AICCU, except for "socket is connected", which it is at the start, and then forwarding packets, your statement does not make any sense. AICCU does not know if a tunnel is up or down, it just sends and receives packets, that is it.
Your friend of the day will be tcpdump or wireshark.
Shadow Hawkins on Monday, 21 March 2011 18:51:31
Ok, it happened again and this time I had dedicated sniffer machines running Wireshark on both sides of the OpenWRT box, and I could see the failure mode. It was the ZyXEL NAT that messed up, but in a very unusual way:
The OpenWRT box sent out AYIYA from one UDP port number (the same one it was bound to on the OpenWRT box), but packets from the PoP to the OpenWRT box were suddenly addressed to a different UDP port number. Unfortunately I have no way of testing which of the two port numbers was/is the one seen on the Internet and at the PoP.
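The port mismatch described above can be spotted mechanically in a capture. The following is a rough sketch that assumes the packets have already been reduced to (direction, source port, destination port) records, for example from a Wireshark export. The record layout, the function name and the port numbers in the sample are all made up for illustration and are not the values from the actual capture.

```python
from collections import namedtuple

# direction is "out" (router -> PoP) or "in" (PoP -> router).
Packet = namedtuple("Packet", "direction src_port dst_port")


def find_port_mismatch(packets):
    """Return (local_port, inbound_dst_port) for the first inbound packet
    whose destination port differs from the most recent outbound source
    port, or None if every inbound packet matches.
    """
    local_port = None
    for pkt in packets:
        if pkt.direction == "out":
            local_port = pkt.src_port
        elif pkt.direction == "in" and local_port is not None:
            if pkt.dst_port != local_port:
                return local_port, pkt.dst_port
    return None


# Hypothetical capture: replies initially come back to the port the
# router sends from, then suddenly arrive on a different port.
sample = [
    Packet("out", 40000, 5072),
    Packet("in", 5072, 40000),
    Packet("out", 40000, 5072),
    Packet("in", 5072, 40001),
]
print(find_port_mismatch(sample))
```

A non-None result points at a NAT device rewriting the mapping between the two capture points, rather than at the tunnel client.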
During the test, the connection to the ZyXEL box unexpectedly failed completely (either the loose ethernet cable to the sniffer was accidentally pulled for several minutes or something happened closer to the ZyXEL box) and everything snapped back. This failure mode of the ZyXEL box had previously been seen for SIP traffic, but I thought that was due to the notoriously broken SIP ALG, not a general NAT failure.
Thus it seems that the better workaround may be to restart the ZyXEL box and keep aiccu running.
For those who might want to know, the ZyXEL box is a "ZyXEL 2602R-61" with ZyNOS firmware version "V3.40(AJX.4) | 05/26/2009".
P.S.
Note 1 was because you wrote something about Linux versus FreeBSD, so I thought you did not know if the OpenWRT box ran Linux or FreeBSD. The kernel version was in case that data point could be relevant; "Long Term Support" was just part of the name of that kernel version, nothing special.