Saturday, 4 February 2017

RPi: bonding network interfaces

One of the big advantages of the Raspberry Pi is that its form factor makes it so portable: I can move an RPi around the home depending on what I need at the time. It does mean, though, that I have to worry about which network interface is available in each spot, and I'd rather not have to log into the RPi and reconfigure it between wifi and ethernet every time.



In an ideal world, perhaps you'd just use the wifi interface and be done. However, this isn't always possible (poor wifi signal, interference etc.), and sometimes you may want to be wired to avoid the possibility of someone maliciously kicking you off your wifi, or because the wifi adapter is an extra draw on your power supply.

But ultimately, the requirement is for your headless RPi to be available on the network at a single, known static address irrespective of which network interface is available.

Primitive Option

One primitive option is to configure both your ethernet and wifi interfaces with the same IP address in /etc/network/interfaces:
auto eth0
allow-hotplug eth0
iface eth0 inet static
    address               192.168.0.156
    netmask               255.255.255.0
    gateway               192.168.0.1
    dns-nameservers       8.8.8.8

auto wlan0
allow-hotplug wlan0
iface wlan0 inet static
    # external wifi dongle
    wireless-power        off
    wpa-conf              /etc/wpa_supplicant/wpa_supplicant.conf
    address               192.168.0.156
    netmask               255.255.255.0
    gateway               192.168.0.1
    dns-nameservers       8.8.8.8
This will work in that eth0 and wlan0 are both brought up with the same IP address, and if you remove the ethernet cable the network will still be available. However, it only works because the system creates a route per interface, with the eth0 route having the higher priority (check the Metric column of route -n: a lower metric means a higher priority).
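
To make the metric comparison easier to read, here is a minimal sketch of a helper that lists routes lowest metric first. The function name routes_by_metric is made up for illustration; it assumes the column layout of net-tools' route -n and reads the table on stdin, so it can be fed live output or a saved copy.

```shell
#!/bin/sh
# Hypothetical helper: print "metric interface destination" for each route,
# sorted so the preferred (lowest metric) route comes first.
# Assumes the 8-column layout printed by `route -n`.
routes_by_metric() {
    # Skip the two header lines; Metric is column 5, Iface is column 8.
    awk 'NR > 2 && NF >= 8 { print $5, $8, $1 }' | sort -n
}

# Usage on a live system:
#   route -n | routes_by_metric
```

If two default routes show up, the first line of the output is the one the kernel will prefer.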

So this kinda works and meets our requirement, but it's neither elegant nor the correct way of doing it.

To avoid the multiple routes, you could use ifplugd and custom scripts to bring down eth0 (removing its route) and bring up the wifi interface when it detects the ethernet cable being pulled.

However, this problem has already been solved in the world of networking under the guise of IP multipathing (Sun Microsystems terminology?) or interface bonding/aggregation.

Interface Bonding

With input from the Raspberry Pi forums the oddities/race conditions/setup of Raspbian's network setup were resolved to provide the answer that follows.

The kernel can be configured to aggregate multiple physical interfaces into a single virtual interface, so that applications are agnostic to the underlying interface. Different policies can be applied to this virtual interface, such as load balancing or fault tolerance. For our requirement we need the kernel to treat the two interfaces in fault-tolerant mode: prefer one primary (physical) interface and only start using the secondary, backup (physical) interface when the primary is not available.

The kernel module required is bonding. On Raspbian, the only additional package required is "ifenslave"; after that, configure the kernel module (so its options are applied at boot) and then the interfaces.

$ sudo apt-get install ifenslave
## /etc/modules
...
# add this near the top to force load module at boot - above any force load of wifi modules
bonding

Whilst these module options can be configured at runtime, I found that this misbehaved, leaving the wifi unable to associate with the AP, so I set them at module load time instead:
## /etc/modprobe.d/bonding.conf
options bonding fail_over_mac=active mode=active-backup primary=eth0 primary_reselect=always
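
Once the module has been loaded with these options (for example after a reboot), the values the driver actually picked up can be read back from the bonding driver's sysfs directory. The following is a minimal sketch: the function name show_bond_options and the overridable sysfs root argument are my own additions for illustration, and it assumes the bond is named bond0 unless told otherwise.

```shell
#!/bin/sh
# Hypothetical helper: print selected bonding options as "name=value" lines.
# $1: bond name (default: bond0)
# $2: sysfs root (default: /sys; overridable so the function can be tried
#     against a scratch directory)
show_bond_options() {
    bond="${1:-bond0}"
    dir="${2:-/sys}/class/net/$bond/bonding"
    for opt in mode primary fail_over_mac primary_reselect miimon; do
        if [ -r "$dir/$opt" ]; then
            printf '%s=%s\n' "$opt" "$(cat "$dir/$opt")"
        else
            # Option file missing or unreadable (e.g. module not loaded).
            printf '%s=<unavailable>\n' "$opt"
        fi
    done
}

# Usage:
#   show_bond_options          # inspects /sys/class/net/bond0/bonding
```

The driver reports some values with a numeric suffix (the mode, for instance, is shown alongside its number), so compare against the name rather than expecting an exact match with the config line.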

Reconfigure the interfaces so that we create the 'bond' and all network traffic will be routed through this. The rest of the system is largely agnostic to the eth0 and wlan0 devices.
# /etc/network/interfaces
auto lo
iface lo inet loopback

# confirm status at
# $ cat /proc/net/bonding/bond0

auto eth0
allow-hotplug eth0
iface eth0 inet manual
    bond-master           bond0
    bond-mode             active-backup

auto wlan0
allow-hotplug wlan0
iface wlan0 inet manual
    bond-master           bond0
    bond-mode             active-backup
    wireless-power        off
    wpa-conf              /etc/wpa_supplicant/wpa_supplicant.conf

auto bond0
iface bond0 inet static
    bond-slaves           none
    bond-primary          eth0
    bond-mode             active-backup
    bond-miimon           200
    bond-fail_over_mac    active
    bond-primary_reselect always
    bond-updelay          200
    bond-downdelay        0
    address               192.168.0.156
    netmask               255.255.255.0
    gateway               192.168.0.1
    dns-nameservers       8.8.8.8
A few things to note above: the wlan0 and eth0 interfaces are defined as manual, which means they are created and brought up without any IP address, and bond-master declares which bond interface they belong to. Only bond0 is given the static IP.

Finally, the Raspbian setup always seems to want to involve DHCP even when interfaces are defined as static (they somehow get DHCP'd anyway). To prevent the DHCP junk, you can add the following line:
# /etc/dhcpcd.conf
denyinterfaces eth0 wlan0 bond0
or disable dhcpcd entirely. For me either works.

With all of these items in place, rebooting the system should give your RPi a single IP, aggregated over the ethernet and wifi devices, with the ethernet being preferred whenever it is available.


Verifying it works

There are a few ways to verify the setup is successful.
  • Kernel Routing Table - only the bond0 interface has a route.
    $ route -n
    Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    0.0.0.0         192.168.0.1     0.0.0.0         UG    0      0        0 bond0
    192.168.0.0     0.0.0.0         255.255.255.0   U     0      0        0 bond0
    
  • all physical interfaces have no IP address, with only bond0 having an IP
  • the wifi device is successfully associated with the AP even though it does not have an IP:
    $ iwconfig wlan0
    wlan0     IEEE 802.11bgn  ESSID:"go_away"  
              Mode:Managed  Frequency:2.437 GHz  Access Point: 00:62:2C:99:E9:1C   
              Bit Rate=1 Mb/s   Tx-Power=31 dBm   
              Retry short limit:7   RTS thr:off   Fragment thr:off
              Power Management:on
              Link Quality=57/70  Signal level=-53 dBm  
              Rx invalid nwid:0  Rx invalid crypt:0  Rx invalid frag:0
              Tx excessive retries:0  Invalid misc:0   Missed beacon:0
    
  • proc interface gives realtime information of aggregated interfaces
    $ cat /proc/net/bonding/bond0  
    Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
    
    Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
    Primary Slave: eth0 (primary_reselect always)
    Currently Active Slave: eth0
    MII Status: up
    MII Polling Interval (ms): 200
    Up Delay (ms): 200
    Down Delay (ms): 0
    
    Slave Interface: eth0
    MII Status: up
    Speed: 100 Mbps
    Duplex: full
    Link Failure Count: 1
    Permanent HW addr: 00:11:22:33:44:55
    Slave queue ID: 0
    
    Slave Interface: wlan0
    MII Status: up
    Speed: Unknown
    Duplex: Unknown
    Link Failure Count: 1
    Permanent HW addr: aa:bb:cc:dd:ee:ff
    Slave queue ID: 0
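
The check in the list above that only bond0 carries an IP address can also be scripted. This is a minimal sketch, assuming the one-line-per-address output of iproute2's ip -o -4 addr show; the function name ipv4_by_iface is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print "interface address/prefix" for every IPv4
# address found in `ip -o -4 addr show` style output read from stdin.
ipv4_by_iface() {
    # In the one-line format, field 2 is the interface name, field 3 is
    # the address family and field 4 is the address with its prefix length.
    awk '$3 == "inet" { print $2, $4 }'
}

# Usage on a live system:
#   ip -o -4 addr show | ipv4_by_iface
```

With the bond configured as described, the output should list lo and bond0 but neither eth0 nor wlan0.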
    

With all of these items above, we can finally check network flow and failover by pinging a known address. In a separate terminal, you can cat /proc/net/bonding/bond0, looking at the Currently Active Slave and the MII Status of the physical interfaces.

As you unplug the ethernet cable, you will observe that the ping continues uninterrupted, and in /proc the active slave changes to wlan0 while the slave interface entry for eth0 is marked down and its Link Failure Count is incremented. When the ethernet cable is reinserted, the ping continues and /proc reports that the active slave has reverted to eth0.
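
Rather than re-running cat by hand during the test, the interesting fields can be pulled out of the status file. This is a minimal sketch that assumes the field labels shown in the /proc/net/bonding/bond0 output earlier; the function names active_slave and slave_summary are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers for watching a failover.
# Both take an optional status file path (default: /proc/net/bonding/bond0).

# Print the name of the currently active slave.
active_slave() {
    awk -F': ' '/^[ \t]*Currently Active Slave:/ { print $2 }' \
        "${1:-/proc/net/bonding/bond0}"
}

# Print "<interface> <MII status> <link failure count>" for each slave.
slave_summary() {
    awk -F': ' '
        /^[ \t]*Slave Interface:/    { iface = $2 }
        # The first MII Status line belongs to the bond itself; skip it.
        /^[ \t]*MII Status:/         { if (iface != "") status = $2 }
        /^[ \t]*Link Failure Count:/ { print iface, status, $2 }
    ' "${1:-/proc/net/bonding/bond0}"
}

# Usage while pulling the cable (polls once a second):
#   while sleep 1; do echo "active=$(active_slave)"; slave_summary; done
```

Running the loop next to the ping makes the switch from eth0 to wlan0 and back easy to spot.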

2 comments:

PeterKay said...

This is excellent, however how is link failure defined? What if the two connections were two different WANs, and one of them, whilst still connected, failed to deliver any responses, such as when a 4G signal goes light on data but stays connected?

Ray said...

The kernel would determine 'failure' on the 'link' (ethernet cable or 4G etc). I haven't used any 3/4G cards in my laptops for a long time but I'd suspect there are packets between the card and the cell tower to indicate whether the link is 'good'. On wifi, we've all seen that the wifi is connected but for whatever reason the wifi router doesn't respond to traffic etc - in this case, the failed transmission of data to the router would be noticed deep in the various network layers and enough for the link to be failed and swung over to any backup. I'd guess it'd be same for the 4G card.

Remember that the point of the bond in this setup (active-backup) is to ensure network uptime (from the host and client view) tied to one IP, so only one physical interface is active at any one time.
