
Speeding up DHCP on bad WiFi links



In my current project I noticed that on bad WiFi links initial DHCP negotiations may take a really long time.

The reason is simple: in the initial steps, DHCP uses UDP broadcast packets, which may get lost.

To find out more and learn how to speed things up, read on …

A short 802.11 primer

In embedded devices whose WiFi modules only have egyptian PCB traces as antennas, the packet loss between the access point (AP) and the WiFi client (STA(tion) in 802.11 terminology) can be high. On the receiving side, I could provoke >50% lost packets using modules from different vendors, with APs that they would receive much better given a proper antenna. Also, the embedded device may operate in an RF-noisy environment.

Two facts about 802.11 WLANs which I didn’t know before:

  • Packets are normally sent through the AP, even if two STAs in the same network (BSS) communicate with each other. There is (T)DLS “Direct Link Setup” in 802.11e/z, but it’s not relevant in this case.
  • MAC unicast packets (i.e. AP->STA or STA->AP) are ACK’d and are retransmitted a couple of times as needed.

By “MAC unicast” I mean that they’re unicast on the data link layer (2), i.e. the packets contain a destination MAC address of a specific device.

There are also “MAC broadcast” packets, which are only sent by the AP. These packets are not ACK’d. It wouldn’t make much sense to wait for all STAs to ACK a broadcast packet.

How does this play together with IP?

             IP unicast     IP broadcast
STA->AP      MAC unicast    MAC unicast
AP->STA(s)   MAC unicast    MAC broadcast

As you can see, IP broadcast packets from AP->STAtions are sent as MAC broadcast, which are not ACK’d and thus are more likely to get lost.
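To get a feeling for how much the missing ACKs matter, here is a back-of-the-envelope sketch in Python. It assumes that every transmission is lost independently with the same probability and that lost 802.11 ACKs can be ignored. The retry limit of 7 is an assumed value for illustration only; real chipsets and drivers differ.

```python
def delivery_failure_probability(frame_loss: float, transmissions: int) -> float:
    """Probability that every one of `transmissions` attempts of a frame is lost."""
    return frame_loss ** transmissions


FRAME_LOSS = 0.5         # per-transmission loss, like the >50% case above
ASSUMED_RETRY_LIMIT = 7  # assumption: retries after the first attempt

# MAC unicast: first attempt plus retries. MAC broadcast: one attempt only.
unicast_lost = delivery_failure_probability(FRAME_LOSS, 1 + ASSUMED_RETRY_LIMIT)
broadcast_lost = delivery_failure_probability(FRAME_LOSS, 1)

print(f"MAC unicast lost:   {unicast_lost:.4f}")
print(f"MAC broadcast lost: {broadcast_lost:.4f}")
```

Under these assumptions a MAC unicast frame is lost in well under 1% of cases, while every second MAC broadcast frame never arrives.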

And this is exactly the problem.

(Side note: To measure packet loss, send UDP broadcast packets from Ethernet->AP->STA and count how many are lost.)
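The measurement from the side note could be sketched like this in Python: a sender on the wired side broadcasts numbered UDP probes, and a receiver on the STA records which sequence numbers arrive. The port number, the probe format and all function names are made up for this example. On a host with several network interfaces, the limited broadcast address may need to be replaced by the subnet's broadcast address.

```python
import socket
import struct
import time

PROBE_PORT = 50000  # hypothetical port, pick any free one


def make_probe(seq: int) -> bytes:
    """Encode a sequence number as a 4-byte big-endian payload."""
    return struct.pack("!I", seq)


def parse_probe(data: bytes) -> int:
    """Decode the sequence number from a probe payload."""
    return struct.unpack("!I", data[:4])[0]


def loss_ratio(received: set, sent_count: int) -> float:
    """Fraction of sent probes that never arrived."""
    return 1.0 - len(received) / sent_count


def send_probes(count: int, interval_s: float = 0.05) -> None:
    """Run on the wired host: broadcast `count` numbered probes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        for seq in range(count):
            sock.sendto(make_probe(seq), ("255.255.255.255", PROBE_PORT))
            time.sleep(interval_s)


def receive_probes(idle_timeout_s: float = 5.0) -> set:
    """Run on the STA: collect probes until none arrives for idle_timeout_s."""
    seen = set()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("", PROBE_PORT))
        sock.settimeout(idle_timeout_s)
        try:
            while True:
                data, _ = sock.recvfrom(64)
                if len(data) >= 4:
                    seen.add(parse_probe(data))
        except socket.timeout:
            pass  # sender has gone quiet, stop collecting
    return seen
```

Start `receive_probes()` on the STA first, then call `send_probes(1000)` on the wired host and feed the result into `loss_ratio(seen, 1000)`.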

A short DHCP primer

These are the initial steps of DHCP:

  • Client -> Broadcast: DHCPDISCOVER (“I need an IP”)
  • Server -> Broadcast: DHCPOFFER (“Here’s one”)
  • Client -> Broadcast: DHCPREQUEST (“I’ll take it”)
  • Server -> Broadcast: DHCPACK (“OK”)

These are all single UDP packets, addressed to the broadcast address at both the link layer (e.g. MAC FF:FF:FF:FF:FF:FF) and the network layer (e.g. IPv4 255.255.255.255). The exception is the packets sent by the STA: on the WiFi link, they have the AP’s MAC address as the destination address at the link layer.

With the explanations above, we now know that OFFER and ACK are the packets most likely to get lost: they travel AP->STA as MAC broadcast, so they’re not ACK’d at the 802.11 layer and never retransmitted.

To make matters worse, RFC 2131 dictates exponential backoff for client retransmits (e.g. 4s/8s/16s/…) if no response is received for DISCOVER or REQUEST.

This should explain why initial (“cold boot”) DHCP negotiations may take a really long time.
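A small Python sketch makes the effect of the backoff visible. It models a single exchange (DISCOVER/OFFER or REQUEST/ACK) under simplifying assumptions: the client's packet always arrives, each server reply is lost independently with the same probability, the random jitter on the timers is ignored, and the number of attempts is an assumed value.

```python
from itertools import accumulate

# Retransmit delays in seconds, following the 4s/8s/16s/... pattern up to 64 s.
# Jitter is ignored and the number of attempts is an assumption.
BACKOFF_DELAYS_S = [4, 8, 16, 32, 64]


def send_times(delays):
    """Times at which each attempt is sent, measured from the first attempt."""
    return [0] + list(accumulate(delays))


def wait_distribution(reply_loss, delays=BACKOFF_DELAYS_S):
    """Return ([(send_time_s, probability)], probability_of_no_reply_at_all).

    Each entry is the probability that the reply to the attempt sent at
    send_time_s is the first one to get through.
    """
    distribution = []
    still_waiting = 1.0
    for t in send_times(delays):
        distribution.append((t, still_waiting * (1.0 - reply_loss)))
        still_waiting *= reply_loss
    return distribution, still_waiting


distribution, unanswered = wait_distribution(0.5)
for t, p in distribution:
    print(f"reply to the attempt sent at {t:3d} s gets through: {p:.4f}")
print(f"still no reply after all attempts: {unanswered:.4f}")
```

With 50% reply loss, one exchange in four is still unanswered after the second attempt and therefore waits at least 12 s, and the cold boot handshake consists of two such exchanges.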

How to speed things up

Use proper antennas

There are certified module/external antenna combinations. If you can, use them. Not only will DHCP work better, but the overall performance will be much better, too.

Use a proper DHCP client implementation

… or use your DHCP client properly.

What do I mean by that? As you may know, what I’ve explained about DHCP is only the “cold boot” behaviour. If you look at Figure 5 of RFC 2131, there are many more states in a DHCP client’s life. So if the device persisted its DHCP information and timer “T1” had not yet expired, the DHCP client could on reboot simply go through INIT-REBOOT->REBOOTING->BOUND, which takes a single REQUEST/ACK exchange instead of the full four-packet handshake.
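The decision at boot could look roughly like the following Python sketch. The `StoredLease` record and the `start_state` function are hypothetical names, and the sketch assumes the device has a clock that keeps running across reboots (for example an RTC), so that the age of the stored lease can be judged at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredLease:
    """Hypothetical record a client could persist in flash."""
    ip_address: str
    obtained_at: float  # seconds, on a clock that survives reboots (assumption)
    t1_s: float         # renewal time T1 that came with the lease


def start_state(lease: Optional[StoredLease], now: float) -> str:
    """Pick the DHCP client state to start in after a reboot."""
    if lease is not None and now - lease.obtained_at < lease.t1_s:
        return "INIT-REBOOT"  # reuse the stored address
    return "INIT"             # nothing usable stored: full cold boot


lease = StoredLease("192.0.2.10", obtained_at=1000.0, t1_s=1800.0)
print(start_state(lease, now=1500.0))
print(start_state(lease, now=3000.0))
print(start_state(None, now=0.0))
```

If the server rejects the stored address, the client has to fall back to the normal cold boot sequence anyway, so the worst case is no slower than before by more than one exchange.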

Use my BROADCAST bit trick

In every packet sent by the DHCP client, there’s a “BROADCAST bit”.

This is what RFC 2131 (4.1) says about the BROADCAST bit:

A client that cannot receive unicast IP datagrams until its protocol software has been configured with an IP address SHOULD set the BROADCAST bit in the 'flags' field to 1 in any DHCPDISCOVER or DHCPREQUEST messages that client sends. The BROADCAST bit will provide a hint to the DHCP server and BOOTP relay agent to broadcast any messages to the client on the client's subnet. A client that can receive unicast IP datagrams before its protocol software has been configured SHOULD clear the BROADCAST bit to 0. The BOOTP clarifications document discusses the ramifications of the use of the BROADCAST bit [21].

So if the BROADCAST bit is set, the DHCP server will answer with a broadcast; if it’s cleared, it will answer with a unicast. When sending OFFER, it will unicast to the IP address it offers.
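To show where the bit lives on the wire, here is a minimal Python sketch that builds a DHCPDISCOVER by hand. The BROADCAST bit is the most significant bit of the 16-bit 'flags' field, which sits at byte offset 10 of the fixed header. The function names are invented for this example, and the packet is deliberately bare: a real client adds more options, and some servers may expect padding to the BOOTP minimum size.

```python
import struct

BROADCAST_FLAG = 0x8000                   # most significant bit of 'flags'
MAGIC_COOKIE = bytes([99, 130, 83, 99])   # marks the start of the options
FLAGS_OFFSET = 10                         # op, htype, hlen, hops, xid, secs


def build_discover(xid: int, mac: bytes, broadcast: bool) -> bytes:
    """Build a bare-bones DHCPDISCOVER (fixed header, cookie, two options)."""
    flags = BROADCAST_FLAG if broadcast else 0
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,       # op: BOOTREQUEST
        1,       # htype: Ethernet
        6,       # hlen: MAC address length
        0,       # hops
        xid,     # transaction ID
        0,       # secs
        flags,   # BROADCAST bit lives here
        b"", b"", b"", b"",  # ciaddr, yiaddr, siaddr, giaddr (zero-padded)
        mac,     # chaddr, padded to 16 bytes
        b"",     # sname
        b"",     # file
    )
    options = bytes([53, 1, 1,   # option 53 (message type) = DHCPDISCOVER
                     255])       # end option
    return header + MAGIC_COOKIE + options


def broadcast_bit_set(packet: bytes) -> bool:
    """Read the BROADCAST bit back out of a packet."""
    (flags,) = struct.unpack_from("!H", packet, FLAGS_OFFSET)
    return bool(flags & BROADCAST_FLAG)


mac = bytes.fromhex("020000000001")  # made-up, locally administered MAC
print(broadcast_bit_set(build_discover(0x12345678, mac, broadcast=True)))
print(broadcast_bit_set(build_discover(0x12345678, mac, broadcast=False)))
```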

(Side note: The mentioned “BOOTP clarifications document” states the same.)

If the DHCP client can be modified to send DISCOVER/REQUEST with a cleared BROADCAST bit and if the rest of the TCP/IP stack can be made to accept unicast DHCP packets even if it hasn’t configured its IP yet, things will improve.

In my case, I’ve changed the DHCP client (it was only a tiny change) so that the first packet of DISCOVER/REQUEST has the BROADCAST bit cleared. Retransmits have it set to give bad DHCP server implementations a chance to answer.
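The policy itself fits in a few lines. The following Python sketch is only an illustration of the rule described above, not the actual change in any particular DHCP client; the function name and the attempt numbering are assumptions.

```python
BROADCAST_FLAG = 0x8000  # BROADCAST bit in the DHCP 'flags' field


def flags_for_attempt(attempt: int) -> int:
    """Return the 'flags' value for a DISCOVER/REQUEST transmission.

    attempt 0 is the first transmission and asks for a unicast reply;
    every retransmit falls back to asking for a broadcast reply.
    """
    return 0 if attempt == 0 else BROADCAST_FLAG


for attempt in range(3):
    print(f"attempt {attempt}: flags=0x{flags_for_attempt(attempt):04x}")
```

The counter would be reset for each new message type, so that both the first DISCOVER and the first REQUEST go out with the bit cleared.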

Thanks for reading!

Update Feb 2021: There used to be an example here of how to have Segger’s embOS/IP behave as described. I just went through the Changelog and noticed that in V3.30 this behaviour has been implemented. See IP_DHCPC_ConfigUniBcStartMode().