Wednesday, October 28, 2015

DHCP Failover on RHEL 7


As always, I am not the authority on this subject; however, I have successfully added “failover” to our existing DHCP server, whose OS had been replaced several times while the dhcpd.conf was simply copied over each time.

Configuring DHCP failover is not difficult in itself. However, if you are in an “Enterprise” or “Corporate” environment (i.e. multiple subnets), your router will need an additional “ip-helper” entry for each subnet, pointing at the secondary server. You or your network engineer will need to do this for the following setup to work. In our case we simply added a second ip helper-address <IP> to each subnet (VLAN) on our hardware router.

Prerequisites:
0) A properly configured router.
1) dhcpd running and configured properly.
2) EPEL repo installed.
3) ssh-key passwordless logins configured between the two DHCP servers.
4) Time synchronized on both servers (via ntpd or vm-tools’ options).

I reviewed the following sources for this process:
http://blog.whatgeek.com.pt/2012/03/dhcp-failover-load-balancing-and-synchronization-centos-6/
https://kb.isc.org/article/AA-00502/0/A-Basic-Guide-to-Configuring-DHCP-Failover.html
https://www.howtoforge.com/how-to-set-up-dhcp-failover-on-centos5.1
http://www.cyberciti.biz/faq/linux-inotify-examples-to-replicate-directories/
http://linux.die.net/man/5/incrontab
http://www.lithodyne.net/docs/dhcp/dhcp-5.html

The first link above had the best idea of creating include files for the configuration. This allowed me to automate copying the dhcpd.conf file to the secondary server upon any changes.

Obviously, your IP scheme will differ; please adjust accordingly. Also, this write-up may in fact not apply to all configurations out there. Consider this post just another resource for your research.

Let’s begin…

In addition to our existing RHEL 7 server running dhcpd, I have configured a second machine running the same. For now, the secondary dhcpd service is stopped.

In my case, I edited the primary server’s /etc/dhcp/dhcpd.conf to contain
include "/etc/dhcp/dhcpd.failover";
and to contain at least one pool declaration. Because I was still testing things, I commented out the existing range statement in an existing subnet and added a pool statement just below it with the same range and the required failover statement:
 subnet 10.20.0.0 netmask 255.255.0.0 {
    option broadcast-address 10.20.255.255;
    option routers 10.20.1.1;
    #range 10.20.20.1 10.20.22.254;
    pool {
        range 10.20.20.1 10.20.22.254;
        failover peer "dhcpfailover";
        }
    }
Again, note that at least one pool is required. I learned the hard way that without it the dhcpd service will not start, which left my network without a DHCP server for several minutes. If you are unsure where to put the include statement, put it after your initial options and just before your first subnet.

You can either add a pool statement to each of your subnets at this point, or just do one for now for testing purposes. Each pool requires a failover peer ... statement for failover to actually work.

You may test your dhcpd.conf file with the command dhcpd -t -cf /etc/dhcp/dhcpd.conf.
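A minimal sketch of tying that syntax test to the restart, so the service is only restarted when the file parses. The function name restart_if_valid and the DHCPD_BIN / SYSTEMCTL_BIN override variables are my own illustrative additions (handy for dry runs), not part of dhcpd or the original setup.

```shell
#!/bin/bash
# Sketch: restart dhcpd only if the configuration passes the syntax test.
# DHCPD_BIN and SYSTEMCTL_BIN are hypothetical overrides for dry runs.
restart_if_valid() {
    local conf="${1:-/etc/dhcp/dhcpd.conf}"
    local dhcpd="${DHCPD_BIN:-/usr/sbin/dhcpd}"
    local systemctl="${SYSTEMCTL_BIN:-/usr/bin/systemctl}"

    # -t tests the configuration; -cf names the file to test
    if "$dhcpd" -t -cf "$conf"; then
        "$systemctl" restart dhcpd
    else
        echo "dhcpd.conf failed the syntax test; service left untouched" >&2
        return 1
    fi
}

# Usage (on the primary):
#   restart_if_valid /etc/dhcp/dhcpd.conf
```

Had I used something like this, the missing-pool mistake described above would have been caught before the running service was touched.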

At this point, you can copy your primary /etc/dhcp/dhcpd.conf to your secondary server. We will ultimately script a mirroring process. To reiterate, this /etc/dhcp/dhcpd.conf contains include "/etc/dhcp/dhcpd.failover"; and at least one pool, and it is copied unchanged to the secondary DHCP server.

Now, one of the most important parts is the contents of the include files. Each DHCP server has its own, different /etc/dhcp/dhcpd.failover file.

Create your primary server’s /etc/dhcp/dhcpd.failover to contain
# Failover specific configurations
failover peer "dhcpfailover" {
primary;
address 10.10.0.100;
port 647;
peer address 10.10.0.101;
peer port 647;
max-response-delay 60;
max-unacked-updates 10;
mclt 600;
split 128; #128 is balanced; use 255 if primary is 100% responsible until failure.
load balance max seconds 3;
}
and the secondary server’s /etc/dhcp/dhcpd.failover to contain
# Failover specific configurations
failover peer "dhcpfailover" {
secondary;
address 10.10.0.101;
port 647;
peer address 10.10.0.100;
peer port 647;
max-response-delay 60;
max-unacked-updates 10;
load balance max seconds 3;
}
Here my primary DHCP server is 10.10.0.100 and my secondary DHCP server is 10.10.0.101; change yours accordingly. Note that mclt and split appear only in the primary’s file.
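Since the two files differ only in role, addresses, and the primary-only statements, they could be generated from one place. The following is a sketch only; render_failover is a hypothetical helper name, and the values are the ones from my files above.

```shell
#!/bin/bash
# Sketch: print a dhcpd.failover file for one role.
# Arguments: role (primary|secondary), local IP, peer IP.
render_failover() {
    local role="$1" local_ip="$2" peer_ip="$3"
    case "$role" in
        primary|secondary) ;;
        *) echo "role must be primary or secondary" >&2; return 1 ;;
    esac
    echo '# Failover specific configurations'
    echo 'failover peer "dhcpfailover" {'
    echo "$role;"
    echo "address $local_ip;"
    echo 'port 647;'
    echo "peer address $peer_ip;"
    echo 'peer port 647;'
    echo 'max-response-delay 60;'
    echo 'max-unacked-updates 10;'
    if [ "$role" = primary ]; then
        # mclt and split are declared on the primary only
        echo 'mclt 600;'
        echo 'split 128;'
    fi
    echo 'load balance max seconds 3;'
    echo '}'
}

# Usage:
#   render_failover primary   10.10.0.100 10.10.0.101 > /etc/dhcp/dhcpd.failover   # on the primary
#   render_failover secondary 10.10.0.101 10.10.0.100 > /etc/dhcp/dhcpd.failover   # on the secondary
```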

You will also have to open the firewall to TCP port 647 on each server. In my case I chose to allow only from the specified IP sources.
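As a sketch of what "allow only from the specified IP sources" can look like, assuming firewalld is the firewall in use (the RHEL 7 default; I did not say above which tool I used). The helper names failover_rich_rule and open_failover_port are hypothetical; firewall-cmd and its rich-rule syntax are firewalld's own.

```shell
#!/bin/bash
# Sketch: build a firewalld rich rule that admits TCP 647 from one peer only.
failover_rich_rule() {
    printf 'rule family="ipv4" source address="%s/32" port port="647" protocol="tcp" accept\n' "$1"
}

# Apply the rule permanently and reload so it takes effect (run as root).
open_failover_port() {
    firewall-cmd --permanent --add-rich-rule="$(failover_rich_rule "$1")" &&
        firewall-cmd --reload
}

# Usage:
#   open_failover_port 10.10.0.101   # on the primary, admit the secondary
#   open_failover_port 10.10.0.100   # on the secondary, admit the primary
```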

At this point, you may start your secondary server’s dhcpd service with systemctl start dhcpd. If it starts properly without error, then it is safe to restart your primary server’s dhcpd service with systemctl restart dhcpd. You should test that it’s running properly at this point, and if not fix it promptly or reverse your changes and review and try again. You may also use commands such as journalctl -xn30 or systemctl -n30 status dhcpd to locate faults. You should also enable the dhcp service for auto-start with systemctl enable dhcpd.
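To check that the two servers are actually talking to each other, the journal can be filtered for failover state changes. This sketch assumes dhcpd logs lines of the form "failover peer NAME: I move from X to Y", which is what I recall ISC dhcpd writing; treat the wording as an assumption and compare it against your own logs. latest_failover_state is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: print the most recent failover state this server logged for itself.
# Reads log lines on stdin; prints nothing if no state change is found.
latest_failover_state() {
    sed -n 's/.*failover peer [^:]*: I move from [^ ]* to \([^ ]*\).*/\1/p' | tail -n 1
}

# Usage:
#   journalctl -u dhcpd --no-pager | latest_failover_state
```

If the assumption holds, a healthy pair should end up reporting normal on both servers.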

In this system, your DHCP changes should only be applied to the primary server. The secondary server only exists for failover purposes.

Changes to the primary DHCP server are not mirrored to the secondary server by default, so we will automate this. For this process we'll use root ssh; I have chosen to allow root ssh via ssh-key so that it can be automated with scripts. Of course, I have firewalled my ssh ports to allow only certain IP ranges.

** Before proceeding, please note: a comment by "ZsZs" recommended replacing my incrond usage with a systemd built-in feature. I concur but have NOT yet tried it. Please refer to https://wiki.archlinux.org/index.php/rsync#Automated_backup_with_systemd_and_inotify for a better alternative. ...continuing...

I chose incrond to simplify this mirroring process. incrond uses the kernel’s inotify facility to watch a file (or directory) for changes and run a specified command when they occur. It is not in the default RHEL repositories; to install it you will need the “Extra Packages for Enterprise Linux” (EPEL) repository, which is quite easy to add.
For RHEL 7, installation is as follows:
wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
rpm -i epel-release-latest-7.noarch.rpm
Afterward, install and enable incrond as follows:
yum -y install inotify-tools incron
systemctl enable incrond
First, let’s write a script to copy dhcpd.conf to the secondary server and restart its service. Create a file /root/scripts/update-failover-server.sh containing the following (to avoid PATH problems when incrond runs it, use full command paths):
#!/bin/bash
# Copy the config to the secondary; restart its dhcpd only if the copy succeeded
/usr/bin/scp /etc/dhcp/dhcpd.conf root@10.10.0.101:/etc/dhcp/dhcpd.conf &&
/usr/bin/ssh root@10.10.0.101 '/usr/bin/systemctl restart dhcpd'
# CRITICAL work-around: the watch triggers once and then fails, so restart incrond
/usr/bin/systemctl restart incrond
and be sure to mark it executable (chmod +x). Again, these are my IPs; yours will vary. Most importantly, note that I have already enabled passwordless login between the servers with ssh-keys. This automation will NOT work without it. You may want to test the script by running it manually first.
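Because a single save can raise more than one watched event, the script may run several times for one edit. A possible guard, sketched here under my own assumptions, is to remember a checksum of the last pushed file and skip the push when nothing changed. The helper names and the stamp-file path in the usage note are hypothetical.

```shell
#!/bin/bash
# Sketch: detect whether dhcpd.conf differs from the last pushed version.
# Returns 0 if changed (or never recorded), 1 if unchanged, 2 if unreadable.
config_changed() {
    local conf="$1" stamp="$2" new old=""
    [ -r "$conf" ] || return 2
    new=$(sha256sum "$conf" | cut -d' ' -f1)
    if [ -f "$stamp" ]; then
        old=$(cat "$stamp")
    fi
    [ "$new" != "$old" ]
}

# Record the digest of the file; call this only after a successful push.
record_digest() {
    sha256sum "$1" | cut -d' ' -f1 > "$2"
}

# Usage inside the update script (stamp path is an example):
#   config_changed /etc/dhcp/dhcpd.conf /root/scripts/dhcpd.conf.sha256 || exit 0
#   ...scp and ssh as before, then on success:
#   record_digest /etc/dhcp/dhcpd.conf /root/scripts/dhcpd.conf.sha256
```

Recording the digest only after the copy succeeds means a failed push is retried on the next event rather than silently skipped.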

We can now configure a “watch” for any changes to the dhcpd.conf file. Use the command EDITOR=nano incrontab -e to edit the incron-file with syntax FILE TRIGGERLIST COMMAND [OPTION] (refer to the links referenced above):
/etc/dhcp/dhcpd.conf IN_MODIFY,IN_ATTRIB,IN_CREATE /root/scripts/update-failover-server.sh
Here, I’m trying to cover any modification to dhcpd.conf. Text editors and Webmin modify the file differently, so this event list should cover both cases.

We can now start the incrond service with the command systemctl start incrond.
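For the systemd alternative that "ZsZs" suggested, a path unit paired with a oneshot service is the usual shape. As noted, I have NOT tried this, so take the following as an untested sketch. The unit names dhcpd-sync.path and dhcpd-sync.service and the helper write_sync_units are hypothetical; PathChanged=, Type=oneshot and ExecStart= are standard systemd directives.

```shell
#!/bin/bash
# Sketch: write a systemd path unit and matching service that run the
# update script whenever dhcpd.conf changes. Directory defaults to
# /etc/systemd/system; pass another directory to inspect the output first.
write_sync_units() {
    local dir="${1:-/etc/systemd/system}"
    cat > "$dir/dhcpd-sync.path" <<'EOF'
[Unit]
Description=Watch dhcpd.conf for changes

[Path]
PathChanged=/etc/dhcp/dhcpd.conf

[Install]
WantedBy=multi-user.target
EOF
    cat > "$dir/dhcpd-sync.service" <<'EOF'
[Unit]
Description=Copy dhcpd.conf to the secondary DHCP server

[Service]
Type=oneshot
ExecStart=/root/scripts/update-failover-server.sh
EOF
}

# Usage (as root):
#   write_sync_units
#   systemctl daemon-reload
#   systemctl enable dhcpd-sync.path
#   systemctl start dhcpd-sync.path
```

With this approach the incrontab entry, and the systemctl restart incrond line in the script, would presumably no longer be needed.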

At this point, both servers should be running and able to serve IP addresses. You should verify this, for example by checking systemctl status dhcpd on each server.

Now, you may test that any changes to your primary dhcpd.conf propagate to the secondary server. Go ahead and modify your primary /etc/dhcp/dhcpd.conf by your preferred method and analyze what happens.
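One simple way to confirm that a change propagated is to compare checksums of the file on both servers. This is a sketch under the same passwordless root ssh assumption as above; confs_match and the SSH_BIN override are hypothetical names added for illustration.

```shell
#!/bin/bash
# Sketch: succeed if the local dhcpd.conf and the copy on the given host
# have the same SHA-256 digest.
confs_match() {
    local host="$1" conf="${2:-/etc/dhcp/dhcpd.conf}"
    local ssh="${SSH_BIN:-/usr/bin/ssh}" here there
    here=$(sha256sum "$conf" | cut -d' ' -f1)
    there=$("$ssh" "root@$host" sha256sum "$conf" | cut -d' ' -f1)
    [ -n "$here" ] && [ "$here" = "$there" ]
}

# Usage (on the primary, after editing the file):
#   confs_match 10.10.0.101 && echo "secondary is in sync" || echo "secondary differs"
```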

Once you find that everything works, you may add pool statements to each subnet, moving the existing range statements inside the pools.

---
As Always, Good Luck! You can thank me with bitcoin.    


95% Written with StackEdit.