Wednesday, October 28, 2015

DHCP Failover on RHEL 7


As always, I am not the authority on this subject; however, I have successfully added “failover” to our existing DHCP server, whose OS had been replaced several times, each time simply by copying dhcpd.conf over.

Configuring DHCP failover is not especially difficult. However, if you are in an “Enterprise” or “Corporate” environment (i.e. multiple subnets), your router will need an additional “ip helper-address” for each subnet so that relayed DHCP requests reach both servers. You or your network engineer will need to do this for the following setup to work. In our case, we simply added a second ip helper-address <IP>, pointing at the secondary server, to each subnet (VLAN) on our hardware router.

Prerequisites:
0) A properly configured router (see above).
1) dhcpd installed, running, and configured properly.
2) The EPEL repository installed.
3) Passwordless ssh-key logins configured between the two DHCP servers.
4) Time synchronized on both servers (via ntpd or the VM tools’ time-sync option).
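Before touching dhcpd, a quick preflight check can confirm most of these prerequisites (the router cannot be checked from here). This is a minimal sketch, not part of my own setup: the helper names `require_cmd` and `check_prereqs` are hypothetical, it assumes the secondary is 10.10.0.101 (the address used later in this post), and it assumes EPEL was installed from the `epel-release` package, which creates `/etc/yum.repos.d/epel.repo`.

```shell
#!/bin/bash
# Preflight sketch for the prerequisites above.
# Hypothetical helper names; PEER defaults to this post's secondary IP.
PEER="${PEER:-10.10.0.101}"

require_cmd() {
    # Succeeds if the named command is on PATH
    command -v "$1" >/dev/null 2>&1
}

check_prereqs() {
    local failed=0
    require_cmd dhcpd || { echo "dhcpd is not installed" >&2; failed=1; }
    # The epel-release package installs this repo file
    [ -f /etc/yum.repos.d/epel.repo ] || { echo "EPEL repo not found" >&2; failed=1; }
    # BatchMode makes ssh fail instead of prompting, so only key logins pass
    ssh -o BatchMode=yes -o ConnectTimeout=5 "root@${PEER}" true \
        || { echo "passwordless ssh to ${PEER} failed" >&2; failed=1; }
    # RHEL 7 prints "NTP synchronized: yes"; newer systemd prints "System clock synchronized: yes"
    timedatectl status | grep -Eq '(NTP|System clock) synchronized: yes' \
        || { echo "clock is not NTP-synchronized" >&2; failed=1; }
    return "$failed"
}
```

Run it as `check_prereqs && echo ready` on the primary; each failed check prints one line to stderr.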

I reviewed the following sources for this process:
http://blog.whatgeek.com.pt/2012/03/dhcp-failover-load-balancing-and-synchronization-centos-6/
https://kb.isc.org/article/AA-00502/0/A-Basic-Guide-to-Configuring-DHCP-Failover.html
https://www.howtoforge.com/how-to-set-up-dhcp-failover-on-centos5.1
http://www.cyberciti.biz/faq/linux-inotify-examples-to-replicate-directories/
http://linux.die.net/man/5/incrontab
http://www.lithodyne.net/docs/dhcp/dhcp-5.html

The first link above had the best idea of creating include files for the configuration. This allowed me to automate copying the dhcpd.conf file to the secondary server upon any changes.

Obviously, your IP scheme will be different; please adjust accordingly. Also, this write-up may not apply to every configuration out there, so consider this post just another resource for your research.

Let’s begin…

In addition to our existing RHEL 7 server running dhcpd, I have configured a second machine running the same. For now, the secondary dhcpd service is stopped.

In my case, I edited the primary server’s /etc/dhcp/dhcpd.conf to contain
include "/etc/dhcp/dhcpd.failover";
and to contain at least one pool declaration. Because I was still testing, I picked an existing subnet, commented out its range statement, and added a pool statement just below it with the same range and the required failover statement:
subnet 10.20.0.0 netmask 255.255.0.0 {
    option broadcast-address 10.20.255.255;
    option routers 10.20.1.1;
    #range 10.20.20.1 10.20.22.254;
    pool {
        range 10.20.20.1 10.20.22.254;
        failover peer "dhcpfailover";
    }
}
Again, note that at least one pool is required. I learned the hard way that without it the dhcpd service will not start, which left my network without a DHCP server for several minutes. If you are unsure where to put the include statement, put it after your initial global options and just before your first subnet.

You can either add a pool statement to each of your subnets at this point, or just do one for now for testing purposes. Each pool requires a failover peer ... statement for failover to actually work.

You can test your dhcpd.conf file with the command dhcpd -t -cf /etc/dhcp/dhcpd.conf.
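Because a missing pool keeps dhcpd from starting, it can be worth checking for it as well as running the syntax test before any restart. A minimal sketch follows; `check_conf` and `has_failover_pool` are hypothetical names, and the grep only inspects the file you pass it (it does not follow `include` statements, which is fine here because the pools live in dhcpd.conf).

```shell
#!/bin/bash
# Sketch: pre-restart checks for dhcpd.conf (hypothetical helper names).

has_failover_pool() {
    # True if the file has at least one pool-level 'failover peer "NAME";'
    # statement; the peer declaration ends in "{" rather than ";", so it
    # does not match
    grep -Eq '^[[:space:]]*failover[[:space:]]+peer[[:space:]]+"[^"]+"[[:space:]]*;' "$1"
}

check_conf() {
    local conf="${1:-/etc/dhcp/dhcpd.conf}"
    # dhcpd's own syntax test; -q suppresses the startup banner
    dhcpd -t -q -cf "$conf" || return 1
    has_failover_pool "$conf" || { echo "no failover pool in $conf" >&2; return 1; }
}
```

Typical use: `check_conf && systemctl restart dhcpd`.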

At this point, you can copy your primary /etc/dhcp/dhcpd.conf to your secondary server; we will script this mirroring later. To reiterate, this /etc/dhcp/dhcpd.conf contains include "/etc/dhcp/dhcpd.failover"; and at least one pool, and it is identical on both DHCP servers.

Now for one of the most important parts: the contents of the include files. Each DHCP server has its own, different /etc/dhcp/dhcpd.failover file.

Create your primary server’s /etc/dhcp/dhcpd.failover to contain
# Failover specific configurations
failover peer "dhcpfailover" {
primary;
address 10.10.0.100;
port 647;
peer address 10.10.0.101;
peer port 647;
max-response-delay 60;
max-unacked-updates 10;
mclt 600;
split 128; #128 is balanced; use 255 if primary is 100% responsible until failure.
load balance max seconds 3;
}
and the secondary server’s /etc/dhcp/dhcpd.failover to contain
# Failover specific configurations
failover peer "dhcpfailover" {
secondary;
address 10.10.0.101;
port 647;
peer address 10.10.0.100;
peer port 647;
max-response-delay 60;
max-unacked-updates 10;
load balance max seconds 3;
}
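Here, my primary DHCP server is 10.10.0.100 and my secondary is 10.10.0.101; change yours accordingly. Note that mclt and split appear only in the primary’s file; the dhcpd.conf(5) man page says they may be specified only on the primary.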
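Since the two files differ in only a few lines, both can be generated from one function, which also makes the primary-only lines explicit. This is a sketch assuming the same values as above; `make_failover_conf` is a hypothetical name, and you should still review the output before writing it into place.

```shell
#!/bin/bash
# Sketch: print a dhcpd.failover file for either role.
# Hypothetical helper; values mirror the two files above.
make_failover_conf() {
    local role="$1" self="$2" peer="$3" name="${4:-dhcpfailover}"
    case "$role" in
        primary|secondary) ;;
        *) echo "role must be primary or secondary" >&2; return 1 ;;
    esac
    cat <<EOF
# Failover specific configurations
failover peer "${name}" {
  ${role};
  address ${self};
  port 647;
  peer address ${peer};
  peer port 647;
  max-response-delay 60;
  max-unacked-updates 10;
EOF
    # mclt and split may only be set on the primary
    if [ "$role" = primary ]; then
        printf '  mclt 600;\n  split 128;\n'
    fi
    printf '  load balance max seconds 3;\n}\n'
}
```

On the primary: `make_failover_conf primary 10.10.0.100 10.10.0.101 > /etc/dhcp/dhcpd.failover`; on the secondary: `make_failover_conf secondary 10.10.0.101 10.10.0.100 > /etc/dhcp/dhcpd.failover`.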

You will also have to open TCP port 647 in the firewall on each server. In my case, I chose to allow it only from the peer server’s IP.
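RHEL 7 uses firewalld by default, so a rich rule limited to the peer’s address is one way to do this. The sketch below only prints the commands unless `APPLY=1` is set; `peer_rule`, `open_failover_port`, and the `APPLY` switch are hypothetical, and it assumes the peer arrives through the default zone.

```shell
#!/bin/bash
# Sketch: allow TCP 647 from the failover peer only (firewalld rich rule).
# Prints the commands unless APPLY=1; helper names are hypothetical.
peer_rule() {
    printf 'rule family="ipv4" source address="%s/32" port port="647" protocol="tcp" accept' "$1"
}

open_failover_port() {
    local peer="$1" rule
    rule=$(peer_rule "$peer")
    if [ "${APPLY:-0}" = 1 ]; then
        # Permanent rule, then reload so it takes effect now
        firewall-cmd --permanent --add-rich-rule="$rule" && firewall-cmd --reload
    else
        printf "firewall-cmd --permanent --add-rich-rule='%s'\nfirewall-cmd --reload\n" "$rule"
    fi
}
```

On the primary, `APPLY=1 open_failover_port 10.10.0.101`; on the secondary, the same with 10.10.0.100.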

At this point, start your secondary server’s dhcpd service with systemctl start dhcpd. If it starts without error, it is safe to restart your primary server’s dhcpd service with systemctl restart dhcpd. Test that both are running properly; if not, fix the problem promptly or revert your changes, review, and try again. Commands such as journalctl -xn30 or systemctl -n30 status dhcpd help locate faults. You should also enable the dhcpd service to start at boot with systemctl enable dhcpd.
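That start order (secondary first, primary only if the secondary is healthy) can also be run from the primary in one step. This is a sketch under the same assumptions as the rest of this post (root ssh keys, secondary at 10.10.0.101); `bring_up_failover` is a hypothetical name.

```shell
#!/bin/bash
# Sketch: start the secondary first, then restart the primary only if the
# secondary came up. Run on the primary. Hypothetical helper name.
PEER="${PEER:-10.10.0.101}"

bring_up_failover() {
    ssh "root@${PEER}" 'systemctl start dhcpd' || return 1
    # is-active exits 0 only while the unit is running
    if ! ssh "root@${PEER}" 'systemctl is-active --quiet dhcpd'; then
        echo "secondary dhcpd is not running; check journalctl -u dhcpd on ${PEER}" >&2
        return 1
    fi
    systemctl restart dhcpd || return 1
    systemctl is-active --quiet dhcpd
}
```

If the secondary fails, the primary is left untouched and keeps serving with its previous configuration.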

In this system, your DHCP changes should only be applied to the primary server. The secondary server only exists for failover purposes.

Changes to the primary DHCP server are not mirrored to the secondary by default, so we will automate this. The process uses root ssh; I have chosen to allow root logins via ssh key so that it can be scripted. Of course, I have firewalled my ssh ports to allow only certain IP ranges.

** Before proceeding, please note: a comment by "ZsZs" recommended replacing my incrond usage with a systemd built-in feature. I concur but have NOT yet tried it. Please refer to https://wiki.archlinux.org/index.php/rsync#Automated_backup_with_systemd_and_inotify for a better alternative. ...continuing...

I chose incrond to simplify this mirroring process. incrond uses the kernel’s inotify interface to watch a file (or directory) and run a specified command when it changes. It is not in the default RHEL repositories; to install it you need the “Extra Packages for Enterprise Linux” (EPEL) repository, which is quite easy to install.
For RHEL 7, installation is as follows:
wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
rpm -i epel-release-latest-7.noarch.rpm
Afterward, install and enable incrond as follows:
yum -y install inotify-tools incron
systemctl enable incrond
First, let’s write a script that copies dhcpd.conf to the secondary server and restarts its service. Create /root/scripts/update-failover-server.sh containing the following (use full command paths to avoid potential issues):
#!/bin/bash
/usr/bin/scp /etc/dhcp/dhcpd.conf root@10.10.0.101:/etc/dhcp/dhcpd.conf
/usr/bin/ssh root@10.10.0.101 '/usr/bin/systemctl restart dhcpd'
/usr/bin/systemctl restart incrond #CRITICAL ISSUE; one-time trigger and subsequent fail work-around
Be sure to mark it executable (chmod +x). Again, these are my IPs; yours will vary. Most importantly, note that I had already enabled passwordless login between the servers with ssh keys; this automation will NOT work without it. You may want to test the script by running it manually first.
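One weakness of that script is that it pushes whatever is on disk, including a half-edited file the secondary’s dhcpd will refuse to load. A variant that runs dhcpd’s syntax check first could look like the sketch below. `sync_failover` is a hypothetical name, and the command variables exist only so the paths stay absolute (per the full-path advice above) while remaining overridable for testing. If you keep incrond, the `systemctl restart incrond` workaround line still needs to follow it.

```shell
#!/bin/bash
# Sketch: validate-then-push variant of update-failover-server.sh.
# Assumes the secondary is 10.10.0.101 and root ssh keys are in place.
PEER="${PEER:-10.10.0.101}"
CONF="${CONF:-/etc/dhcp/dhcpd.conf}"
# Full paths as advised above; overridable only to ease testing
DHCPD="${DHCPD:-/usr/sbin/dhcpd}"
SCP="${SCP:-/usr/bin/scp}"
SSH="${SSH:-/usr/bin/ssh}"

sync_failover() {
    # Refuse to push a config the local dhcpd cannot parse
    if ! "$DHCPD" -t -q -cf "$CONF" >/dev/null 2>&1; then
        echo "syntax check failed; not pushing $CONF" >&2
        return 1
    fi
    "$SCP" -q "$CONF" "root@${PEER}:${CONF}" || return 1
    "$SSH" "root@${PEER}" '/usr/bin/systemctl restart dhcpd'
}
```

Note that the check parses the primary’s own dhcpd.failover, not the secondary’s, so it catches errors in the shared dhcpd.conf only.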

We can now configure a “watch” for any changes to the dhcpd.conf file. Use the command EDITOR=nano incrontab -e to edit the incron-file with syntax FILE TRIGGERLIST COMMAND [OPTION] (refer to the links referenced above):
/etc/dhcp/dhcpd.conf IN_MODIFY,IN_ATTRIB,IN_CREATE /root/scripts/update-failover-server.sh
Here, I’m trying to catch any modification to dhcpd.conf. Text editors and Webmin modify the file differently, so this mask should cover both cases.

We can now start the incrond service with the command systemctl start incrond.
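If you would rather follow ZsZs’s suggestion than work around incrond’s trigger-once behavior, a systemd path unit can watch the file instead. I have not run this myself. The sketch writes two units with hypothetical names (`dhcpd-sync.path` and `dhcpd-sync.service`) that call the same update script. `PathChanged=` fires when a file that was opened for writing is closed; how it behaves with every editor, or with Webmin, is untested here.

```shell
#!/bin/bash
# Sketch: systemd path unit as an incrond replacement (untested assumption).
# Hypothetical unit names; UNIT_DIR is overridable for dry runs.
UNIT_DIR="${UNIT_DIR:-/etc/systemd/system}"

write_sync_units() {
    # One-shot service that runs the existing mirroring script
    cat > "${UNIT_DIR}/dhcpd-sync.service" <<'EOF' || return 1
[Unit]
Description=Push dhcpd.conf to the failover peer

[Service]
Type=oneshot
ExecStart=/root/scripts/update-failover-server.sh
EOF
    # Path unit that starts the service when dhcpd.conf changes
    cat > "${UNIT_DIR}/dhcpd-sync.path" <<'EOF'
[Unit]
Description=Watch /etc/dhcp/dhcpd.conf for changes

[Path]
PathChanged=/etc/dhcp/dhcpd.conf
Unit=dhcpd-sync.service

[Install]
WantedBy=multi-user.target
EOF
}
```

Afterwards: `systemctl daemon-reload`, `systemctl enable dhcpd-sync.path`, `systemctl start dhcpd-sync.path`, then stop and disable incrond and drop the `systemctl restart incrond` line from the script, since it only works around incron.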

At this point, both servers should be running and able to serve IP addresses. Verify that they are.
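Besides getting a lease from each side, you can check that the peers actually see each other: `ss -tln` on each server should show a listener on port 647, and dhcpd records its failover state in the leases file. The sketch below reads the last recorded state from `/var/lib/dhcpd/dhcpd.leases` (the RHEL 7 default path). `failover_state` is a hypothetical helper, and it assumes the `my state ... at ...;` and `partner state ... at ...;` lines that ISC dhcpd writes there. Both sides should eventually report `normal`.

```shell
#!/bin/bash
# Sketch: report the most recent failover state recorded by dhcpd.
# Hypothetical helper; assumes ISC dhcpd's "my state"/"partner state" lines.
failover_state() {
    local leases="${1:-/var/lib/dhcpd/dhcpd.leases}"
    # Later entries supersede earlier ones, so keep the last match of each
    awk '$1 == "my" && $2 == "state" { mine = $3 }
         $1 == "partner" && $2 == "state" { peer = $3 }
         END { printf "my=%s partner=%s\n", mine, peer }' "$leases"
}
```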

Now, you may test that any changes to your primary dhcpd.conf propagate to the secondary server. Go ahead and modify your primary /etc/dhcp/dhcpd.conf by your preferred method and analyze what happens.
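Rather than comparing the two files by eye, you can compare checksums a few seconds after saving. A sketch; `conf_in_sync` is a hypothetical helper and assumes the same root ssh access the update script uses.

```shell
#!/bin/bash
# Sketch: compare dhcpd.conf on the primary and the secondary by SHA-256.
# Hypothetical helper; defaults match this post's path and secondary IP.
conf_in_sync() {
    local conf="${1:-/etc/dhcp/dhcpd.conf}" peer="${2:-10.10.0.101}"
    local here there
    here=$(sha256sum < "$conf" | cut -d' ' -f1)
    there=$(ssh "root@${peer}" "sha256sum < '${conf}'" | cut -d' ' -f1)
    # An empty remote hash means the ssh command itself failed
    [ -n "$there" ] && [ "$here" = "$there" ]
}
```

For example: `sleep 5; conf_in_sync && echo "secondary is up to date"`.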

Once everything works, you can add pool statements to each subnet, moving the range statements inside the pools.
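With many subnets, that edit can be scripted. The sketch below is a plain text transform, not a dhcpd.conf parser: it comments out each bare `range` line and re-emits it inside a pool with the failover statement, mirroring the manual edit shown earlier, and copies existing pools through untouched. `wrap_ranges_in_pools` is a hypothetical name. The dhcpd.conf(5) man page notes that dynamic BOOTP leases are not compatible with failover, so review any `range dynamic-bootp` lines by hand.

```shell
#!/bin/bash
# Sketch: wrap every bare "range" statement in a failover pool.
# Hypothetical helper; reads dhcpd.conf on stdin, writes to stdout.
wrap_ranges_in_pools() {
    local peer="${1:-dhcpfailover}"
    awk -v peer="$peer" '
        # Existing pools pass through untouched (track brace depth)
        /^[ \t]*pool[ \t]*[{]/ { depth = 1; print; next }
        depth > 0 {
            depth += gsub(/[{]/, "{") - gsub(/[}]/, "}")
            print; next
        }
        # Comment out the bare range and re-emit it inside a pool
        /^[ \t]*range[ \t]/ {
            indent = $0; sub(/[^ \t].*$/, "", indent)
            stmt = $0;   sub(/^[ \t]+/, "", stmt)
            print indent "#" stmt
            print indent "pool {"
            print indent "    " stmt
            print indent "    failover peer \"" peer "\";"
            print indent "}"
            next
        }
        { print }
    '
}
```

For example, `wrap_ranges_in_pools < /etc/dhcp/dhcpd.conf > /tmp/dhcpd.conf.new`, then `dhcpd -t -cf /tmp/dhcpd.conf.new` and a `diff` against the original before copying it into place.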

---
As Always, Good Luck! You can thank me with bitcoin.    



7 comments:

  1. Hi, Thanks for this tutorial.
    In Debian 8 I am still facing the strange incrond issue where it is triggered only once.
    Alternatively to incrond one could use the systemd's built in feature described here:
    https://wiki.archlinux.org/index.php/rsync#Automated_backup_with_systemd_and_inotify
    This works flawlessly.

    1. I agree, this incrond trigger-once issue is a hassle. Thus my only solution was to always restart the service within the update-script.

      Thank you very much for the alternative! I will try this on my next server iteration.

  2. This is GREAT information. Thank you!
    I built and configured DHCPd with a primary and secondary dhcpd.conf files and included a dhcpd.master file that holds all the subnets. I have puppet control the files and direct the file to proper servers.

    I have a question with DHCPd Failover and PXE Boot -- is it supported? Does it work properly? If so, how can I properly set it up?

    Can allow booting; and allow bootp; work with DHCPd Failover? Should it be in the primary/secondary dhcpd.conf files or the dhcpd.master file?

    Also should next-server x.x.x.x; and filename "xxx"; be in the primary/secondary dhcpd.conf files or the dhcpd.master file?

    Thanks again!

    1. We have "server-name", "next-server", and "filename" in our dhcpd.conf (which is mirrored on both) and PXE is working. However, technically I never took our primary down to verify that it works from both, but I expect it should.

    2. So for me to better understand what you did -- you have your setup using dhcpd.conf to have the failover configs applied (a little different on server depending on primary or secondary) and the dhcpd.failover file is on both and configured to have all your subnets that you want to failover?

      What is "server-name" used for?

      Did you use deny dynamic bootp clients in your dhcpd.failover file for subnets that you don't want BOOTP enabled on?

      Thanks for your time!

    3. Firstly, I should comment I like your approach also. However, there are settings in the /etc/dhcp/dhcpd.conf that we have edited. As such, with my approach, it gets mirrored onto the secondary automatically.

      As you can see in my post, both /etc/dhcp/dhcpd.conf are identical. Both have include "/etc/dhcp/dhcpd.failover"; Each server will have a different /etc/dhcp/dhcpd.failover file. These two differing files are small and remain static on each server. The dhcpd.failover on the primary is 13 lines. The dhcpd.failover on the secondary is 11 lines. All my subnets are in the /etc/dhcp/dhcpd.conf . This main file gets automatically copied to the secondary via script upon any changes. What makes it work is that the two dhcpd.failover files are different.

      The part you have to troubleshoot the most is the triggering of the script. incrond has a bug that causes the service to fail after one trigger. A previous comment by ZsZs used systemd as the trigger. I intend to use this on my next server iteration.

      server-name is no big deal and probably not needed. From the "man dhcpd.conf" page: The server-name statement can be used to inform the client of the name of the server from which it is booting. Name should be the name that will be provided to the client.

    4. I believe I'm on the same page as you now; I just named my files differently, obviously. There are settings within my dhcpd.master file that I have edited, and it is automatically copied over to my secondary server with a different method than yours.

      Your dhcpd.conf file is similar to my dhcpd.master file, which is where I store my subnets for DHCP. It is automatically copied over to the secondary via puppet instead of incrond.

      Your dhcpd.failover file is similar to my dhcpd.conf file where we configured which server is primary and secondary.
      - My primary is 19 lines and my secondary is 17 lines, unless you add OMAPI info on both. Both also have include "/etc/dhcp/dhcpd.master";. However, I don't know if I have too many flags inside mine. I assume you don't have lines 2-6 below in your files? What do you think?

      authoritative;
      ddns-update-style none;
      deny client-updates;
      one-lease-per-client true;
      allow booting;
      allow bootp;
      omapi-port 7911;
      failover peer "dhcp-partner" {
      primary;
      address Server 1
      port 647;
      peer address Server 2
      peer port 647;
      max-response-delay 60;
      max-unacked-updates 10;
      mclt 3600;
      split 255;
      load balance max seconds 3; }
      include "/etc/dhcp/dhcpd.master";


      So my initial question was whether PXE boot works with DHCPd failover, and from this post it seems like it does. In the past I knew it wasn't supported.

      I also asked which file the following would go in, and it looks like you put it in your dhcpd.conf, which for me is dhcpd.master:
      allow booting;
      allow bootp;
      next-server x.x.x.x;
      filename "xxx";

      server-name is interesting and I will look into that flag.

      Thanks and I do appreciate your feedback, time and your post!


Comments, Suggestions or "Thank you's" Invited!