IPMP Link-based Only Failure Detection with Solaris 10

I have done a fair bit of research on this topic. Most sites will show you how to set up failover on Solaris 10 with multiple IP addresses and probe-based failure detection, and both of those things can cause problems. First, IP space is often at a premium and you want to use as few addresses as possible, so using one IP instead of three (a data address plus a test address on each interface) is highly appreciated.

Second, probe-based failure detection, i.e. pinging the router or another host on the subnet, can cause problems if you have many hosts in a data center set up in a similar fashion. While mpathd can probe a host other than the router, its default behavior is to constantly ping the router to make sure it is still there, and to fail over if it stops answering. This can be as often as once a second, and because one router often serves several subnets, the effect multiplies. A router's CPU is not very fast: routers forward packets quickly because forwarding is done in dedicated hardware, but the pings have to be answered by the much slower general-purpose CPU.
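To put rough numbers on that multiplication: the probe load a single router has to answer grows with every host and every probing interface behind it. The figures below (200 hosts, two interfaces each, one probe per second) are hypothetical, chosen only to illustrate the scale:

```latex
% H = hosts behind the router using probe-based IPMP (hypothetical: 200)
% I = interfaces per host sending probes (hypothetical: 2)
% p = probes per second per interface (the once-a-second worst case)
L = H \cdot I \cdot p = 200 \cdot 2 \cdot 1\ \mathrm{s}^{-1} = 400\ \text{echo requests per second}
```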

Just to state the obvious: this setup is designed for uptime of the service, not for aggregate throughput, because it is a master/slave setup rather than master/master. If at all possible, put the two interfaces on different cards and connect them to two separate network switches.

Let's start with the config file.

# cat /etc/default/mpathd

#pragma ident   "@(#)mpathd.dfl 1.2     00/07/17 SMI"

# Time taken by mpathd to detect a NIC failure in ms. The minimum time
# that can be specified is 100 ms.

#FAILURE_DETECTION_TIME=2500
FAILURE_DETECTION_TIME=10000

# Failback is enabled by default. To disable failback turn off this option

FAILBACK=yes

# By default only interfaces configured as part of multipathing groups
# are tracked. Turn off this option to track all network interfaces
# on the system

TRACK_INTERFACES_ONLY_WITH_GROUPS=yes
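Before restarting anything, it can be worth sanity-checking the file. The bash function below is a minimal sketch: the name check_mpathd_conf is hypothetical, and it assumes that when a key appears more than once, the last uncommented line is the one that counts. It only reads the file, so it is safe to run.

```shell
# check_mpathd_conf [FILE]: hypothetical helper, not part of Solaris.
# Prints the effective FAILURE_DETECTION_TIME and FAILBACK settings and
# returns non-zero if the detection time is below the 100 ms minimum.
# Assumption: the last uncommented assignment of a key is the one in effect.
check_mpathd_conf() {
    local conf=${1:-/etc/default/mpathd} fdt failback
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 1; }
    # Commented lines start with '#', so the anchored patterns skip them.
    fdt=$(awk -F= '/^FAILURE_DETECTION_TIME=/ { v = $2 } END { print v }' "$conf")
    failback=$(awk -F= '/^FAILBACK=/ { v = $2 } END { print v }' "$conf")
    echo "FAILURE_DETECTION_TIME=${fdt:-unset} FAILBACK=${failback:-unset}"
    if [ -n "$fdt" ] && [ "$fdt" -lt 100 ]; then
        echo "FAILURE_DETECTION_TIME must be at least 100 ms" >&2
        return 1
    fi
}
```

For the file shown above, check_mpathd_conf /etc/default/mpathd prints FAILURE_DETECTION_TIME=10000 FAILBACK=yes. After editing the file, in.mpathd rereads it when it receives a SIGHUP (pkill -HUP in.mpathd).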

The file is pretty self-explanatory given its comments. The nice part is that the only other changes needed are to the /etc/hostname.ce* files. Because no test addresses are configured, in.mpathd uses link-based failure detection alone.

bash-3.00# cat /etc/hostname.ce0
10.36.133.113 netmask + broadcast + group mainint up
bash-3.00# cat /etc/hostname.ce4
group mainint up
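The two files follow a fixed pattern: the interface that should carry the data address gets the address plus the group, and the standby interface only joins the group. A minimal bash sketch that writes them follows. The function name is hypothetical, and pointing the optional DIR argument at a scratch directory lets you preview the files before copying them into /etc.

```shell
# write_ipmp_hostname_files PRIMARY STANDBY ADDR GROUP [DIR]
# Hypothetical helper: writes link-based-only IPMP hostname files.
# No test addresses are written, so in.mpathd sends no probes.
write_ipmp_hostname_files() {
    local primary=$1 standby=$2 addr=$3 group=$4 dir=${5:-/etc}
    # The primary interface carries the data address and joins the group.
    echo "$addr netmask + broadcast + group $group up" > "$dir/hostname.$primary"
    # The standby interface only joins the group (it comes up as 0.0.0.0).
    echo "group $group up" > "$dir/hostname.$standby"
}
```

Running write_ipmp_hostname_files ce0 ce4 10.36.133.113 mainint reproduces the two files above.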

With both interfaces up, ifconfig -a will look like the following:

# ifconfig -a
lo0: flags=2001000849 mtu 8232 index 1
	inet 127.0.0.1 netmask ff000000
ce0: flags=1000843 mtu 1500 index 8
	inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
	groupname mainint
	ether 0:14:4f:d6:f7:b8
ce4: flags=1000843 mtu 1500 index 9
	inet 10.36.133.113 netmask ffffff00 broadcast 10.36.133.255
	groupname mainint
	ether 0:14:4f:4a:d5:a3
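To see at a glance which member of the group currently holds the data address, the ifconfig -a output can be filtered. A minimal bash sketch follows, assuming the output format shown above (a flags line per interface, followed by inet and groupname lines). The function name is hypothetical.

```shell
# ipmp_group_members GROUP: hypothetical helper; reads `ifconfig -a`
# output on stdin and prints "<interface> <IPv4 address>" for each
# interface that belongs to GROUP.
ipmp_group_members() {
    local group=$1 line name addr
    while read -r line; do   # read strips the leading tab/spaces
        case $line in
            *": flags="*) name=${line%%: flags=*}; addr= ;;
            "inet "*) addr=${line#inet }; addr=${addr%% *} ;;
            "groupname $group") echo "$name ${addr:-?}" ;;
        esac
    done
}
```

For the output above, /usr/sbin/ifconfig -a | ipmp_group_members mainint prints ce0 0.0.0.0 and ce4 10.36.133.113, which shows that the data address currently sits on ce4.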

You can also fail over an interface by hand with the /usr/sbin/if_mpadm command. The -d option detaches (offlines) the interface and moves its addresses to another interface in the group.

bash-3.00# /usr/sbin/if_mpadm  -d ce0
Feb 13 14:47:31 server in.mpathd[185]: Successfully failed over from NIC ce0 to NIC ce4

bash-3.00# ifconfig -a
lo0: flags=2001000849 mtu 8232 index 1
	inet 127.0.0.1 netmask ff000000
ce0: flags=89000842 mtu 0 index 8
	inet 0.0.0.0 netmask 0
	groupname mainint
	ether 0:14:4f:d6:f7:b8
ce4: flags=1000843 mtu 1500 index 9
	inet 10.36.133.113 netmask ffffff00 broadcast 10.36.133.255
	groupname mainint
	ether 0:14:4f:4a:d5:a3
ce4:1: flags=1000843 mtu 1500 index 9
	inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
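After if_mpadm -d, ce0 is left with mtu 0 and new flags, and its 0.0.0.0 address now lives on the logical interface ce4:1. The flags value is a hexadecimal bit mask. The sketch below decodes three bits, with values assumed from the IFF_UP (0x1), IFF_FAILED (0x10000000) and IFF_OFFLINE (0x80000000) constants in Solaris's net/if.h. The function name is hypothetical.

```shell
# ipmp_status: hypothetical helper; reads `ifconfig -a` output on stdin
# and prints "<interface> <state>" for every interface line.
# Assumed flag bits (Solaris IFF_* values): 0x80000000 OFFLINE,
# 0x10000000 FAILED, 0x1 UP.
ipmp_status() {
    local line name flags state
    while read -r line; do
        case $line in
            *": flags="*)
                name=${line%%: flags=*}
                flags=${line#*flags=}
                # Keep only the leading hex digits (drops "<UP,...>" etc.).
                flags=${flags%%[!0-9a-fA-F]*}
                if [ $(( 0x$flags & 0x80000000 )) -ne 0 ]; then
                    state=offline
                elif [ $(( 0x$flags & 0x10000000 )) -ne 0 ]; then
                    state=failed
                elif [ $(( 0x$flags & 0x1 )) -ne 0 ]; then
                    state=up
                else
                    state=down
                fi
                echo "$name $state"
                ;;
        esac
    done
}
```

For the output above, /usr/sbin/ifconfig -a | ipmp_status reports ce0 offline. To bring the interface back into service, reattach it with /usr/sbin/if_mpadm -r ce0.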

If you ever want to make changes, such as switching which interface is primary, it helps to write a quick script. Don't forget to make it executable with chmod u+x script, and run it from the console, because unplumbing both interfaces takes the server off the network.

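#!/usr/bin/bash
# Take both IPMP interfaces down, then plumb them again from the current
# /etc/hostname.ce* files (the file contents are deliberately left unquoted
# so they split into separate ifconfig arguments).
ifconfig=/usr/sbin/ifconfig

$ifconfig ce0 unplumb
$ifconfig ce4 unplumb
$ifconfig ce0 plumb $(cat /etc/hostname.ce0)
$ifconfig ce4 plumb $(cat /etc/hostname.ce4)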
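To switch which interface is primary, the contents of the two hostname files can simply be swapped before the interfaces are plumbed again. A minimal bash sketch follows. The function name and the optional directory argument (handy for trying it on copies first) are hypothetical, and it assumes mktemp(1) is available.

```shell
# swap_hostname_files IFACE_A IFACE_B [DIR]
# Hypothetical helper: exchanges the contents of DIR/hostname.IFACE_A and
# DIR/hostname.IFACE_B (DIR defaults to /etc). Writing with cat keeps each
# file's existing owner and permissions. Assumes mktemp(1) exists.
swap_hostname_files() {
    local a=$1 b=$2 dir=${3:-/etc} tmp rc
    tmp=$(mktemp) || return 1
    cp "$dir/hostname.$a" "$tmp" &&
        cat "$dir/hostname.$b" > "$dir/hostname.$a" &&
        cat "$tmp" > "$dir/hostname.$b"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

With the files shown earlier, swap_hostname_files ce0 ce4 moves the data address line into /etc/hostname.ce4 and the group-only line into /etc/hostname.ce0.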

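Make sure that each interface on the server has a unique MAC address. On SPARC systems the interfaces share one system-wide MAC address unless local-mac-address? is set to true, and duplicate MAC addresses can confuse the switch.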

To check that unique MAC addresses are in use, run "ifconfig -a" and compare the MAC addresses on the ether lines, or check the setting directly:

bash-3.00# eeprom |grep mac
local-mac-address?=true

If it comes back false, set it to true with the following command (the new setting takes effect at the next reboot):

eeprom "local-mac-address?"=true
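Both checks can be scripted. The bash sketch below assumes the output formats shown above: an ether line per interface from ifconfig -a (run as root, as in the examples here, so the MAC addresses are shown) and a name=value line from eeprom. The function names are hypothetical.

```shell
# find_duplicate_macs: hypothetical helper; reads `ifconfig -a` output on
# stdin and prints every MAC address that appears on more than one line.
find_duplicate_macs() {
    local line
    while read -r line; do
        case $line in
            "ether "*) echo "${line#ether }" ;;
        esac
    done | sort | uniq -d
}

# local_mac_enabled: hypothetical helper; reads `eeprom` output on stdin
# and succeeds only if local-mac-address? is set to true.
local_mac_enabled() {
    grep -q '^local-mac-address?=true$'
}
```

For example, /usr/sbin/ifconfig -a | find_duplicate_macs should print nothing on a correctly configured server, and eeprom | local_mac_enabled || echo "local-mac-address? is not true" flags the problem.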

Some useful links for further reading:

You might have to be logged in to SunSolve for the first link to work.
http://sunsolve.sun.com/search/document.do?assetkey=1-61-228885-1
http://docs.sun.com/app/docs/doc/816-0211/6m6nc66s8?a=view