Archive for November, 2005

ProLiant NICs

Monday, November 21st, 2005

I did a little snooping to determine which NICs were installed in my ProLiants.  It looks like I have one of each conceivable version for the PL3000!

Device                       Vendor        Driver   MAC Prefix        Server
9200 3c905                   10b7 3Com     3c59x    00:50:DA 3Com     suzuki
ae32 Netelligent 10/100 TX   0e11 Compaq   tlan     00:08:C7 Compaq   bmw
1229 Ethernet Pro 100        8086 Intel    e100     00:50:8B Compaq   triumph
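The MAC Prefix column above is simply the first three octets of each card's hardware address. A small sketch for pulling that prefix out of an address and matching it against the three prefixes listed here; the function names mac_prefix and mac_vendor are made up for illustration, and the lookup knows only the prefixes from this table:

```shell
# Print the vendor prefix (first three octets, upper-cased) of a MAC address.
mac_prefix() {
    printf '%s\n' "$1" | cut -d: -f1-3 | tr 'a-f' 'A-F'
}

# Map an address to the vendor names listed in the table above.
# Any prefix not in the table is reported as "unknown".
mac_vendor() {
    case "$(mac_prefix "$1")" in
        00:50:DA) echo "3Com" ;;
        00:08:C7|00:50:8B) echo "Compaq" ;;
        *) echo "unknown" ;;
    esac
}
```

On a 2.6 kernel the address can be read from sysfs, for example mac_vendor "$(cat /sys/class/net/eth0/address)", assuming the interface is named eth0.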

Useful commands:

lspci

ethtool (known to fail on the tlan driver)

mii-tool

Another Big Difference

Monday, November 21st, 2005

I found another big difference between the different versions of the ProLiants that I have:

6/333: no onboard Advanced System Management (ASM) chip

amrecover

Friday, November 18th, 2005

I have learned (too) much about amrecover in the past day or so.  My adventure started when I decided to test restores from amanda.  After all, what good is a backup without the ability to restore?  The problem started right from the get-go:

Screen clipping taken: 18/11/2005, 15:07

Silent failure.  I hate silent failures.  I spent the better part of two hours trying command line variations in a futile attempt to get amrecover to hint at the problem.  I then started trying some more systematic troubleshooting steps.

Debug amrecover, Step 1: First, I examined the amanda log files, which are usually pretty good.  You first need to find the proper log file, which means sorting the log files by time (ls -lrt) and picking the latest relevant one.  In this case we are looking into the amrecover utility, so the log files (in /var/log/amanda) are named amrecover.*.debug:

Screen clipping taken: 18/11/2005, 15:21

As expected, we see that amrecover is trying to contact the server on triumph.  I learned that amrecover tries to open a TCP connection to the amanda index daemon on port 10082.
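Locating the newest debug file from Step 1 can be scripted. A minimal sketch, assuming the debug files live in /var/log/amanda and match the glob amrecover*debug; both the glob and the function name latest_log are illustrative assumptions:

```shell
# Print the most recently modified file in directory $1 matching glob $2.
latest_log() {
    dir=$1
    pattern=$2
    # $pattern is left unquoted on purpose so the shell expands the glob.
    # ls -t sorts newest first; head keeps only the first entry.
    ls -t "$dir"/$pattern 2>/dev/null | head -n 1
}
```

For example, less "$(latest_log /var/log/amanda 'amrecover*debug')" opens the newest amrecover debug file, and the same function works on the server side with a different glob.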

Debug amrecover, Step 2: OK, I’ll telnet to that port and see if it is listening:

Screen clipping taken: 18/11/2005, 15:09

Interesting.  I was making it through the three (!) firewalls to triumph, who was promptly shutting the door in my face.  Sounds like a server problem…
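The same reachability check can be done non-interactively. A sketch, assuming a netcat binary (nc) that supports -z (connect without sending data) and -w (timeout in seconds); the function name port_open is made up:

```shell
# Return 0 if a TCP connection to host $1, port $2 succeeds within 3 seconds.
# Assumes nc is installed and supports the -z and -w options.
port_open() {
    nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}
```

Usage would look like port_open triumph 10082 && echo "port 10082 accepts connections". Note that success only shows that something accepted the connection. It would not distinguish a healthy amindexd from the case seen here, where the connection is accepted and then dropped immediately.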

Debug amrecover, Step 3: Over on the server, double check if anybody is listening on port 10082:

Screen clipping taken: 18/11/2005, 15:24

Yup, xinetd is listening.  But is it starting the amanda index program correctly?

Debug amrecover, Step 4: Check the contents of the xinetd configuration file (/etc/xinetd.d/amandaidx):

Screen clipping taken: 18/11/2005, 15:26

All looks good here.

Debug amrecover, Step 5: Next step, check out the amanda log files on the server for amindexd:

Screen clipping taken: 18/11/2005, 15:30

While not obviously a fatal error, this “Socket operation on non-socket” does not look good.  Plus, I can see that amindexd ran for only about a second.  Now I am starting to smell smoke.  In more detail, it seems that amindexd called the getpeername function, which failed.  Reading the man page for getpeername shows that it takes a socket as an argument.  Is amindexd passing getpeername a bad argument?  At this point, I hit a brick wall: it would seem that I would need to start amindexd in a debugger, but that is not an option.

Out of the blue, I decided to look at the audit logs for selinux.  Bingo!  For the conclusion of this exercise, see the selinux section of this site.
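Searching the audit log for denials against one program can be sketched as follows. This assumes auditd is writing to /var/log/audit/audit.log and that denial records contain "avc:", "denied" and a comm="..." field; the function name avc_denials is made up, and on a system without auditd the records may land in /var/log/messages instead:

```shell
# Print SELinux AVC denial records that mention command name $1.
# The log path defaults to the usual auditd location; pass another as $2.
avc_denials() {
    comm=$1
    log=${2:-/var/log/audit/audit.log}
    grep 'avc:' "$log" | grep 'denied' | grep "comm=\"$comm\""
}
```

Running avc_denials amindexd as root would list any denials logged for the index daemon.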

Tweaking amanda

Thursday, November 17th, 2005

Today I turned on the autoflush parameter.  Setting the autoflush option causes amanda to flush any previously dumped-but-not-taped images to the tape.  Otherwise, the amflush program must be run manually.  This is important if you regularly don’t have the proper tape loaded when amdump runs.

NB: with this parameter set, images so old as to be worthless can end up being taped.

I also set the reserve parameter to 40 (percent).  Now if I forget to change a tape one day, 60% of the holding disk space (0.60 * 10GB = 6GB) can be used for level 0 dumps.  That should allow at least one tape to be missed with no degradation.  After the 6GB is filled with level 0 dumps, the remaining 4GB will be reserved exclusively for level 1 (or greater?) dumps.
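The arithmetic behind the reserve setting, as a small sketch. The function name holding_split is made up; the 10GB holding disk and the 40 percent reserve are the figures from this post:

```shell
# Split holding disk space according to the reserve percentage.
# $1 = holding disk size in GB, $2 = reserve (percent kept for incrementals).
# Prints: <GB usable by level 0 dumps> <GB reserved for incrementals>
holding_split() {
    awk -v total="$1" -v reserve="$2" 'BEGIN {
        incr = total * reserve / 100
        printf "%.1f %.1f\n", total - incr, incr
    }'
}

holding_split 10 40   # prints "6.0 4.0"
```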

To accommodate the very long time it takes to back up suzuki remotely, I have decided to start the backup at 22:00 instead of 02:00.  I also want the tape to eject when the backup job finishes; this will provide a visual indication of when the job has finished and it will save time when changing tapes.  Also, since the autoflush and reserve parameters are set, I should attempt a backup every day of the week even if I intend to change tapes only five days a week.  At least I will have images on the holding disk from every day.

Now amanda’s crontab entry looks like this:

Screen clipping taken: 17/11/2005, 15:12

Investigations of TCP window size with iperf

Tuesday, November 15th, 2005

# wget http://ftp.belnet.be/packages/dries.ulyssis.org/fedora/fc4/i386/RPMS.dries/iperf-2.0.2-1.2.fc4.rf.i386.rpm

# rpm -iv iperf-2.0.2-1.2.fc4.rf.i386.rpm

This will download and install an FC4-specific version of the iperf tool.  You can also download an earlier version, but you will probably need the compat-libstdc++-33 compatibility libraries.  Once installed, you’ll probably note that there is no man page.  Instead, point your browser here:

http://dast.nlanr.net/Projects/Iperf/iperfdocs_1.7.0.html

Perhaps the most valuable experiment is running iperf on the remote end (suzuki) as a server as follows:

[root@suzuki ~]# iperf -s -m

And on the near end as a client as follows:

[root@triumph ~]# iperf -c suzuki -m -w 16K

and experimenting with the window size while simultaneously watching the output on iptraf.
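Stepping through window sizes by hand gets tedious, so the experiment can be looped. A sketch using only the iperf options shown above (-c, -m, -w); the function names and the 2K to 128K range are assumptions chosen to match the sizes discussed in this post:

```shell
# Print power-of-two window sizes (in KB) from $1 up to and including $2.
window_sizes() {
    w=$1
    while [ "$w" -le "$2" ]; do
        printf '%sK\n' "$w"
        w=$((w * 2))
    done
}

# Run one iperf client test per window size against server $1.
sweep_windows() {
    for size in $(window_sizes 2 128); do
        echo "== window $size =="
        iperf -c "$1" -m -w "$size"
    done
}
```

With iperf -s -m running on the far end, sweep_windows suzuki runs seven tests, from 2K through 128K.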

The general trends I am seeing are that TCP connections perform best at somewhere between 64K and 128K windows.  Amazingly, iptraf shows the “Packet Size” as being only 52 bytes whenever the window size is larger than about 1K.  I am suspicious of those packet sizes because the performance with larger windows still seems more than adequate.

NB: Linux interprets the iperf window size argument as being twice the size given on the command line.  This is not a words-versus-bytes difference: according to socket(7), the kernel doubles the buffer size requested via setsockopt to leave room for bookkeeping overhead.  At least iperf reports the discrepancy.  I also notice that the effective window size used by a connection can only be an (approximate) power of two.  And the minimum window size is 2K.

NB: iperf running as a daemon (iperf -D) results in terrible performance and interminable test sessions.  Not sure what is going on here…

Astonishingly, iperf with a buffer of 32K results in up to 93Mbits/sec performance on the local LAN (switched 100 Mbits/sec).  This is so close to the hardware limits as to be considered “perfect!”
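To put the 93 Mbits/sec in perspective, here is a rough calculation of the best-case TCP payload rate on Ethernet. It assumes a 1500-byte MTU, 20-byte IP and TCP headers with no options, and the standard 38 bytes of per-frame Ethernet overhead; the function name max_goodput is made up:

```shell
# Best-case TCP payload rate for link rate $1 (Mbit/s) and MTU $2 (bytes).
# On the wire each frame costs MTU + 38 bytes
# (14 header + 4 FCS + 8 preamble + 12 inter-frame gap).
# Each frame carries MTU - 40 bytes of payload (20 IP + 20 TCP, no options).
max_goodput() {
    awk -v rate="$1" -v mtu="$2" 'BEGIN {
        printf "%.1f\n", rate * (mtu - 40) / (mtu + 38)
    }'
}

max_goodput 100 1500   # prints "94.9"
```

Under those assumptions the ceiling is about 94.9 Mbits/sec, so 93 is roughly 98% of what the link can carry. With TCP timestamps enabled the payload per frame drops by 12 bytes and the ceiling is slightly lower still.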

Also, beware interactions with iptables!  When the server’s port (5001) is blocked, iperf will report a problem with path MTU discovery.

NB: The maximum IP payload that can be carried appears to be 1416 bytes.  This can be confirmed with ping as follows:

Screen clipping taken: 15/11/2005, 14:17
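One way to find such a limit automatically is to binary-search the ping payload size with the Don't Fragment bit set. A sketch, assuming the Linux iputils ping (-M do sets Don't Fragment, -s sets the ICMP data size, -W is a timeout in seconds); the function names are made up. Keep in mind that -s counts ICMP data bytes, so the IP packet on the wire is 28 bytes larger:

```shell
# Succeed if an ICMP payload of $2 bytes reaches host $1 unfragmented.
probe_ping() {
    ping -c 1 -W 2 -M do -s "$2" "$1" >/dev/null 2>&1
}

# Binary-search the largest size in [$3, $4] for which "$1 host size"
# succeeds. $1 is the name of a probe function, $2 the host.
# Assumes the probe succeeds at the lower bound.
max_payload() {
    probe=$1
    host=$2
    lo=$3
    hi=$4
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi + 1) / 2 ))
        if "$probe" "$host" "$mid"; then
            lo=$mid
        else
            hi=$((mid - 1))
        fi
    done
    echo "$lo"
}
```

For example, max_payload probe_ping suzuki 1000 1472 needs only about nine pings to settle on the answer. Passing the probe as a parameter keeps the search logic separate from the network call.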

Here is a means of capturing the initial data exchange between the client and the host of an ftp session:

tcpdump -

amanda tuning

Monday, November 14th, 2005

After fourteen hours, I calculated that the amanda server on triumph was backing up the client suzuki at a rate of 39 kB/s.  This is well below the rate I would have expected, but tests with FTP show similar performance through the tunnel.  At this rate, backing up a 5GB filesystem will take about a day and a half.
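The day-and-a-half estimate follows from simple division. A sketch of the arithmetic, taking 1 GB as 1024 * 1024 kB; the function name transfer_hours is made up:

```shell
# Estimate transfer time in hours for $1 gigabytes at $2 kB/s.
# Uses 1 GB = 1024 * 1024 kB.
transfer_hours() {
    awk -v gb="$1" -v rate="$2" 'BEGIN {
        printf "%.1f\n", gb * 1024 * 1024 / rate / 3600
    }'
}

transfer_hours 5 39   # prints "37.3"
```

That is roughly 37 hours, or a little over a day and a half. At 60 kB/s the same 5GB would still need about 24 hours.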

Some investigation using iptraf (http://iptraf.seul.org/2.7/manual.html) on triumph shows the following:

Screen clipping taken: 15/11/2005, 08:24

Note that the relevant Window Size is reported by the receiver when it ACKs the sender’s transmissions.  So in the IPTraf display above, the “192.168.1.30” line (triumph) is the important one for window size.

The transmissions by suzuki are 1436 bytes long feeding into a window of 32K.  The path MTU discovery seems to be working as well!

Incidentally, iptraf seems to get confused on suzuki (which, recall, is also the IPSec tunnel endpoint) and shows zeroes for both the average packet size and the window size transmitted by suzuki itself:

Screen clipping taken: 15/11/2005, 08:39

I ran some tests with larger window sizes using the iperf utility.  The highest bandwidth iperf was able to extract (suzuki->triumph) was only about 42 kilobytes/second.  And that was with a window size of 32K.  I tried larger windows and the performance boost was negligible.

Interestingly, the bandwidth in the opposite direction (triumph->suzuki, or cch->meh) was larger.  I was able to get up to 60kB/s using iperf and ftp through the tunnel.

I even ran an FTP test that bypassed the tunnel and used the cleartext transmission path with PAT on Matt’s Linksys firewall.  I downloaded a large file from triumph to suzuki.  The results?  60 kB/sec.

The bottleneck does not seem to be either the TCP Window size or the IPSec tunnel.  It is simply a limitation in the available bandwidth from Matt’s DSL connection.

Update (09/02/2006, 18:27): Perhaps QoS could address the problem.  Linux does have some capabilities in this area.

References:

http://www.faqs.org/docs/Linux-HOWTO/ADSL-Bandwidth-Management-HOWTO.html#AEN115

NTP Client on Linux

Thursday, November 10th, 2005

For the newest server (bmw), I decided to simply set it up as an NTP client. The /etc/ntp.conf file contains the following entry:

broadcastclient

This entry was apparently created automatically during the initial setup by the dhclient-script.

amanda remote client backup breakthrough

Wednesday, November 9th, 2005

I had a big breakthrough today…but first some background.  As noted above, I tried to back up the “more local” host bmw from triumph (the tape server) and it worked perfectly.  So there was something unique to suzuki that was causing the backups to fail.  Note that the failure was announced in the amanda email messages as “too many dumper retry” and this

Screen clipping taken: 09/11/2005, 15:32

in the summary section of the email.

I learned today that the backup will work if iptables on suzuki is disabled.  My test was only for the /etc filesystem, but it was a level 0 dump (51MB) that took 22 minutes.

I found a thread on the amanda mailing list that points the finger squarely at the ip_conntrack_amanda helper module.  Apparently there were some early versions of the 2.6 kernel that had buggy conntrack modules.  I am now running kernel version 2.6.13, but the problem is apparently still around.  Red Hat Bugzilla (152036, 142745, 143803) also implies this problem was fixed, but my symptoms are identical to the problems reported there.

Here are the steps to (temporarily) work around the ip_conntrack_amanda problem:

# rmmod ip_conntrack_amanda

# iptables -I INPUT -p tcp -s $TAPE_SERVER -j ACCEPT

With these changes, backups of suzuki worked.  So I concluded that the problem was still present in FC4 (kernel 2.6.13-1.1532_FC4smp).  So I opened another Bugzilla case (172845).

In addition, I learned that on the tape server, I should open the TCP ports for amindexd (10082) and amidxtaped (10083).
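Opening those two ports on the tape server could look like the sketch below. It prints the iptables commands instead of running them so they can be reviewed first; the function name amanda_server_rules and the example client address are assumptions, and the rule form mirrors the INPUT rule used in the workaround above:

```shell
# Print iptables commands that admit amrecover clients to the tape server.
# $1 = client address or network allowed to connect.
# 10082 = amindexd, 10083 = amidxtaped.
amanda_server_rules() {
    for port in 10082 10083; do
        echo "iptables -I INPUT -p tcp -s $1 --dport $port -j ACCEPT"
    done
}
```

To apply the rules, pipe the output to a root shell, for example amanda_server_rules 192.168.1.0/24 | sh.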

References:

http://groups.yahoo.com/group/amanda-users/message/55850?viscount=100

http://www.amanda.org/docs/portusage.html

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=172845

Eject tape drive when amanda completes

Wednesday, November 9th, 2005

I found some guidance here:

http://bio3d.colorado.edu/tor/sadocs/filesys/amanda.html#use%20the%20′amcheck’%20command%20to%20verify%20a%20configuration

Still having problems with a remote amanda client

Wednesday, November 2nd, 2005

The above steps weren’t good enough.  To enable faster troubleshooting, and to eliminate the VPN tunnel as a source of the problem, I configured triumph to back up bmw as well, not just suzuki.