Archive for March, 2007

Problems with NUT

Wednesday, March 21st, 2007

NUT (Network UPS Tools) has disappointed me almost from the start (two years ago) because of unexplained shutdowns. Today I decided to do a full analysis.

First, let me describe the configuration: I have two Compaq Proliant 3000 servers, named triumph and bmw, both running Fedora Core 5 with kernel 2.6.20-1.2300.fc5smp. I have two APC Smart-UPS units, named cchSU2200 and cchSU1250: an APC Smart-UPS 2200 and an APC Smart-UPS 1250, respectively.

Server triumph has two power supplies, each one connected to a different UPS. Server bmw has one power supply, connected to UPS cchSU2200. Both UPSs have their signalling cables connected to serial ports on server triumph. Both UPSs have fresh batteries and “official” APC serial cables.

I won’t go into the details of my upsmon configuration (but will if you ask). Suffice it to say that each server requires a minimum of one power supply and monitors all other UPSs (there is actually a third UPS several hundred miles away that is also monitored).

Here are the results and narrative of my testing:

The syslog on triumph:

Mar 22 10:28:26 My Action: Start UPS services
Mar 22 10:28:27 triumph upsd[6734]: Connected to UPS [cchSU2200]: apcsmart-ttyS0
Mar 22 10:28:27 triumph upsd[6734]: Connected to UPS [cchSU1250]: apcsmart-ttyS1
Mar 22 10:28:29 triumph upsd[6735]: Startup successful
Mar 22 10:28:29 triumph upsmon[6738]: Startup successful
Mar 22 10:28:29 triumph upsd[6735]: Connection from 127.0.0.1
Mar 22 10:28:29 triumph upsd[6735]: Client monuser@127.0.0.1 logged into UPS [cchSU2200]
Mar 22 10:28:29 triumph upsd[6735]: Connection from 127.0.0.1
Mar 22 10:28:29 triumph upsd[6735]: Client monuser@127.0.0.1 logged into UPS [cchSU1250]
Mar 22 10:28:36 triumph upsd[6735]: Connection from 192.168.1.10
Mar 22 10:28:36 triumph upsd[6735]: Client monuser@192.168.1.10 logged into UPS [cchSU2200]
Mar 22 10:28:36 triumph upsd[6735]: Connection from 192.168.1.10
Mar 22 10:29:09 My Action: Disconnect power to cchSU1250
Mar 22 10:29:10 triumph upsmon[6739]: UPS cchSU1250@localhost on battery
Mar 22 10:29:11 My Action: Connect power to cchSU1250
Mar 22 10:30:30 triumph upsmon[6739]: UPS cchSU1250@localhost on line power
Mar 22 10:30:54 My Action: Disconnect power to cchSU2200
Mar 22 10:30:55 triumph upsmon[6739]: UPS cchSU2200@localhost on battery
Mar 22 10:30:56 My Action: Connect power to cchSU2200
Mar 22 10:31:03 triumph upsmon[6739]: UPS cchSU2200@localhost on line power
Mar 22 10:34:41 My Action: Disconnect power to cchSU1250 and cchSU2200
Mar 22 10:34:42 triumph upsmon[6739]: UPS cchSU2200@localhost on battery
Mar 22 10:34:42 triumph upsmon[6739]: UPS cchSU1250@localhost on battery
Mar 22 10:34:43 My Action: Connect power to cchSU1250 and cchSU2200
Mar 22 10:35:56 triumph upsd[6735]: Client monuser@127.0.0.1 set FSD on UPS [cchSU2200]
Mar 22 10:35:56 triumph upsd[6735]: Client monuser@127.0.0.1 set FSD on UPS [cchSU1250]
Mar 22 10:36:02 triumph upsd[6735]: Host 192.168.1.10 disconnected (read failure)
Mar 22 10:36:02 triumph upsd[6735]: Host 192.168.1.10 disconnected (read failure)
Mar 22 10:36:02 triumph upsmon[6739]: Executing automatic power-fail shutdown
Mar 22 10:36:02 triumph upsmon[6739]: Auto logout and shutdown proceeding
Mar 22 10:36:07 triumph upsd[6735]: Host 127.0.0.1 disconnected (read failure)
Mar 22 10:36:07 triumph upsd[6735]: Host 127.0.0.1 disconnected (read failure)
Mar 22 10:36:07 triumph logger: upsmon.says.shutdwn

The syslog on bmw:

Mar 22 10:28:35 My Action: Start UPS services
Mar 22 10:28:36 bmw upsmon[9927]: Startup successful
Mar 22 10:29:12 bmw upsmon[9928]: UPS cchSU1250@triumph.hapgoods.com on battery
Mar 22 10:30:32 bmw upsmon[9928]: Giving up on the master for UPS [cchSU1250@triumph.hapgoods.com]
Mar 22 10:30:33 bmw upsmon[9928]: UPS cchSU1250@triumph.hapgoods.com on line power
Mar 22 10:30:58 bmw upsmon[9928]: UPS cchSU2200@triumph.hapgoods.com on battery
Mar 22 10:31:04 bmw upsmon[9928]: UPS cchSU2200@triumph.hapgoods.com on line power
Mar 22 10:34:43 bmw upsmon[9928]: UPS cchSU2200@triumph.hapgoods.com on battery
Mar 22 10:34:43 bmw upsmon[9928]: UPS cchSU1250@triumph.hapgoods.com on battery
Mar 22 10:35:57 bmw upsmon[9928]: Giving up on the master for UPS [cchSU2200@triumph.hapgoods.com]
Mar 22 10:35:57 bmw upsmon[9928]: Giving up on the master for UPS [cchSU1250@triumph.hapgoods.com]
Mar 22 10:35:57 bmw upsmon[9928]: Executing automatic power-fail shutdown
Mar 22 10:35:57 bmw upsmon[9928]: Auto logout and shutdown proceeding
Mar 22 10:36:02 bmw logger: ups.says.shutdown

And now the analysis of the results…

  • On both servers, everything starts normally. The upsmon on bmw is clearly seen to connect to triumph.
  • At 10:29:09, I pulled the power on cchSU1250 and immediately plugged it back in. The master upsmon (on triumph) noted the on-battery condition right away, but I was disappointed by how long it took to recognize the return to on-line status: over a minute (10:29:10 to 10:30:30). On bmw, it is worth noting that upsmon “gives up” on the master for cchSU1250 (triumph) right at the end of this wait. This may or may not be a contributing factor to the hideous behavior we are going to observe in a few moments… (ominous background music starts…)
  • At 10:30:54, I pulled the power on cchSU2200 and immediately plugged it back in. The behavior on both triumph (the master) and bmw (a slave) seems perfectly normal, with timely observation of both the on-battery and on-line conditions.
  • (ominous background music builds…) At 10:34:41, I pulled the power to both cchSU2200 and cchSU1250 and immediately plugged both back in. The master recognizes the on-battery conditions and propagates them to the slave nicely. But the master never recognizes the immediate return to an on-line condition. The inevitable results appear about a minute later, when the master starts shutting down and the slave follows suit (ominous background music reaches a crescendo, and then… silence).
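
To put numbers on the delays described above, one can pair each upsmon “on battery” message with the next “on line power” message for the same UPS. A minimal Ruby sketch, assuming syslog lines in exactly the format of the excerpts (syslog omits the year, so 2007 is assumed); outage_durations and its regular expression are hypothetical helpers, not part of NUT:

```ruby
# Matches upsmon status lines such as:
#   Mar 22 10:29:10 triumph upsmon[6739]: UPS cchSU1250@localhost on battery
STATUS_LINE = /\A(\w{3}) +(\d+) (\d\d):(\d\d):(\d\d) \S+ upsmon\[\d+\]: UPS ([^@\s]+)@\S+ on (battery|line power)\z/
MONTHS = %w[Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec].freeze

# Pairs each "on battery" line with the next "on line power" line for the same
# UPS and returns [[ups_name, seconds_on_battery], ...] in the order outages ended.
# Outages that never end (no "on line power" line) are not reported.
def outage_durations(lines, year = 2007)
  on_battery_since = {} # ups name => Time of its on-battery message
  results = []
  lines.each do |line|
    m = STATUS_LINE.match(line.strip)
    next unless m # skip "My Action" notes, upsd messages, etc.
    stamp = Time.utc(year, MONTHS.index(m[1]) + 1, m[2].to_i,
                     m[3].to_i, m[4].to_i, m[5].to_i)
    ups = m[6]
    if m[7] == 'battery'
      on_battery_since[ups] ||= stamp
    elsif (start = on_battery_since.delete(ups))
      results << [ups, (stamp - start).to_i]
    end
  end
  results
end

triumph_log = [
  'Mar 22 10:29:10 triumph upsmon[6739]: UPS cchSU1250@localhost on battery',
  'Mar 22 10:30:30 triumph upsmon[6739]: UPS cchSU1250@localhost on line power',
  'Mar 22 10:30:55 triumph upsmon[6739]: UPS cchSU2200@localhost on battery',
  'Mar 22 10:31:03 triumph upsmon[6739]: UPS cchSU2200@localhost on line power',
  'Mar 22 10:34:42 triumph upsmon[6739]: UPS cchSU2200@localhost on battery'
]
p outage_durations(triumph_log) # => [["cchSU1250", 80], ["cchSU2200", 8]]
```

Run on the triumph lines, it reports 80 seconds for the first cchSU1250 outage and 8 seconds for cchSU2200; the final double outage never closes, because upsmon never logged a return to line power before the shutdown.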

I should point out that this problem has happened numerous times over the past two years. A simple two-second power glitch will provoke the shutdown of bmw, and sometimes even triumph. During that time, I have kept NUT up to date through the Fedora yum distribution and upgraded from Fedora Core 4 to Fedora Core 5. Currently, I am running nut version 2.0.3, release 0.1.fc5. The APC Smart protocol driver claims to be version 1.99.7 with a command table version of 2.0.

Any ideas? I ran some more auxiliary experiments and found that while upsmon seems to take a long time to recognize a return to on-line in cchSU1250, upslog reports the OL condition immediately. So I think the communication between the UPS and its driver (apcsmart, started by upsdrvctl) is working.

Does anybody really know what time it is?

Monday, March 12th, 2007

Ruby’s built-in Time class is weak: it has no notion of time zones other than the OS-supplied “UTC” and “local” zone contexts. To compound matters, Rails’ independent TimeZone class is flawed, supporting an offset from UTC but no DST rules or flag. Using the standard tools is an exercise in frustration. Here is how to improve the situation:

Step One: Install the TZInfo gem. This provides a comprehensive, OS-independent time zone database that includes DST data. The data appears to be kept current: it already knows about the changes to US daylight saving time rules introduced in spring 2007. At this point you can, for example, associate a legitimate time zone with a user (see reference one). But you still need to convert instances of Time from one time zone to another, typically from UTC to the user’s time zone. Worse, you need to track the time and the relevant time zone separately.
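
To see why a fixed UTC offset (all that Rails’ TimeZone stores) falls short, here is a minimal stdlib-only sketch of the single rule a DST-aware database has to know for US Eastern time from 2007 on: DST runs from 2:00 a.m. local time on the second Sunday in March until 2:00 a.m. local time on the first Sunday in November. The helpers nth_sunday and eastern_offset are hypothetical and hard-code that one rule; TZInfo instead carries the full historical rule set for every zone.

```ruby
# Day of the month of the n-th Sunday in the given month.
def nth_sunday(year, month, n)
  first_wday = Time.utc(year, month, 1).wday # 0 = Sunday
  1 + (7 - first_wday) % 7 + 7 * (n - 1)
end

# UTC offset (hours) for US Eastern time at a UTC instant, using ONLY the
# rule in force from 2007 on. Earlier years followed different rules.
def eastern_offset(utc)
  y = utc.year
  dst_start = Time.utc(y, 3, nth_sunday(y, 3, 2), 7)   # 2:00 EST == 07:00 UTC
  dst_end   = Time.utc(y, 11, nth_sunday(y, 11, 1), 6) # 2:00 EDT == 06:00 UTC
  utc >= dst_start && utc < dst_end ? -4 : -5
end

p nth_sunday(2007, 3, 2)                          # => 11
p eastern_offset(Time.utc(2007, 3, 12, 12, 0, 0)) # => -4 (EDT)
p eastern_offset(Time.utc(2007, 3, 10, 12, 0, 0)) # => -5 (EST)
```

For March 12, 2007 the sketch gives -4 (EDT), since DST began on March 11 that year; under the pre-2007 rule (first Sunday in April) the same instant would still have been EST, which is exactly the kind of change a bare offset cannot track.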

Step Two: Force all normal Ruby and Rails operations to consistently use UTC. This avoids the situation where the flawed Ruby and Rails standard libraries attempt to “help out” and end up mucking up the works. In environment.rb:

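require 'tzinfo'                            # require_gem 'tzinfo' is deprecated in newer RubyGems
include TZInfo                              # lets Timezone refer to TZInfo::Timezone
ActiveRecord::Base.default_timezone = :utc  # read and write database times as UTC
ENV['TZ'] = 'UTC'                           # make Ruby's "local" time zone UTC as well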

Unfortunately, setting TZ doesn’t seem to work reliably under Win32. Behavior is very sensitive to where and when the variable is set.

Step Three: Install the tzinfo_timezone plugin, which overrides the Rails TimeZone class with an improved one built on the gem from step one. Standard time zone methods such as now() then use DST-aware TZInfo data.

Before:

  • TimeZone['Eastern Time (US & Canada)'].now => Mon Mar 12 07:15:17 +0000 2007

After:

  • TimeZone['Eastern Time (US & Canada)'].now => Mon Mar 12 08:18:52 UTC 2007
  • Timezone.get('America/New_York').now => Mon Mar 12 08:18:52 UTC 2007

By mid-March 2007, the “Before” result is incorrect because DST is already in effect (it began on March 11 that year). The “After” section shows how the plugin has made the standard Rails TimeZone heed DST. Note that all results appear to be in UTC, because the return value (an instance of Time) cannot store times in arbitrary time zones; the best solution seems to be to always claim the time is in UTC. I’ll emphasize this point because it is so counter-intuitive: the tzinfo_timezone plugin and similar plugins generally ignore the Ruby Time class’ time zone on input, and on output they produce inconsistent Time instances (the time is in some time zone x, but the instance reports UTC). This is normal in the perverse world of Ruby time.
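
The “always claim UTC” convention can be reproduced with the standard library alone. In this sketch, wall_clock_as_utc is a hypothetical helper and the -4 hour EDT offset is hard-coded rather than looked up; the result carries New York wall-clock digits while still calling itself UTC, which is the inconsistency described above:

```ruby
# Shift a real UTC instant by a zone offset. The result's digits are the local
# wall-clock time, but the Time object still says it is UTC.
def wall_clock_as_utc(utc_time, offset_hours)
  utc_time.getutc + offset_hours * 3600
end

instant = Time.utc(2007, 3, 12, 12, 18, 52) # the actual moment
local   = wall_clock_as_utc(instant, -4)    # New York wall clock, assuming EDT

puts instant.strftime('%H:%M:%S %Z') # 12:18:52 UTC
puts local.strftime('%H:%M:%S %Z')   # 08:18:52 UTC (local digits, UTC label)
puts local.utc?                      # true
puts((instant - local) / 3600)       # 4.0: two "UTC" times, four hours apart
```

The last line shows the danger: arithmetic between a real UTC time and a relabelled one silently yields a four-hour gap, which is why the zone has to be tracked separately.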

Strangely, TzinfoTimezone only accepts identifying strings from the standard Rails time zones (e.g. Eastern Time (US & Canada)); there is no support for native TZInfo identifiers (e.g. America/New_York). Internally, it clearly uses TZInfo’s Timezone class to store the time zone, but the lookup method uses the standard Rails table and its naming convention. I’ve requested relief from the author; in the meantime, a patched version is available from my svn repository: svn://cho.hapgoods.com/tzinfo_timezone
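
The relief requested amounts to a lookup that accepts either naming convention. A minimal sketch of the idea, not the code in the patched plugin: RAILS_TO_TZINFO is a hypothetical three-entry excerpt of a Rails-name-to-TZInfo-identifier table, and resolve_zone_identifier is an illustrative helper:

```ruby
# Hypothetical excerpt of a Rails-name => TZInfo-identifier table.
RAILS_TO_TZINFO = {
  'Eastern Time (US & Canada)' => 'America/New_York',
  'Central Time (US & Canada)' => 'America/Chicago',
  'Pacific Time (US & Canada)' => 'America/Los_Angeles'
}.freeze

# Accept a Rails-style name or a native TZInfo identifier and return the
# TZInfo identifier, or nil if neither convention recognizes it. A real
# version would pass unknown strings on to TZInfo::Timezone.get instead of
# checking only this table.
def resolve_zone_identifier(name)
  return RAILS_TO_TZINFO[name] if RAILS_TO_TZINFO.key?(name)
  return name if RAILS_TO_TZINFO.value?(name)
  nil
end

p resolve_zone_identifier('Eastern Time (US & Canada)') # => "America/New_York"
p resolve_zone_identifier('America/New_York')           # => "America/New_York"
p resolve_zone_identifier('Nowhere/Special')            # => nil
```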

Step Four: Install the tztime plugin. This adds support for time values that are aware of their frame of reference (a TZInfo time zone). Better yet, it quacks like a Time value and can be used anywhere a Time value would be used. Consequently, you no longer need to carry time zone values around right up to the point of generating output. The results speak for themselves:

Before:

  • Timezone.get('America/New_York').now => Mon Mar 12 08:18:52 UTC 2007

After:

  • <set the “local” timezone>
  • TzTime.now => 2007-03-12 08:18:52 EDT

Ignoring the formatting of times, the new TzTime type clearly knows what is going on. It is aware of its time zone to the point that we can do the following:

TzTime.now.dst? => true

TzTime.now.utc.dst? => false
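
To make the idea of a time value that knows its frame of reference concrete, here is a stdlib-only sketch. ZonedTime is a hypothetical stand-in, not TzTime’s implementation, and the hard-coded EASTERN_RULE (the US Eastern rule from 2007 on) replaces a real TZInfo lookup. It keeps the UTC instant plus the zone rule, forwards unknown methods to its wall-clock Time so it can be used like one, and answers dst? from the rule, so dst? and utc.dst? differ just as in the TzTime example above:

```ruby
# US Eastern rule from 2007 on: returns [offset_hours, dst?] for a UTC instant.
EASTERN_RULE = lambda do |utc|
  y = utc.year
  nth_sunday = ->(month, n) { 1 + (7 - Time.utc(y, month, 1).wday) % 7 + 7 * (n - 1) }
  dst_start = Time.utc(y, 3, nth_sunday.call(3, 2), 7)   # 2:00 EST, 2nd Sunday of March
  dst_end   = Time.utc(y, 11, nth_sunday.call(11, 1), 6) # 2:00 EDT, 1st Sunday of November
  utc >= dst_start && utc < dst_end ? [-4, true] : [-5, false]
end

# Hypothetical zone-aware time: a UTC instant plus the rule that defines its zone.
class ZonedTime
  def initialize(utc, rule, abbreviations)
    @utc = utc.getutc
    @rule = rule
    @abbreviations = abbreviations # [standard, daylight], e.g. %w[EST EDT]
  end

  def dst?
    @rule.call(@utc)[1]
  end

  # The same instant as a plain UTC Time, whose dst? is always false.
  def utc
    @utc
  end

  # Local wall-clock reading (a Time that, as usual in Ruby, claims to be UTC).
  def wall_clock
    @utc + @rule.call(@utc)[0] * 3600
  end

  def to_s
    wall_clock.strftime('%Y-%m-%d %H:%M:%S ') + @abbreviations[dst? ? 1 : 0]
  end

  # Quack like a Time: forward anything else to the wall-clock value.
  def method_missing(name, *args, &block)
    return super unless wall_clock.respond_to?(name)
    wall_clock.public_send(name, *args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    wall_clock.respond_to?(name) || super
  end
end

now = ZonedTime.new(Time.utc(2007, 3, 12, 12, 18, 52), EASTERN_RULE, %w[EST EDT])
puts now.to_s     # 2007-03-12 08:18:52 EDT
puts now.dst?     # true
puts now.utc.dst? # false
puts now.hour     # 8 (forwarded to the wall-clock Time)
```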

Setting the local time zone is essential when using TzTime, to ensure times are output in the proper time zone. Without a reference time zone, TzTime will generate errors. Reference three (the definitive reference for this plugin) shows how to set the local time zone in an around filter. Here are some examples:

TzTime.zone = TimeZone['Eastern Time (US & Canada)']

TzTime.zone = current_user.tz
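
An around filter, as in reference three, boils down to setting the zone before the action runs and restoring it afterwards, even if the action raises. A plain-Ruby sketch of that pattern, independent of Rails and of TzTime’s actual API: ZoneContext and with_zone are hypothetical names, and zones are plain strings here:

```ruby
# Hypothetical holder for the current request's zone, analogous in spirit to
# TzTime.zone. Thread-local so concurrent requests cannot see each other's zone.
module ZoneContext
  def self.zone
    Thread.current[:zone_context_zone]
  end

  def self.zone=(value)
    Thread.current[:zone_context_zone] = value
  end

  # Use new_zone for the duration of the block, then restore the previous
  # value even if the block raises; this is what an around filter provides.
  def self.with_zone(new_zone)
    previous = zone
    self.zone = new_zone
    yield
  ensure
    self.zone = previous
  end
end

ZoneContext.zone = 'UTC'
ZoneContext.with_zone('America/New_York') { puts ZoneContext.zone } # America/New_York
puts ZoneContext.zone                                                # UTC
```

Storing the zone in Thread.current rather than in a class-level attribute is a design choice of this sketch: it keeps concurrent requests in a multi-threaded server from seeing each other’s zone.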

Thanks to Jamis Buck for writing these two plugins. With his initiative, Rails moves one step forward towards being an ideal platform for internationalized applications.

References:

  1. http://marklunds.com/articles/one/311
  2. http://tzinfo.rubyforge.org/doc/
  3. http://weblog.jamisbuck.org/2007/2/2/introducing-tztime

A bit more spherical trigonometry

Wednesday, March 7th, 2007

My bluff has been called. After writing about my use of spherical trigonometry to determine the distance between two points on the earth’s surface, I now find myself needing to work backwards. That is to say, given a point P and a distance d, find another point at a distance d from P. The application is to determine the coordinates of a bounding box around a circle of radius d centered on our point of interest, P.

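Step one is to calculate the angular distance, α, for a given linear distance d. This is easily determined in planar trigonometry: a full circle of circumference 2πR spans 2π radians, so an arc of length d spans

$$\alpha = \frac{d}{R}$$

where R is the radius of the circle. By being tricky and moving only in a north-south direction (along a meridian, which is a great circle), this result applies to spherical trigonometry as well, with R being the radius of the earth (see my previous post for how to get a good approximation of that distance). East-west movement needs one more step: a parallel of latitude is a circle of radius R·cos(P_lat), so near P the same distance d corresponds to a longitude change of approximately α / cos(P_lat).

The bounding box is defined by its north-west and south-east corners. With longitudes measured positive east, here they are:

$$P_{\mathrm{NW}} = \left(P_{\mathrm{lat}} + \alpha,\; P_{\mathrm{lon}} - \frac{\alpha}{\cos P_{\mathrm{lat}}}\right)$$
$$P_{\mathrm{SE}} = \left(P_{\mathrm{lat}} - \alpha,\; P_{\mathrm{lon}} + \frac{\alpha}{\cos P_{\mathrm{lat}}}\right)$$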

And that is how we calculate the approximate size of a bounding box containing a circle of radius d around a point P.  Keep in mind that the coordinates from the above formulae are in radians!
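
A minimal Ruby sketch of the bounding-box calculation. It takes the angular distance as α = d/R radians (arc length divided by radius) and divides the longitude offset by cos(latitude), because a parallel of latitude is a circle of radius R·cos(lat). EARTH_RADIUS_KM (6371 km, a commonly used mean radius) and bounding_box are assumptions for illustration, not values from the previous post:

```ruby
EARTH_RADIUS_KM = 6371.0 # assumed mean radius; substitute your own value of R

# Approximate box around a circle of radius d_km centred on (lat_deg, lon_deg).
# Returns [[north_lat, west_lon], [south_lat, east_lon]] in degrees, longitudes
# positive east. Not meant for use near the poles or across the 180-degree meridian.
def bounding_box(lat_deg, lon_deg, d_km, radius_km = EARTH_RADIUS_KM)
  to_rad = Math::PI / 180.0
  lat = lat_deg * to_rad
  lon = lon_deg * to_rad
  alpha = d_km / radius_km     # angular distance in radians
  dlon = alpha / Math.cos(lat) # a parallel has radius R cos(lat)
  nw = [lat + alpha, lon - dlon]
  se = [lat - alpha, lon + dlon]
  [nw, se].map { |corner| corner.map { |angle| angle / to_rad } }
end

one_degree_km = EARTH_RADIUS_KM * Math::PI / 180 # arc length of 1 degree
p bounding_box(60.0, 10.0, one_degree_km)        # approximately [[61, 8], [59, 12]]
```

At 60° latitude the box is twice as wide in longitude as it is tall in latitude, since cos(60°) = 0.5.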

Native NSIS installer builds under Linux: a HOWTO

Saturday, March 3rd, 2007

Today I decided to create a build system for NSIS under Linux. Here is how I did it.

Download and unpack the latest NSIS source from SourceForge, along with the corresponding zip file containing the pre-built Win32 libraries and such:

$ mkdir nsis;cd nsis
$ wget -O nsis-2.24-src.tar.bz2 'http://prdownloads.sourceforge.net/nsis/nsis-2.24-src.tar.bz2?download'
$ wget -O nsis-2.24.zip 'http://prdownloads.sourceforge.net/nsis/nsis-2.24.zip?download'
$ unzip nsis-2.24.zip
$ bunzip2 nsis-2.24-src.tar.bz2
$ tar -xf nsis-2.24-src.tar

To build makensis, the NSIS installer generator, you need scons (a replacement for make). You may be able to install it with your platform’s package manager, but I couldn’t: the Fedora Core 5 yum package for it was not quite up to date (0.96.1 available via yum versus 0.96.93 required). Here is how I installed it manually:

$ mkdir scons;cd scons
$ wget http://prdownloads.sourceforge.net/scons/scons-0.96.95-1.noarch.rpm
$ sudo rpm -ivh scons-0.96.95-1.noarch.rpm

The rpm is not signed, so you can’t use the yum localinstall <package> trick.

Now use scons to build makensis, but none of the other components; we’ll use the pre-built ones, since they don’t need to run under Linux anyway:

$ cd ../nsis-2.24-src
$ scons SKIPSTUBS=all SKIPPLUGINS=all SKIPUTILS=all SKIPMISC=all NSIS_CONFIG_CONST_DATA_PATH=no

That last step should take a bit. When it finishes, you have a native Linux version of makensis in ./build/release/makensis/makensis.

Now for the tricky part… NSIS installers are built with makensis. The output of makensis includes (in compressed form) many libraries that are used at install time. These libraries could be built as part of our build process; in fact, the default scons configuration tries to do just that. But without a really good cross-compiler, we can’t do better than taking the pre-built libraries from the same NSIS version as built for Win32. That is why we downloaded the zip file in step one. Now we need to meld the Linux version of makensis with the zip file into a coherent whole:

$ sudo scons install-compiler
$ cd ..
$ sudo unzip nsis-2.24 -d /usr/local/share
$ sudo mv /usr/local/share/nsis-2.24/ /usr/local/share/nsis

The first step puts makensis into the right location (/usr/local/bin on my system). The remaining steps put the contents of the zip file in a standard location where its libraries can be found by makensis.  Done!

NB: Ideally, you should be able to install the makensis executable and the support libraries/configs from the zip to appropriate locations on the Linux filesystem using scons install, but this is not possible. Admittedly, that is asking a lot, given that the manual process shown above is so simple. Still, it is ironic that a program whose sole focus is installers has a relatively limited installer itself (under Linux only). I posted on this subject on the Nullsoft forum. Kudos to kichik over there for confirming the problem and helping me get through this last step, not to mention posting other material on the forum that got me through the first steps as well.

Regardless, it is amazing that the whole thing works under Linux. My build system is now completely hands-off (tied into Subversion via CruiseControl.rb). All I do is check in a new revision, and a couple of minutes later I have a new installer that is automatically put into the FTP repository.
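
In a hands-off build like this one, the build step boils down to running makensis on an installer script and failing the build if it exits non-zero. A minimal Ruby sketch of such a step (Ruby because CruiseControl.rb is Ruby): the script name installer.nsi, the VERSION define, and the helper names are hypothetical, and MAKENSIS assumes the /usr/local/bin location from the install step. It relies on two documented makensis options, -V (verbosity; 2 means warnings and errors) and -D (define a symbol), and places them before the script because makensis processes its arguments in order:

```ruby
MAKENSIS = '/usr/local/bin/makensis' # assumed install location from the steps above

# Argument vector for makensis: -V2 limits output to warnings and errors,
# and each define becomes -DNAME=value, all placed before the script.
def makensis_command(script, defines = {})
  cmd = [MAKENSIS, '-V2']
  defines.each { |name, value| cmd << "-D#{name}=#{value}" }
  cmd << script
end

# Run makensis and raise on failure so the CI build is reported as broken.
def build_installer(script, defines = {})
  cmd = makensis_command(script, defines)
  system(*cmd) or raise "makensis failed: #{cmd.join(' ')}"
end

p makensis_command('installer.nsi', 'VERSION' => '1.0.42')
# => ["/usr/local/bin/makensis", "-V2", "-DVERSION=1.0.42", "installer.nsi"]
```

Passing the arguments as an array to system avoids shell quoting problems with paths or define values containing spaces.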