Early this morning, a fire broke out in the garage. The exact cause has not been identified, but it was apparently something in the vicinity of the computers. Fortunately my friend was still awake at the time, and discovered the fire long before the smoke detector in the house went off. He was able to rouse his wife and daughter, and they and the dogs all got out safely.
The garage is still standing, but is seriously damaged, as is part of the house. They will have to live elsewhere for about four months.
My server machine is now in a large plastic garbage bag in the trunk of my car. The front plastics of the case are basically completely burned away. I'll try to get some photos online soon, although obviously at the moment I can't put them on my own site.
Tonight a different friend and I will disassemble the machine and try to determine whether the hard drive can be read. Through my own complacency about the reliability of the system (a euphemism for "stupidity"), I do not have a recent backup. I am a little bit concerned that attempting to spin up the drive might cause further damage making it that much harder to recover the data. On the other hand, the drive was probably still operating despite intense heat until either the power failed or it developed mechanical problems, so maybe trying to spin it up now can't make it that much worse.
If we can't read the drive ourselves, I'll want to send it to a data recovery company. I know this will cost a lot of money. My main concern will be to find a company that is truly expert, rather than bozos that can only do the same sorts of simple-minded recovery efforts I can do myself. And ideally I'd like to find a company that specifically has experience with drives that have been through fires. If anyone can offer recommendations, please let me know.
A few months ago, I tried to replace the single 160 GiB drive with a 3ware 7500-4 ATA RAID controller and four Maxtor 200 GiB 7200 RPM drives. I've been very happy with 3ware controllers on other systems, but I wasn't able to get it to work in this system. I didn't really resolve why it didn't work, but I concluded that this particular motherboard, which had served me faithfully for quite a while, had finally passed its best-when-used-by date. I wasn't too concerned, as I expected to replace the motherboard and CPU in the not-too-distant future, probably with a dual Athlon motherboard such as an Asus A7M266D. The server doesn't really need a dual Athlon, but I've been very happy with the A7M266D in my desktop system at home, and it's one of the few Athlon motherboards that supports ECC and 64-bit PCI.
Anyhow, as things have turned out, I'm glad that I didn't put the RAID in the box.
More recently, I assembled a new computer to use for editing the commercials from MPEG-2 program streams recorded by my ReplayTV 4080. Unfortunately I was not able to find any MPEG-2 editing software for Linux, so this machine was intended to run Windows 2000 and the Womble editor. (Maybe there's some small chance of Womble running with Wine, but I haven't tried it yet.) The machine has an Asus A7V8X motherboard and Athlon XP 2500+ CPU.
In order to come up with a replacement server (and more immediately, an interim machine for data recovery), I removed the disk from the video editing system, installed one of the 200 GiB Maxtor drives originally intended for the RAID, and installed Red Hat 9.
I loaded up the new system, an old LCD monitor, USB keyboard and trackball, power cords, a spare Ultra ATA cable, etc. into the car and headed to my friend Steve's house. Steve had graciously agreed to help out with the recovery effort.
The old server's case was the type with an inverted U shaped metal top rather than separate left and right side panels. A keyboard had been left on top of the machine, and was melted to the point that it was unrecognizable. It was just a big lump of blackened plastic stuck to the top and dripping down the sides of the case. And all the plastics on the front of the case, the CD-ROM drive, and the bezel fan had also melted and run. Steve removed the screws and we pried the case open with a screwdriver. Some drive power cables had been in contact with the side of the case and the insulation had melted and attached itself to the side, but we pried that loose.
The inside of the computer was bad, but perhaps not as bad as I'd feared. Everything was scorched and covered with soot. But nothing inside the case had actually caught fire. Painted surfaces were bubbled. Stuff toward the bottom of the case fared better than stuff near the top, probably partly due to heat rising, and partly due to the top of the case having been covered with the burning keyboard.
Naturally, the hard drive was near the top of the case. The top of the hard drive (the label side) was blackened, and the drive was covered on all sides with soot. But there was no obvious physical damage. The top cover didn't appear to be warped. The seals appeared to a casual glance to be intact. There didn't appear to be any circuit board components missing or visibly damaged, although the board may be slightly scorched.
Steve got a rag and started brushing the soot from the drive. He decided that it would work better with a little water. I later realized that applying any water was a really bad idea, because the soot becomes acidic when wet. But at the time I wasn't thinking about that. It only occurred to me that he shouldn't scrub the drive too much because it might damage seals that were otherwise intact. So he stopped doing that.
Steve thought maybe the drive didn't look so bad. He thought that there was a 50/50 chance that we'd be able to read it.
I unplugged the secondary ATA cable from the CD-RW and DVD-RW drives, and plugged in the burned drive. I hooked up a drive power cable. I told Steve that I'd cross my fingers except that I know it's bad luck to be superstitious. I turned on the system.
The drive made normal spinup and recal sounds. There weren't any screeching or thunking sounds that I would have expected if there was serious mechanical damage. Of course, the spindle bearings may have been damaged by the intense heat, so it's possible that it could have mechanical problems as time goes on. Thus it was important to try to retrieve the data as soon as possible. Even if the drive was initially fully working, we couldn't count on being able to read all the data even once, let alone have a second chance at it.
From a shell window, I verified that /proc/ide/hdc/model had the correct drive identification. So far so good. Next I tried "fdisk -l /dev/hdc" to list the partition table. No go, it reported "device not found". Hmmm... I tried to dd some blocks from the drive. "device not found" again. The drive didn't seem willing to cough up any data from the platters no matter what I tried, though it was still sitting there quietly spinning away.
I tried rebooting the machine. The BIOS recognized the drive and reported the correct model and capacity. Linux booted again, and the exact same behavior was seen. I was concerned. Was the drive electronics not working correctly? Was I going to have to send it to a data recovery service after all?
I piped dmesg into more to see what the kernel had reported during startup. There were some complaints from the ide-scsi driver. Aha! During the hasty Linux installation, Red Hat 9 had noticed the DVD-RW and CD-RW as the master and slave devices on the secondary ATA controller, and had conveniently put "hdc=ide-scsi hdd=ide-scsi" in the kernel command line in the GRUB configuration. This is very desirable for CD and DVD burners, but not at all useful for disk drives. I edited the grub.conf and rebooted.
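In hindsight, a quick look at the kernel command line would have pointed at the culprit sooner. Here is a minimal sketch of that check, assuming a POSIX shell; the function name ide_scsi_args is made up for illustration and is not part of any tool.

```shell
# Print every hdX=ide-scsi option found in a kernel command line.
# Usage: ide_scsi_args "KERNEL COMMAND LINE"
ide_scsi_args() {
    # $1 is deliberately unquoted so the shell splits it into words.
    for arg in $1; do
        case "$arg" in
            hd?=ide-scsi) echo "$arg" ;;
        esac
    done
}

# On a running Linux system, check the command line the kernel booted with:
#   ide_scsi_args "$(cat /proc/cmdline)"
```

Any device it prints is being handed to the ide-scsi driver, which is fine for a burner but means a hard disk on that position won't show up as an ordinary IDE disk.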
Success! Now fdisk displayed the partition table, which appeared correct. I quickly created mount points for the partitions of the burned drive, mounted the partitions read-only, cd'd to the most important partition, and did a "tar -cvlf /old/homer2.tar". The file name is not a Simpsons reference, but rather a shorthand for the mount point, which was /home/ruckus2. /old is a 190 GiB partition on the new drive.
Old, familiar file names started scrolling down the screen. Things were looking up!
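For anyone attempting the same kind of rescue, here is a minimal sketch of the mount-read-only-then-tar step. The function name archive_partition is hypothetical, the device and path names in the comments are placeholders, and it assumes GNU tar. One caveat: the tar of that era treated -l as "stay on one file system", while newer GNU tar releases use -l for --check-links, so the sketch spells out --one-file-system instead.

```shell
# Archive the contents of one mounted file system into a tar file.
# Usage: archive_partition SOURCE_DIR DEST_TAR
# DEST_TAR should be an absolute path, because tar runs from inside SOURCE_DIR.
archive_partition() {
    src="$1"
    dest="$2"
    # Run in a subshell so the caller's working directory is unchanged.
    # --one-file-system keeps tar from descending into other mounts.
    ( cd "$src" && tar --one-file-system -cf "$dest" . )
}

# Example (as root; device and paths are placeholders):
#   mkdir -p /mnt/burned
#   mount -o ro /dev/hdc1 /mnt/burned
#   archive_partition /mnt/burned /old/burned1.tar
```

Mounting read-only matters here: it keeps the kernel from writing anything to a drive that may not survive much more use.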
Steve pulled the four 256 MiB DIMMs and the CPU from the system. They're covered with soot but show no other signs of damage. Steve pointed out that they may well still work just fine, but since they've been stressed beyond the maximum rated operating conditions, it would be foolish to trust them. I might list them on eBay; I've never had a fire sale until now. Don't worry, I'm not going to misrepresent them. I doubt anyone will want to buy them, but on eBay you never know...
The most important three partitions stored around 4, 8, and 80 GiB. The first two didn't take too long, but I expected the third to take quite a while, and it did. In the meantime we ate dinner and played a game of Puerto Rico, in which I trounced Steve and his wife. I got 64 victory points, more than I've ever gotten before. If I hadn't made a mistake in the last round, I would have gotten another 7 VPs, but 64 was quite sufficient.
The operation completed with no errors. I moved on to the less important partitions, such as /usr and /var, and we watched a movie. After the movie, we returned to the garage, where the system had completed the last tar. I unmounted the partitions and shut the system down. I disconnected the old drive, reconnected the CD-RW and DVD-RW, closed up the case, and put the system back in my car.
Steve put the top case back on the old machine (sans screws), and we rebagged it. This time we taped the bags closed. I don't think there's any reason I shouldn't just throw it away, but I'm holding off on that for a few days.
Another friend pointed out to me afterward that the next time I start to complain about always having bad luck, I should look back at this. On the other hand, if there really is anything to luck aside from sheer randomness, maybe today I used up my next five years' allotment of good luck. Who knows? I'm just glad to have gotten all the data safely extracted.
I don't know that Maxtor drives are particularly better engineered to withstand temperature extremes than any other brand of drive. But I'm impressed that it did so well. I've had good luck with Maxtor drives for many years, and only had to take advantage of their No Quibble Service(tm) once, back in 1995, for a drive in a very badly designed Compaq Presario where the drive got absolutely *no* airflow. All of my own systems have been assembled with reasonable attention to keeping the drives cool, and I've never had a problem. Maxtor will continue to get my business for the foreseeable future. I think I'll write them a letter and send them photos.
Speaking of which, I'm sorry that I can't make any photos available online at the moment. I'll try to do that in the near future. Right now I'm still trying to plan the next phase of the disaster recovery, which is getting a new colocation plan.
Needless to say, disaster recovery would be much easier, and much more dependable, if one had a plan in place ahead of time, and did routine backups. I should have known better.
Some time back, a friend asked me to provide backup DNS for his domains, including some commercial ones, and I was happy to oblige. But it turns out that he never put ns2.brouhaha.com into the registrar's records for the domain. So now those domains depend on his server and my primary, both of which burned up in the fire, but not on my backup which is still working.
I added another IP address to my backup DNS/MX server, changed all the named config files to be master rather than slave, and changed the IP address of ns.brouhaha.com to point to the new IP address. So now DNS works for all of my domains. But not for my friend. Although ns.brouhaha.com now responds and is authoritative for his domain, it was registered through Network Solutions, and since I don't have an account with them, there doesn't appear to be any way I can get them to change the IP address of ns.brouhaha.com in their database. My friend's domains will not resolve successfully until they do this. Sigh.
He's away on a business trip now, so it's hard for him to take care of this.
Lesson: make sure you have at least two authoritative name servers for your domain, at physically separate locations and on independent networks.
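One way to check what the parent zone actually delegates, as opposed to what your own servers claim, is to ask a TLD server directly and count the distinct name servers it lists. Here is a rough sketch, assuming dig and awk are installed. The function name delegated_ns_count is hypothetical, example.com is a placeholder, and a.gtld-servers.net only applies to .com and .net domains.

```shell
# Count the distinct NS targets in dig record output read from stdin.
# Each record line is expected to look like:
#   example.com.  172800  IN  NS  ns1.example.net.
delegated_ns_count() {
    awk '$4 == "NS" { print tolower($5) }' | sort -u | wc -l | tr -d ' '
}

# Hypothetical usage (needs network access; example.com is a placeholder):
#   dig +norecurse +noall +authority NS example.com @a.gtld-servers.net \
#       | delegated_ns_count
```

A result below 2 means the registrar only knows about one server. Counting host names says nothing about whether the servers are physically separate, so this only catches the "never listed at the registrar" problem.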
Mike has been able to recover data from the disks in one of the computers from the garage that was near floor level (lower than my machine), but the drive from the machine that was next to mine will not spin up. Fortunately the disks he did recover have backups of his most important data. Some of the backup data is months old, but for much of the data that's new enough.
Last updated October 29, 2003
Copyright 2003 Eric Smith