Jayne Troubles

2009.01.27 in personal

The plan was for Jayne (the machine that serves this blog, the rest of my web stuff, Robb's blog, lots of code, backups, etc.) to be a reasonably stable and very large (storage-wise) machine. The storage part worked, more or less, but the stability is ehh-not-so-much. I have a very consistent problem with libata, but it's not clear why...



Anyway. Here are the facts:

  • I'm using Debian unstable, with Linux 2.6.28.
  • I've got a Shuttle case, and a 100 W PSU (!!).
  • I get the libata errors above, but only after the machine has been up for two weeks or so.
  • I didn't get the error at all during a one-month-long winter break, but it happened once again, two weeks after coming back to RPI.
  • The problem has never happened when the case is open and a large house fan is on top of the case, which was true for a significant part of last semester.
  • After the problem occurs, the machine must be left off for a few hours before it will successfully restart.
  • The disks never report anything above about 44 degrees C.
  • Only the root partition ever acquires any errors, not the big (1.2 TB) main partition.
  • The total potential power use of the system is well over 100 W, and Shuttle recommended against putting both a Core 2 and two disks in this particular combination of case/mobo/PSU.

Any ideas?

I've thought about it a lot, and can't come up with a reasonable explanation. If it's power, software, or the cables, the cooldown time doesn't make any sense. If it's the disks, the fact that it only shows up after the machine has been running for a long time doesn't make any sense, and I've spent a lot of time beating on the disks trying to reproduce it and can't. If it's heat (which seems most likely), the disks are sitting 20 degrees C under their acceptable working temperature, so that makes very little sense either (and I never get SMART warnings or anything, just libata transmission errors!)...
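To put rough numbers on the power question, here is a minimal Python sketch of the budget arithmetic. The per-component wattages are placeholder assumptions picked for illustration, not measurements from Jayne; only the 100 W PSU rating comes from the facts above.

```python
# Illustrative power-budget tally. Every per-component wattage below is a
# placeholder assumption, NOT a measurement from the actual machine.
PSU_RATED_WATTS = 100  # from the post: a 100 W PSU

# Hypothetical peak draw per component, in watts.
assumed_peak_watts = {
    "cpu (Core 2)": 65,
    "disk 1": 25,
    "disk 2": 25,
    "motherboard + memory": 25,
}


def power_budget(components, psu_watts):
    """Return (total_draw, headroom); negative headroom means over budget."""
    total = sum(components.values())
    return total, psu_watts - total


total, headroom = power_budget(assumed_peak_watts, PSU_RATED_WATTS)
status = "over" if headroom < 0 else "within"
print(f"assumed peak draw: {total} W, PSU: {PSU_RATED_WATTS} W, "
      f"{status} budget by {abs(headroom)} W")
```

Swapping in real datasheet or measured numbers would show how far over the PSU the worst case actually is. Peak draw (e.g. both disks spinning up at once) is the worst case, though, so a tally like this says nothing about why the failure would take two weeks to appear.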

If nobody has any ideas, I'm pretty sure a good bit of this semester's RCOS money is going to go towards a new case/mobo/PSU (keeping the CPU/disks/memory/etc. from Jayne). Unfortunately, that will mean a much bigger case... It would also leave me with a really adorable little case that would almost definitely work fine with a single disk/an Atom/etc... anyone, anyone?