When last we left the Water Cooling War, I was battling unreliability of my 800MHz FSB machines. All along I thought it was the RAM overheating, although I was thinking the heat spreaders I added would do the trick. One of the systems (the one from my last blog entry on water cooling) has been pretty darn stable. But the other machine (the triple screen beastie) was still getting BSODs every few days.
The interesting thing about these failures is that the hard drives would disappear afterward. The machine actually has two Samsung 160GB SATA drives in it, which I had planned to use as mirrored drives with the onboard RAID 1 Promise controller. After all, this is my main development machine, so it's worthwhile making sure the data doesn't go anywhere. But one of the drives had disappeared and I figured it had failed, so I was running the other drive solo.
And with the BSODs, both hard drives would be gone. I'd wait a half hour or so (I do have other machines to work from after all) and the drive would come back. I knew I'd have to replace the hard drives eventually (they have a three-year warranty), but extricating water cooled hard drives isn't a lot of fun.
Well, it all came to a head last night when the drive just wouldn't come back. So I hauled the machine out and ripped both hard drives out... very carefully, so that I wouldn't cause any leaks.
As you can see in the photo, the hard drives have blocks on either side, connected by a plastic tube. This plastic tube is just press fit into place to handle the variations in width between hard drives. Pull too hard and it would pop off, and water would go everywhere.
I replaced both drives with identical models, put the whole thing back together and voilà, two hard drives, mirrored and happy. I fired up Acronis to restore the image backup I have of my workstation (the easiest way to recover a system - you DO have a backup strategy, doncha?) and left the machine whirring away til morning.
When I returned in the morning, the machine had failed and both hard drives had disappeared. Guess it wasn't the drives failing after all.
So, what could it be? The onboard controller? These ASUS P4C800-E motherboards have TWO different SATA controllers on them. Could they both be bad? I think not. So what would knock out both drives?
Gosh, lookie there... the SATA power adapter plugs into a 4-pin Molex connector and provides connectors for BOTH SATA drives. Could there be a connection problem?
I replaced the adapter with a new one, and fired the system up - both drives recognized, no problem. I have to guess that once the system heated up, the connectors got loose and deprived the drives of power. This, naturally, would cause Windows to BSOD... and then there'd be no drives left.
So in a way, overheating is still the culprit, but I suspect that water cooling is not responsible for this. Of course, I won't know for sure til it's been running for a few months.