I lost a drive holding data that I did not have a full backup of. It sucks, but the data was not critical.
There were a couple of lessons learned. First off, the root cause: I think a fan died. I have not opened the case up yet, but my guess is that without the fan the drives overheated, causing them to fail.
So I have figured out how to monitor the drive temps. I am using a utility called hddtemp. It produces nice, simple info on drive temps, and it also has a daemon mode that a Nagios plugin can query to raise an alert if temps go too high.
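For anyone curious what talking to the hddtemp daemon looks like, here is a minimal Python sketch. It is not the Nagios plugin itself. It assumes the daemon is listening on its usual default of TCP port 7634 on localhost and that it answers with records shaped like `|/dev/sda|MODEL|59|C|` run together; the function names and the sample model strings are made up for illustration.

```python
import socket


def parse_hddtemp(raw):
    """Parse hddtemp daemon output into (device, model, temp, unit) tuples.

    Assumed format: |/dev/sda|MODEL|59|C||/dev/sdb|MODEL|42|C|
    A non-numeric temperature field (e.g. a sleeping drive) becomes None.
    """
    drives = []
    for record in raw.strip().strip("|").split("||"):
        parts = record.split("|")
        if len(parts) != 4:
            continue  # skip empty or malformed records
        device, model, temp, unit = parts
        drives.append((device, model, int(temp) if temp.isdigit() else None, unit))
    return drives


def read_hddtemp(host="127.0.0.1", port=7634, timeout=5.0):
    """Read everything the daemon sends before it closes the connection."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


if __name__ == "__main__":
    for device, model, temp, unit in parse_hddtemp(read_hddtemp()):
        print(device, model, temp, unit)
```

Keeping the parser separate from the socket read makes it easy to try against a pasted string without a running daemon.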
Furthermore, I set up smartd to let me know if a drive is failing as well.
Another thing I wanted to know was exactly what was on those drives, so I used find to walk through them and produce info on each file using file and stat.
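I did this with find, file and stat on the command line, but the same idea can be sketched in Python. This is a rough equivalent, not what I actually ran: `inventory` is a made-up name, and `mimetypes.guess_type` only guesses from the file extension, whereas file inspects the contents.

```python
import mimetypes
import os


def inventory(root):
    """Walk root and yield one dict of basic metadata per regular file found."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # unreadable entry; skip it and keep walking
            kind, _encoding = mimetypes.guess_type(path)
            yield {
                "path": path,
                "size": st.st_size,
                "mtime": st.st_mtime,
                "type": kind or "unknown",
            }


if __name__ == "__main__":
    import sys

    for entry in inventory(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(entry["size"], entry["type"], entry["path"], sep="\t")
```

Saving the output somewhere off the drive means that next time a disk dies, there is at least a list of what was on it.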
I did replace the drive and I am watching temps. I have a large fan cooling the system down for now.
The temp at 9:57 PM was 59C for one drive; after the fan had been on it for about 15 minutes, the temp dropped to 42C. I have the check set to warn if temps go above 35C and to go critical above 48C.
Let's see if temps stay down.
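To make the warn and critical thresholds concrete, here is a small Python sketch of the kind of decision a Nagios check makes. It uses the standard Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); the function names are hypothetical, and treating "above" as strictly greater than is my assumption about how the real plugin compares.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(temp_c, warn=35, crit=48):
    """Map one drive temperature in Celsius to a Nagios status code."""
    if temp_c > crit:
        return CRITICAL
    if temp_c > warn:
        return WARNING
    return OK


def worst_status(temps, warn=35, crit=48):
    """Overall status for several drives: the worst individual status."""
    if not temps:
        return UNKNOWN  # no readings at all is worth flagging
    return max(classify(t, warn, crit) for t in temps)


if __name__ == "__main__":
    import sys

    # Example: python check_temps.py 59 42
    sys.exit(worst_status([int(arg) for arg in sys.argv[1:]]))
```

With these numbers, the 59C reading counts as critical, and 42C is still above the warning line.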
Weight: 314,4