February 24, 2005 - 23:30 UTC
Update on yesterday’s outage: We are still dealing with some database fallout. Most of the classic SETI@home systems are up - enough that we can serve workunits to users. However, BOINC is dead in the water until we get at least one database server up and running.
With the master database corrupted beyond repair, we turned all our attention to the replica. Its disks finished syncing last night, and after some file system checks the machine booted and MySQL started just fine. A battery of tests revealed no corruption… until we got to the result table. Of course, that’s by far the biggest and most important table in the database. We are attempting to repair it now.
Assuming we can repair it with little or no data loss, we will then dump all the data from the replica back onto the master. If we’re lucky, this will be done by tomorrow morning and we can start revving all the engines back up.
Please note that since it was a slower machine than the master, the data on the replica database server was about 30 minutes behind real time. We did try to limp both systems along to sync the replica data up even further but no dice. So, when we do get back on line it will be as if there was a half-hour hole in time during which all uploaded results were lost (and any user profile updates, message board postings, etc.). We sincerely apologize to all our users for this loss.
Court brought in a UPS from his personal server collection, so the master database will be protected while we scramble to purchase another. The database server was unprotected yesterday because it was in our lab, not in the data closet where all of our UPSes are. We were/are just weeks away from a data closet reorganization designed to make room for the DB server.
Something is definitely very wrong; even Classic is down.
Classic was also hit by this power outage. According to them, enough of classic is back up and running that they can serve work units out to people.
Here is the news from the classic site:
February 24, 2005
UPDATE (21:00 UTC): The database/data servers are mostly back on line and catching up from the long outage. Yesterday afternoon a breaker blew and power for the entire lab went down. We came back on-line three hours later, but due to servers shutting down ungracefully we had to check every table in every database. This process took all evening and continued well into the next morning.
Classic is still down. I can’t get any more BOINC units, so I have been running Classic instead. Luckily I have 4 more units left, so hopefully it will be fixed by then.
I might have to take up Folding again (me waits for the :Pimp: s :eek: )
SETI is switched off and Folding is started again. Go Team 315!!! :yippee:
…eeerrrr!! Unfortunately both the Primary & Secondary servers cannot be connected to just at this moment
I must be jinxed!! :realmad:
/edit… Sorted. It’s the same problem I had before, it just does not like my firewall. I’ve allowed it access and I’ll wait & see what happens…
Got it sorted. Up to 6% on the first unit already. Think I’ll stick with this for a while and check it’s OK…
Ah Mr Droid…your usual table sir? I have the napkins FOLDED just the way you like sir
Well it looks as though Borkeley have finally got sorted! Managed to upload all my WU but their stock of new WU for download seems to have dropped down to 1K
All my machines reported in (finally) and I got new WUs, so I’m ready to roll. Was a close call there, only had a day’s worth left since this all started. I think I’ll refer to 5 days’ worth of cached WUs as the “berkeley buffer”.
All reported in and new WUs downloaded. Now running both BOINC and Folding! :chin: How long can it last?