D2OL (down indefinitely)

Looked in the d2ol forums and found this upsetting little snippet for you :frowning:

All,

Amid the holidays I have been away too much - I am sorry.

The NAS has thrown some errors (again) and I believe the drives themselves have exceeded their natural lifespan. We’ve had failures coming out of boot, lost shares, dropped drives from the RAID, and routinely run out of memory. At this moment, the project is down and I am requesting hardware replacement from the vendor while also pricing out a new unit.

At the moment, I have no ETA for recovery, but obviously the NAS is a critical central piece of the project, handing out work units and receiving results. The absence of the NAS halts the project until it is replaced.


Charles Beckius
D2OL Forum Administrator

That was posted on 29 November, and the project has been back up since then, though intermittently… at the moment, getting WUs has not been much of a problem for me.

Curly

Well, it is interesting that they are still sending units out.

Yeah… looking at one of the moderator's more recent posts, a temp fix has been added so units can be sent out.
Unfortunately, they still need $8500 for a new drive though (correct me if I'm wrong, I don't know much about hardware) to be able to get the completed units back.

From what I have read on the forums, it's a low-end Dell NAS using IDE drives in a software RAID configuration, which IMO is just asking for trouble. You should never use IDE drives in 'mission-critical' situations; they just aren't engineered to be sufficiently reliable in the first place.

Last I saw, the NAS was in pieces on one of the project admins' desks (Charles, I think) :lol:

Personally, I’ll be mounting a stash because I have a goal to meet, but this has helped make my mind up about my future crunching…

This sucks. Well, keep us posted about the situation. Is there much point in crunching units now if we can't upload them?

Of course there is :smiley: Just think of the stash you can dump when they finally sort it out :smiley: We can still d/load units in the meantime and just let the results build up for one hoooooge :bigdump:

Don't forget that last time they came back up, they dropped a bunch of work until they sorted it all out.

I am with Mulda on this one: if they are serious about this, they should at least have it mirrored or in RAID 5 on redundant hardware. A regular RAID-equipped NAS device is cheap enough these days, and even if they are running IDE devices, they should have some hot spares. What would have been considered a small problem is now a major issue. Even I have my really important stuff mirrored on a hardware array.

Not good news but, not bad either :wink:

[b]posted 13 December, 2005 15:53

All,

The old NAS performs a chkdsk every time it boots, during which it finds no errors. This takes 4+ hours to validate the drives and is, how shall we say … inconvenient.

The drives check out OK physically, but the RAID does not. When it completes boot, at least one drive is deactivated and the RAID is offline. I must manually activate the drive, bring the RAID back online and then … wait … while it rebuilds the array.

Oh, and guess what. It’s a SOFTWARE RAID.

It has come to pass that there is irreparable data corruption in the operating system partition. With power failures and inadequate generators, when the NAS fails, this is bound to happen. With software RAID, this is a tragic thing.

The current NAS has been offline for some time during this diagnosis and evaluation. I have secured a new NAS that has upgraded spindle speed, a hardware backplane with hardware RAID, and improved network connectivity. It is en route.

The key here was to replace the device without the need to recode the software. Migrating the data will be dicey, as the current NAS is not stable at all. I have unplugged it from the network to prevent y'all from using it and uploading doomed data.

Yes the project is down. No, the project has not ended. It will simply take longer than we had hoped to march forward.


Charles Beckius
D2OL Forum Administrator[/b]

Linky

Oh…

Nasty…

sounds bad…

Maybe a move to Rosetta while it’s down? :wink:

In distributed computing, project leaders are starting to learn that the key is communication. :thumbsup: to Charles for being honest about the state of the hardware. Patience is the key, I think; the plans are in place for improved project stability, and from the sounds of things, in my opinion it will come. I remember that not so long ago, when SETI BOINC started, a similar thing happened with a two-line description: "project is down" (or similar).

I would personally like to see some reports on the work the D2OL crunchers have done so far; in my opinion, that would show a project really embracing those who do a lot of the work :slight_smile: Maybe while the D2OL team waits for hardware, we can expect to see some science updates?

Overall though, :thumbsup: to D2OL.

DT.