SETI Server probs

June 12, 2005
The web server failed this afternoon, but has since been rebooted and is slowly recovering now.

[As of 13 Jun 2005 14:20:07 UTC]

Database status
State Approximate #results
Ready to send 100,711
In progress 95,988
Waiting for validation 32
Transitioner backlog 32 hours

Everything is online except transitioners 1 & 2

A bad last week for Berkeley, it would seem. :frowning:

Hopefully, all will be sorted soon.

6/13/2005 11:11:26 AM||request_reschedule_cpus: project op
6/13/2005 11:11:33 AM|SETI@home|Requesting 345600.00 seconds of work
6/13/2005 11:11:33 AM|SETI@home|Sending scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
6/13/2005 11:11:37 AM|SETI@home|Scheduler request to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi succeeded
6/13/2005 11:11:37 AM|SETI@home|General preferences have been updated
6/13/2005 11:11:37 AM||General prefs: from SETI@home (last modified 2005-06-12 04:29:33)
6/13/2005 11:11:37 AM||General prefs: using separate prefs for work
6/13/2005 11:11:37 AM|SETI@home|No work from project
6/13/2005 11:11:37 AM|SETI@home|Deferring communication with project for 1 minutes and 0 seconds

Times are EST. It seems the scheduler IS responding, but it's swamped.
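For what it's worth, that 345600.00-second request is just a work cache converted to seconds: 4 days x 86,400 = 345,600. A quick sketch of the arithmetic in Python (my reading of the log line, not the actual BOINC client code):

SECONDS_PER_DAY = 86400

def work_request_seconds(cache_days, queued_seconds=0.0):
    # Ask for enough work to fill the cache, minus whatever is already queued.
    return max(0.0, cache_days * SECONDS_PER_DAY - queued_seconds)

# An empty 4-day cache matches the log line above.
print(work_request_seconds(4))  # 345600.0

So a "No work from project" reply means the cache stays empty and the client backs off (here, for a minute) before asking again.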

Database status
State Approximate #results
Ready to send 141,534
In progress 1,055,430
Waiting for validation 1,954
Transitioner backlog 31 hours

Most of my machines have work, but they also have tons of WUs to upload. Hope they get it fixed within a day or so, because not being able to upload really sucks. Also, I don't think I have more than a day's work on any system. :mad:

Come on Berzerkeley…give the beers and bongs a rest and fix some computers! The weekend is over!!! :wink:

:smiley: remember, 1 Berkeley hour = 12 RW hours… so they're probably still celebrating the most recent holiday… :stuck_out_tongue:

The last outage caught me at the end of my cache period and I lost 20 or so WUs because I couldn't report them, even after they had uploaded.

I know…I lost around 60. :mad:

From BOINC SETI Technical News:

June 13, 2005 - 19:00 UTC
There have been many failures over the past week. Another bug was found (and quickly patched) in our upload/download file server, causing it to hang until reboot. Once that was remedied, both the scheduling server and the main web server had separate issues due to extremely high load.
In case you haven’t noticed, we recently changed the URL setiathome.ssl.berkeley.edu - instead of pointing to the old SETI@home “classic” project, it now leads users to the new BOINC-based version. As expected, this vastly increased the number of new users joining the BOINC project, and therefore increased the strain on our back-end servers. Soon we will stop new classic account sign-ups altogether, and eventually stop accepting classic results outright (with advance warning) - each step potentially increasing the demands on our hardware.

At this point there is no new hardware that BOINC could use as its various servers fail for one reason or another. This is because the classic project is still active and using up half of our server farm. This will soon change.

The classic “master science database server” (a 6 CPU Sun E3500) will be the first machine to be repurposed. We’re busy migrating most of its data onto a new database server (an 8 CPU E3500). This migration has been slowed by recent (recoverable) disk failures, but should finish in a month or so. Before then, however, we are going to move the BOINC scheduler onto it. The actual file upload/download handler will remain on its current server, thereby spreading the whole scheduler system over two machines.

As soon as possible, we will add a second webserver (and maybe a third). The BOINC web site contains far more dynamically-generated content than the classic site, and therefore needs more power behind it. We don’t really have any spares, so some machines will have to double as web servers and whatever else they are currently doing.

And as if that wasn’t enough to worry about, the BOINC replica database has continually fallen further and further behind the master database (because the load on the master keeps increasing and the replica hardware is relatively inferior). Then yesterday it was rendered useless when a binary log on the master got corrupted. This didn’t damage the master database - only the replica. So we’re going to have to rebuild the replica from scratch (or hold off until we somehow obtain better hardware for it).
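The nice thing about the scheduler split is that clients shouldn't need reconfiguring: as far as I know, the BOINC client discovers its scheduling servers by fetching the project's master URL and reading <scheduler> tags from the page, so a second scheduling machine just means another tag. A rough Python sketch of that lookup (my guess at the mechanism, not the client's actual code):

import re
import urllib.request

MASTER_URL = "http://setiathome.ssl.berkeley.edu/"  # the project master URL

def find_schedulers(master_url):
    # Fetch the master page and pull out any <scheduler>...</scheduler> URLs.
    page = urllib.request.urlopen(master_url).read().decode("utf-8", "replace")
    return re.findall(r"<scheduler>\s*([^<\s]+)\s*</scheduler>", page)

for url in find_schedulers(MASTER_URL):
    print(url)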
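As for the replica: rebuilding from scratch means re-dumping the master and re-pointing replication at it, and in the meantime the lag shows up in MySQL's SHOW SLAVE STATUS. A small monitoring sketch (assuming a MySQL replica and the Python MySQLdb module; the host and credentials here are made up, not the real BOINC servers):

import MySQLdb
import MySQLdb.cursors

# Hypothetical connection details -- substitute your own replica.
conn = MySQLdb.connect(host="replica.example.edu", user="monitor",
                       passwd="secret",
                       cursorclass=MySQLdb.cursors.DictCursor)
cur = conn.cursor()
cur.execute("SHOW SLAVE STATUS")  # replication status on MySQL 4.x/5.x
row = cur.fetchone()
if row is None:
    print("this server is not configured as a replica")
else:
    print("seconds behind master:", row["Seconds_Behind_Master"])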

:frowning: this isn’t what I wanted to hear… many holes in the road ahead, I’m afraid.

I’m able to upload most units… eventually.

So, I suspect, that means I’ll have to wait a bit for project updates… (sigh…)

Good luck guys. Patience is a virtue. And a very nice girl…

I’ve been able to contact the server a few times recently but no work from project…

Who needs patience… just give me a nice girl :smiley:

btw

I have no problems with a 3-day cache. Here is my Athlon XP 2500+ at this time; 19 WUs are ready to crunch.

[Picture, about 150 KB]

Just for info.

Sir Ulli

I normally don’t have any problems except for the machine I just upgraded. It is set to a three-day cache (actually five after yesterday) but only gives me enough work to last a day or so. For some reason it is still tied to the old computer and hasn’t caught up with the new system yet. But it’s only been a week, so it may yet do so.