How long you can safely keep a work unit before returning it to Berkeley

richardlellisjr · 8 September 2003 18:07

This question came up, so I asked about it on the Berkeley board here.

I was directed to a link in the Berkeley FAQ that answers the question of how long work units can be kept before returning them. I understand why there is some confusion, as it isn’t written very clearly at all. It seems there is no fixed time limit. The situation is that duplicate work units are issued for error checking. The longer you keep a work unit, the greater the chance that an error-checked result will already have been sent back to Berkeley.

In short, you shouldn’t worry about it. Duplicate work units are deliberate and part of the process. One month is acceptable to Berkeley, but two days is better from their standpoint.

Here is the link.

Here is the direct quote:

What is an “optimal” cache size to avoid returning obsolete results?
Caching is now more popular than ever. (See our add-ons page for some of the more popular caching programs.) In general, a 1-month cache isn’t bad in terms of redundant results, but a 7-day cache is much better. An “optimal” cache (minimizing the likelihood that a result will have already passed integrity testing before you return it), however, shouldn’t hold more than 2 days of workunits. In the past, 1 month was pretty safe, but the ramifications of Moore’s Law have made the cache window much smaller. Keep in mind, of course, that excepting the above case, redundancy is extremely important for testing the integrity of our data. Interestingly, even without caching, users on average will receive a duplicate workunit (one they’ve processed before) about once every 500 times.

Sir_Ulli · 9 September 2003 00:59

I thought it was 7 days at SETI@home Classic; at BOINC you can see the deadline.

Just my personal thoughts.

And at BOINC they have learned a lot: you can see the deadline directly.

regards
Sir Ulli

richardlellisjr · 9 September 2003 01:32

There is no time limit. Multiple duplicates are sent out on the basis of a statistical formula. If you send a woot back in an hour, there is a tiny risk of it being the third one in. After a week, the risk is higher. Just processing and sending the woots back immediately is no guarantee yours won’t be a triplicate.

As was said, keeping work units for a month will still provide the science Berkeley needs. But if you keep your woots for a shorter time, they may be of greater value. Nobody can tell for sure what is too long.

Sir_Ulli · 9 September 2003 01:50

No time limit, and

Multiple duplicates are sent out

So look here.

I bet a crate of whiskey that many of them won’t be at BOINC when the fun begins.

Sir Ulli

Edit:

@richardlellisjr

The PM was only for info, …

No flaming or anything else.

richardlellisjr · 9 September 2003 13:33

Originally posted by Sir Ulli
@richardlellisjr

The PM was only for info, …

No flaming or anything else.

I would never accuse you of Flaming me or anyone else Sir Ulli. :nod:

richardlellisjr · 17 September 2003 00:24

The person who wrote the basis for Sir Ulli’s postings on this issue has responded in the Berkeley board thread pointed to above. I believe this should clear up the issue of how long you should keep work units. I present the full reply below:

Hm, the text found in the first posting was written by me, so I shall try to clear up this “rumor”:

The original text about the effect of cache sizes dates back many months now; it was in an administration posting right here on this message board.

Unfortunately, due to numerous failures of this message board, all of this got lost.

All I can say is that (at that time) these three significantly different effects of cache sizes were the content of the official posting by the administration.

Quick recap of what is now perceived as “rumors”:

Ideal: 24-48 hour cache

Why?
S@H administration wrote that most results [at the time of writing] return to the server within this timeframe, giving the work unit the “Verified” tag once there are sufficient results to confirm its correctness.
[To my best understanding] This closes the cycle between the initial distribution of a work unit and its result becoming “Verified” beyond doubt.
[Again, to my best understanding] Further results do not change or affect the “Verified” status in the database.

So, for any result to go right into the database, it is logical that it should be returned within 24-48 hours, in the ideal case.
Everything beyond that runs a risk, increasing with time, of not fulfilling its prime scientific purpose: processing and confirming the results of a given work unit.
That’s why I labeled this timeframe ideal for all those who participate primarily for the science…

Less than optimal: up to ~7 day cache

Why?
S@H administration wrote that these results [the 7 days were explicitly mentioned] are increasingly used only for “database integrity checks” [at the time of writing], since the result was already verified by others.
However, some results never return to the server (for a multitude of reasons), which still gives a “late result” a certain chance to make it into the database.

In this case, the prime scientific goal [processing and confirming results] runs a significantly increasing risk of simply being missed, since others have already given the result the “Verified” tag. Cross-checking the database appears to be primarily a function that assists in the early detection of database errors; it is rather technical in nature and no longer directly connected to the science itself.

This was the key to my assessment that exceeding the ~24-48 hour timeframe becomes increasingly undesirable (if you’re in it for the science).

Have a look at the listed Candidate Signals table: you’ll find at most three users listed beside each signal, even though more users than that certainly submitted results for a given candidate signal. This leads me to conclude that “4th place” is not the place to be.

Two identical results could be sufficient (the minimum required) for confirmation; three identical results are [as far as I remember] ideal and desired to confirm beyond doubt.

Undesirable to useless: up to 4 weeks of cache

Why?
S@H administration wrote that these extremely large caches were considered useful in the past, when the server frequently failed or dropped lots of connections, making downloading either impossible or extremely painful (slow).
But those times are over [those who remember them surely know how difficult it could get], and since CPU performance has steadily increased, those large caches have lost their purpose.

Results held for such long timeframes are prone to be useless: they count toward the stats only and serve no other purpose of scientific (or other) value.

This is entirely logical, since almost all results get their “Verified” tag long before such a cached result finally makes it back to the server.

The chance of being useful for ‘anything’ simply decreases drastically with time, so it becomes highly undesirable: wasted processor time and energy (unless you are the type who gets a kick out of the stats alone and doesn’t care about other issues).

===================
As always, I hope this clears up those “rumors”, which actually aren’t rumors at all.
This is how I recall it; it may contain small errors of interpretation, but the bottom line was definitely as I have recounted it.

Since all this was many months ago, things will have developed further towards even faster return times (due to the faster computers now available).

The 28-day timeframe was constructed solely to accommodate users (relatively few even at that time, I guess) whose computing times took a drastic hit when the project moved to the V3.03 client architecture on February 6, 2001.
It plays no real role in work unit return times nowadays, since times have changed; it merely keeps someone “active” in the global stats even after a few weeks of vacation, and keeps the active user count high (an important figure when it comes to talking with potential sponsors for the project!)…

Greetings
FalconFly
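To make the “4th place” argument concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical model, not Berkeley’s actual server logic: every copy of a work unit comes back with an identical valid result, and the work unit is tagged “Verified” as soon as a quorum of results has arrived (three by default, per FalconFly’s recollection; two is the stated minimum). The function name and the return times used below are made up for illustration.

```python
from typing import Sequence


def counts_toward_verification(my_return_hours: float,
                               other_return_hours: Sequence[float],
                               quorum: int = 3) -> bool:
    """Hypothetical model: True if our result is among the first `quorum` in.

    Assumes every copy returns an identical, valid result and that the work
    unit is tagged "Verified" once `quorum` results have arrived; anything
    later is only used for database integrity checks.
    """
    # Copies of the same work unit that reached the server before ours.
    earlier = sum(1 for t in other_return_hours if t < my_return_hours)
    # Ours still helps confirm the work unit only if the quorum was not yet met.
    return earlier < quorum


# Made-up return times (hours) for the other copies of one work unit.
others = [20, 30, 40, 200]
print(counts_toward_verification(24, others))   # True: 2nd result in
print(counts_toward_verification(168, others))  # False: 4th result in
```

In this toy case, a result returned after 24 hours is among the first three and helps verify the work unit, while the same result held for a week (168 hours) arrives fourth and, under these assumptions, only feeds the integrity checks. Real return times vary from one work unit to the next, which is why nobody can name a hard cutoff.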
