Task ran fine, but timed out on Threadripper 9000 after approx. 2,5 days: rb_04_04_702801_692873_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_09_10_3023291_43_0


Michael H.W. Weber
Joined: 18 Sep 05
Posts: 17
Credit: 6,850,436
RAC: 8,026
Message 113526 - Posted: 8 Apr 2026, 16:33:29 UTC

So, I got a task that ran unusually long (I did not set ANY preference regarding desired run time) and timed out when the return deadline was reached. Note that this machine does not bunker tasks, so the task started right after download and still did not finish in time.

This is the work packet:
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1447140176

This is the task (no log):
https://boinc.bakerlab.org/rosetta/result.php?resultid=1626758790

WU name is:
rb_04_04_702801_692873_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_09_10_3023291_43_0

I checked the screensaver after around 2 days, 14 hrs, and it looked just fine.
Apparently, this WU type does not "converge" (if this applies here), and some scientist might want to take a closer look at this packet.

Michael.
President of Rechenkraft.net e.V.

http://www.rechenkraft.net - The world's first and largest distributed computing association. We make those things possible that supercomputers don't.
Sid Celery

Joined: 11 Feb 08
Posts: 2563
Credit: 47,206,982
RAC: 3,992
Message 113527 - Posted: 8 Apr 2026, 19:43:45 UTC - in response to Message 113526.  

I have no idea, but checking around, this task does seem to be a one-off among a lot of others run by that user that completed just fine.
The other thing I notice is that the user is running BOINC Manager 7.24.1 when the current version is 8.2.9.

It also seems strange: the other tasks returned by that user ran for 8 hrs with very little difference between CPU time and wall-clock time, and the host doesn't appear to run other projects, yet this one task (and no others) missed its deadline.

As a one-off I wouldn't worry about it, although I am seeing some of my own tasks crashing out before completion too.
It seems we're running a slightly faulty batch at the moment, with errors not uncommon.




©2026 University of Washington
https://www.bakerlab.org