Task ran fine, but timed out on Threadripper 9000 after approx. 2.5 days: rb_04_04_702801_692873_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_09_10_3023291_43_0

Michael H.W. Weber
Joined: 18 Sep 05
Posts: 18
Credit: 6,850,436
RAC: 5,685
Message 113526 - Posted: 8 Apr 2026, 16:33:29 UTC

So, I got a task which ran unusually long (I did not set ANY preference regarding desired run time) and was timed out after the return deadline was reached. Note that this machine does not bunker tasks, so it started right after download and did not finish within time.

This is the work packet:
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1447140176

This is the task (no log):
https://boinc.bakerlab.org/rosetta/result.php?resultid=1626758790

WU name is:
rb_04_04_702801_692873_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_09_10_3023291_43_0

I checked the screensaver after around 2 days, 14 hrs and it looked just fine.
Apparently, this WU type does not "converge" (if that term applies here), and some scientist might want to take a closer look at this packet.

Michael.
President of Rechenkraft.net e.V.

http://www.rechenkraft.net - The world's first and largest distributed computing association. We make those things possible that supercomputers don't.
Sid Celery

Joined: 11 Feb 08
Posts: 2574
Credit: 47,207,888
RAC: 2,523
Message 113527 - Posted: 8 Apr 2026, 19:43:45 UTC - in response to Message 113526.  

I have no idea, but checking around it does seem like this task was a one-off among a lot of others run by that User that ran just fine.
The other thing I notice is that User is using Boinc Manager 7.24.1 when the current version is 8.2.9

It also seems strange that the other tasks returned by that user ran for 8 hrs with very little difference between CPU time and wall-clock time, and the machine doesn't appear to run other projects, yet this one task (and no others) missed its deadline

As a one-off I wouldn't worry about it, although I am seeing some of my own tasks crashing out before completion too
It seems we're running a slightly faulty batch atm with errors not uncommon
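
For anyone who wants to run the same comparison on their own task list, here is a minimal sketch in Python. It is not a BOINC or Rosetta tool; the `TaskRun` record, the function names, the 3x threshold and the sample numbers are all assumptions made up for illustration (only the roughly 8 hr typical run and the roughly 2 days, 14 hrs long run echo this thread).

```python
from dataclasses import dataclass

HOUR = 3600  # seconds


@dataclass
class TaskRun:
    """Hypothetical record of one finished task (not a BOINC data type)."""
    name: str
    cpu_seconds: float
    wall_seconds: float


def cpu_efficiency(run: TaskRun) -> float:
    """Fraction of wall-clock time that was spent on the CPU."""
    if run.wall_seconds <= 0:
        return 0.0
    return run.cpu_seconds / run.wall_seconds


def flag_outliers(runs, typical_wall_seconds, factor=3.0):
    """Names of runs whose wall-clock time exceeds factor x the typical run."""
    limit = factor * typical_wall_seconds
    return [r.name for r in runs if r.wall_seconds > limit]


# Illustrative values only, not taken from the real task logs.
runs = [
    TaskRun("typical_a", 7.9 * HOUR, 8.0 * HOUR),
    TaskRun("typical_b", 7.8 * HOUR, 8.0 * HOUR),
    TaskRun("long_runner", 61.0 * HOUR, 62.0 * HOUR),
]

for r in runs:
    print(f"{r.name}: efficiency {cpu_efficiency(r):.2f}")
print("outliers:", flag_outliers(runs, typical_wall_seconds=8 * HOUR))
```

An efficiency close to 1 on every run, together with a single flagged outlier, would match the picture described above: the machine was not starved of CPU, one task simply kept running.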
Michael H.W. Weber
Joined: 18 Sep 05
Posts: 18
Credit: 6,850,436
RAC: 5,685
Message 113541 - Posted: 14 Apr 2026, 15:21:56 UTC - in response to Message 113527.  


The other thing I notice is that User is using Boinc Manager 7.24.1 when the current version is 8.2.9

I run that older BOINC client (the latest v7 version) on purpose because I figured that the entire v8 BOINC client batch has severe scheduling issues - especially in conjunction with using BAM! and communicating with Primegrid (over BAM!).
Just check out other BOINC forums where people complain about not receiving tasks - it always seems connected to BOINC v8 clients.

Michael.
Sid Celery

Joined: 11 Feb 08
Posts: 2574
Credit: 47,207,888
RAC: 2,523
Message 113544 - Posted: 14 Apr 2026, 23:34:06 UTC - in response to Message 113541.  

The other thing I notice is that User is using Boinc Manager 7.24.1 when the current version is 8.2.9

I run that older BOINC client (the latest v7 version) on purpose because I figured that the entire v8 BOINC client batch has severe scheduling issues - especially in conjunction with using BAM! and communicating with Primegrid (over BAM!).
Just check out other BOINC forums where people complain about not receiving tasks - it always seems connected to BOINC v8 clients.

Michael

I haven't heard or seen that with any version of v8
Scheduling has certainly changed, but I personally find it better if I notice any difference at all
That said, I signed up to BAM a very long time ago and never found out how to make it work, so I abandoned it many years ago
I'm much more inclined to blame BAM than Boinc, but my requirements are very limited, so you may have different needs to me
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1931
Credit: 18,534,891
RAC: 0
Message 113546 - Posted: 15 Apr 2026, 6:51:34 UTC - in response to Message 113544.  

I'm much more inclined to blame BAM than Boinc,
Yep, there are issues with BAM that have never been addressed.
Grant
Darwin NT



©2026 University of Washington
https://www.bakerlab.org