Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Jean-David Beyer Joined: 2 Nov 05 Posts: 221 Credit: 7,504,240 RAC: 1,119 |
"Is anyone else having trouble keeping their cache full?" Yes. Even though I have 3 Rosetta tasks in my cache, it is a low priority group: 0.25%. Much worse is WCG, which has no tasks in its cache even though its priority is 20.63%. Right now, the only reliable group is Einstein, though it has the lowest priority of any: 0.13%. The highest priority group is ClimatePrediction: no tasks for many months, even though its priority is 78.47%. |
Joined: 16 Jun 08 Posts: 1241 Credit: 14,421,737 RAC: 2 |
PrimeGrid has a VERY reliable selection of workunits. They even have automatic workunit generation to extend series of workunits with known patterns. |
Sid Celery Joined: 11 Feb 08 Posts: 2371 Credit: 45,010,608 RAC: 25,097 |
"Is anyone else having trouble keeping their cache full?" Yes. My charmed-life PC has only 3 spare tasks, another has 4, my work PC has no spare tasks beyond those it's running, and my laptop has 3 of 8 threads idle atm. I see you have 57 of 128 threads idle. Tough times. Another appeal to those whose tasks only run 3 hours to change their Rosetta settings explicitly to 8hrs. Either do that or you may soon be running 0hrs on no tasks. |
Joined: 28 Mar 20 Posts: 1851 Credit: 18,534,891 RAC: 0 |
Or the project could just fix the problems with the servers, but that would be way too much to hope for... Grant Darwin NT |
Stevie G Joined: 15 Dec 18 Posts: 126 Credit: 1,028,210 RAC: 1,283 |
Sid: You wrote, "Because you quoted that section about removing the file extension, was that what finally solved your issues? It's the only thing that makes sense. In any case, we'll take it.." No, I did not change anything in my Notepad entry. I just saved it like you suggested. Thankyouverymuch. Then those Rosetta tasks appeared and ran to completion. Then I got some more, which finished Monday. But there are now six Rosetta tasks stuck in Ready to Report. Two of them have been there since Sunday and four all day Monday. Requesting updates provokes the message that "Communication is deferred" for two hours or more. Their deadline is the next day. What happens if they are still stuck there when the deadline expires? S. Gaber Oldsmar, FL |
Tom M Joined: 20 Jun 17 Posts: 149 Credit: 30,994,747 RAC: 101,074 |
"What happens if they are still stuck there when the deadline expires?" The deadline only refers to the processing deadline. Ready to Report is a post-processing activity. The tasks will not "expire" or "get canceled". Proud member of the O.F.A. (Old Farts Association) |
Tom M Joined: 20 Jun 17 Posts: 149 Credit: 30,994,747 RAC: 101,074 |
"Is anyone else having trouble keeping their cache full?" Would researching the top 50-100 and sending them private emails, or listing them here for public notification, do any good? Does anyone want to take the time? Proud member of the O.F.A. (Old Farts Association) |
Sid Celery Joined: 11 Feb 08 Posts: 2371 Credit: 45,010,608 RAC: 25,097 |
"Is anyone else having trouble keeping their cache full?" Today I see tasks came through early this morning for most of us. Tom, there was something I wanted to comment on previously. You wrote: "And dropped the cpu tasks allowed for Rosetta (in the Pandora config file) to 200." This goes back to that whole discussion where Grant was saying you should hold a minimal cache of tasks, while I said that only applies if there's high confidence tasks will be available when you call for them. This is a good example of where we've all come unstuck by the lack of reliability of resupply. What seems to work is targeting one backup task for each thread running Rosetta, plus a small margin for tasks 'cancelled by server' or computation errors (which are thankfully very low atm, as I mentioned last week). So, where you have 128 threads, essentially double it and add a few. That means 200 is just that bit too low, and somewhere between 256 and 300 allows more time for tasks to start coming down again so you don't have idle threads. It won't cover every eventuality, but it doesn't surprise me you had no problems with a 300 cache, but ran short very soon after reducing your cache size to 200. |
Sid Celery Joined: 11 Feb 08 Posts: 2371 Credit: 45,010,608 RAC: 25,097 |
Sid: This is all very odd. It seems like your PC can't connect to the bakerlab servers again. I presume you've tried a reboot over the last couple of days? It might be worth a try. Someone who knows more about servers than me (almost everyone) might be able to suggest something. I'm useless at this kind of thing tbh. Sorry. If you solve it before the task deadlines, no problem. If you miss the deadline, Rosetta will reissue the tasks to someone else. If you subsequently do get them uploaded, you will get credit for them aiui. |
Sid Celery Joined: 11 Feb 08 Posts: 2371 Credit: 45,010,608 RAC: 25,097 |
Another appeal to those whose tasks only run 3 hours to change their Rosetta settings explicitly to 8hrs. God, no! I make the assumption, by their nature, the majority who have this issue don't read the forums and just let their tasks run however they're set up. I only comment for the few who skim the forums and may pay attention on the basis that every little bit helps |
Bill Swisher Joined: 10 Jun 13 Posts: 66 Credit: 54,469,519 RAC: 142,447 |
Is anyone else having trouble keeping their cache full? Somewhat, I have one computer that was running a bit low on tasks. But it looks like it picked up about 115 Denis tasks overnight. Here's my current Rosetta work, spread out across 6 computers and 112 threads: State: All (6166) · In progress (441) · Validation pending (0) · Validation inconclusive (0) · Valid (5687) · Invalid (0) · Error (38) FWIW the workload distribution is: Denis 27%, Einstein 7%, Rosetta 60% and WCG 7%. All rounded UP to the nearest percent. ![]() |
Joined: 29 Nov 05 Posts: 14 Credit: 250,987 RAC: 139 |
I have 3 Rosetta tasks currently crunching and 31 Einstein tasks, with 3 crunching as I type this. When my computer screen is shut off, I'm not sure how many Einstein tasks start crunching. No WCG tasks at the moment. WCG and Rosetta are set at 100 resource share and Einstein is set at 80 resource share. On June 1st and 2nd I uploaded 7 valid tasks for Rosetta. I didn't have to edit my hosts file; I just disabled IPv6. My Rosetta setting is Target CPU run time: 6 hours. CPU: GenuineIntel Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz [Family 6 Model 158 Stepping 10] (12 processors). |
Tom M Joined: 20 Jun 17 Posts: 149 Credit: 30,994,747 RAC: 101,074 |
I have 3 Rosetta tasks currently crunching and 31 Einstein tasks with 3 crunching as I type this. Hey Jon, Bet you could run them as 8 hour tasks just fine. Set it in the profile. What is your cache set at? 1 day? Get more set at? 0.1 ? I are curious... Thank you. Proud member of the O.F.A. (Old Farts Association) |
Joined: 28 Mar 20 Posts: 1851 Credit: 18,534,891 RAC: 0 |
"Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz [Family 6 Model 158 Stepping 10] (12 processors)." Unfortunately that is another seriously overcommitted system. The Target CPU time might be 6 hours, but it takes the system 12 hours to do that 6 hours' worth of work. I would not make any changes to your cache or Target CPU time unless you sort out the system first. Taking 16 hours to do 8 hours' worth of work will cause all sorts of issues. Run time: 11 hours 50 min 56 sec. CPU time: 5 hours 58 min 44 sec. Grant Darwin NT |
Joined: 12 Jan 08 Posts: 21 Credit: 195,801 RAC: 0 |
Wed 04 Jun 2025 10:18:42 AM EDT | Rosetta@home | Server error: feeder not running Greetings, I have been getting this error for several days from Rosetta. I have other Projects that work (When WU are available). What can you tell us about this error? I saw another posting that mentioned the error - but not the cause. Thanks in advance, Jay PS Here is some PC info: I am running Linux Mint, Wed 04 Jun 2025 10:16:38 AM EDT | | Starting BOINC client version 7.24.1 for x86_64-pc-linux-gnu Wed 04 Jun 2025 10:16:38 AM EDT | | log flags: file_xfer, sched_ops, task Wed 04 Jun 2025 10:16:38 AM EDT | | Libraries: libcurl/8.5.0 OpenSSL/3.0.13 zlib/1.3 brotli/1.1.0 zstd/1.5.5 libidn2/2.3.7 libpsl/0.21.2 (+libidn2/2.3.7) libssh/0.10.6/openssl/zlib nghttp2/1.59.0 librtmp/2.3 OpenLDAP/2.6.7 Wed 04 Jun 2025 10:16:38 AM EDT | | Data directory: /var/lib/boinc-client Wed 04 Jun 2025 10:16:39 AM EDT | | OpenCL: AMD/ATI GPU 0: VERDE (radeonsi, , LLVM 19.1.1, DRM 2.50, 6.8.0-60-generic) (driver version 24.2.8-1ubuntu1~24.04.1, device version OpenCL 1.1 Mesa 24.2.8-1ubuntu1~24.04.1, 2048MB, 2048MB available, 512 GFLOPS peak) Wed 04 Jun 2025 10:16:39 AM EDT | | libc: version 2.39 Wed 04 Jun 2025 10:16:39 AM EDT | | Host name: pc-14 Wed 04 Jun 2025 10:16:39 AM EDT | | Processor: 8 AuthenticAMD AMD FX(tm)-8150 Eight-Core Processor [Family 21 Model 1 Stepping 2] Wed 04 Jun 2025 10:16:39 AM EDT | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt fma4 nodeid_msr topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmcall arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold Wed 04 Jun 2025 
10:16:39 AM EDT | | OS: Linux Linuxmint: Linux Mint 22.1 [6.8.0-60-generic|libc 2.39] Wed 04 Jun 2025 10:16:39 AM EDT | | Memory: 11.60 GB physical, 9.77 GB virtual Wed 04 Jun 2025 10:16:39 AM EDT | | Disk: 47.76 GB total, 44.03 GB free Wed 04 Jun 2025 10:16:39 AM EDT | | Local time is UTC -4 hours skipping... Wed 04 Jun 2025 10:16:39 AM EDT | | Checking active tasks Wed 04 Jun 2025 10:16:39 AM EDT | Asteroids@home | URL https://asteroidsathome.net/boinc/; Computer ID 806580; resource share 20 Wed 04 Jun 2025 10:16:39 AM EDT | Asteroids@home | Not using AMD/ATI GPU: project preferences Wed 04 Jun 2025 10:16:39 AM EDT | DENIS@home | URL https://denis.usj.es/denisathome/; Computer ID 257738; resource share 20 Wed 04 Jun 2025 10:16:39 AM EDT | Einstein@Home | URL https://einstein.phys.uwm.edu/; Computer ID 13208132; resource share 20 Wed 04 Jun 2025 10:16:39 AM EDT | LHC@home | URL https://lhcathome.cern.ch/lhcathome/; Computer ID 10875897; resource share 20 Wed 04 Jun 2025 10:16:39 AM EDT | Rosetta@home | URL https://boinc.bakerlab.org/rosetta/; Computer ID not assigned yet; resource share 100 Wed 04 Jun 2025 10:16:39 AM EDT | SiDock@home | URL https://www.sidock.si/sidock/; Computer ID 70356; resource share 20 Wed 04 Jun 2025 10:16:39 AM EDT | | Setting up GUI RPC socket Wed 04 Jun 2025 10:16:39 AM EDT | | Warning: GUI RPC password is empty. BOINC can be controlled by any user on this computer. See https://boinc.berkeley.edu/gui_rpc_passwd.php for more information. Wed 04 Jun 2025 10:16:39 AM EDT | | Checking presence of 42 project files Wed 04 Jun 2025 10:16:39 AM EDT | Rosetta@home | Sending scheduler request: To fetch work. 
Wed 04 Jun 2025 10:16:39 AM EDT | Rosetta@home | Requesting new tasks for CPU and AMD/ATI GPU Wed 04 Jun 2025 10:16:41 AM EDT | Rosetta@home | Scheduler request completed: got 0 new tasks Wed 04 Jun 2025 10:16:41 AM EDT | Rosetta@home | Server error: feeder not running Wed 04 Jun 2025 10:16:41 AM EDT | Rosetta@home | Project requested delay of 3600 seconds Wed 04 Jun 2025 10:18:31 AM EDT | Rosetta@home | Fetching scheduler list Wed 04 Jun 2025 10:18:35 AM EDT | Rosetta@home | Master file download succeeded Wed 04 Jun 2025 10:18:40 AM EDT | Rosetta@home | Sending scheduler request: To fetch work. Wed 04 Jun 2025 10:18:40 AM EDT | Rosetta@home | Requesting new tasks for CPU and AMD/ATI GPU Wed 04 Jun 2025 10:18:42 AM EDT | Rosetta@home | Scheduler request completed: got 0 new tasks Wed 04 Jun 2025 10:18:42 AM EDT | Rosetta@home | Server error: feeder not running Wed 04 Jun 2025 10:18:42 AM EDT | Rosetta@home | Project requested delay of 3600 seconds |
Bryn Mawr Joined: 26 Dec 18 Posts: 422 Credit: 13,957,464 RAC: 3,095 |
From earlier in this thread: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=6893&postid=112638 |
Sid Celery Joined: 11 Feb 08 Posts: 2371 Credit: 45,010,608 RAC: 25,097 |
Two more Validate errors tonight, meaning 2x12hr tasks not being awarded credit. Up to 22 now. So demoralising... |
Stevie G Joined: 15 Dec 18 Posts: 126 Credit: 1,028,210 RAC: 1,283 |
Sid: Six Rosetta tasks are still stuck in Ready to Report, four days now. Update always results in "communication deferred." I've rebooted several times. Other projects are running fine. I'm thinking of resetting the project. Maybe that will jar the system into action. S. Gaber |
William Albert Joined: 22 Mar 20 Posts: 27 Credit: 2,109,090 RAC: 5,915 |
"Two more Validate errors tonight, meaning 2x12hr tasks not being awarded credit." Your AMD Ryzen 7 5800X machine has so many errored/invalid WUs that I would strongly suspect failing hardware. |
Sid Celery Joined: 11 Feb 08 Posts: 2371 Credit: 45,010,608 RAC: 25,097 |
"Two more Validate errors tonight, meaning 2x12hr tasks not being awarded credit." It is. It's a repeated disk failure that's kind of described here. I don't trust myself fixing this, so I'm waiting for my hardware guy to get back. He's out of the country atm - and my fingers are permanently crossed that it doesn't become irretrievable before he returns. |
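
Two bits of arithmetic from this thread can be checked with a quick script. This is only a rough sketch, not anything BOINC or Rosetta provides: the function names are made up for illustration, and the `margin` value is an assumption, since the thread only says "add a few". The first function follows Sid's rule of thumb (one running task plus one backup task per thread, plus a small margin); the second compares CPU time against run time for the overcommitted i7-8700 task Grant quoted.

```python
import re


def task_cache_target(threads: int, margin: int = 0) -> int:
    """Rule of thumb from the thread: one running task plus one backup
    task per thread, plus a small margin (the margin size is an assumption)."""
    if threads < 0 or margin < 0:
        raise ValueError("threads and margin must be non-negative")
    return 2 * threads + margin


# Matches the duration format shown on the task page, e.g. "11 hours 50 min 56 sec".
_DURATION = re.compile(r"(\d+)\s*hours?\s+(\d+)\s*min\s+(\d+)\s*sec")


def parse_duration(text: str) -> int:
    """Convert an 'H hours M min S sec' string to whole seconds."""
    match = _DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(cpu_time: str, run_time: str) -> float:
    """Fraction of wall-clock run time that the task actually spent on the CPU."""
    run_seconds = parse_duration(run_time)
    if run_seconds == 0:
        raise ValueError("run time must be greater than zero")
    return parse_duration(cpu_time) / run_seconds


if __name__ == "__main__":
    # 128 threads with a hypothetical margin of 12 tasks.
    print("cache target:", task_cache_target(128, margin=12))
    # Figures quoted in the thread for the i7-8700 task.
    ratio = cpu_efficiency("5 hours 58 min 44 sec", "11 hours 50 min 56 sec")
    print(f"CPU efficiency: {ratio:.1%}")
```

A ratio near 100% means a task has a thread to itself; a ratio near 50%, as with the figures quoted above, suggests roughly twice as many tasks are running as the machine has threads free for them.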
©2025 University of Washington
https://www.bakerlab.org