Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

Sid Celery

Joined: 11 Feb 08
Posts: 2380
Credit: 45,200,983
RAC: 24,025
Message 112788 - Posted: 11 Jun 2025, 23:18:42 UTC - in response to Message 112787.  

First tasks came down and everything crashed out within seconds - WCG tasks running fine though.
My latest idea I'm guessing is that it's something to do with Norton denying access to some directories, so I've just whitelisted the Boinc program folder and the data folder.
Awaiting another download to see if it worked.

Yup, that made no difference whatsoever... <sigh>
ID: 112788 · Rating: 0
Tom M

Joined: 20 Jun 17
Posts: 158
Credit: 31,910,146
RAC: 104,988
Message 112789 - Posted: 12 Jun 2025, 1:28:12 UTC - in response to Message 112788.  

First tasks came down and everything crashed out within seconds - WCG tasks running fine though.
My latest idea I'm guessing is that it's something to do with Norton denying access to some directories, so I've just whitelisted the Boinc program folder and the data folder.
Awaiting another download to see if it worked.

Yup, that made no difference whatsoever... <sigh>


If I am clear, you have the Windows on a separate HDD. And you just replaced the Data HDD where the Boinc is living?

Have you checked the permissions on your Rosetta exe? Or better yet just "reset" the project and it should download everything clean and start running again.

I hope.
Proud member of the O.F.A. (Old Farts Association)
ID: 112789 · Rating: 0
Tom M

Joined: 20 Jun 17
Posts: 158
Credit: 31,910,146
RAC: 104,988
Message 112790 - Posted: 12 Jun 2025, 1:31:47 UTC

There are several systems that normally have higher RACs than I do. Yet many of them don't have a full enough cache to run all of the available threads.

Since we have published both Linux and Windows polling scripts, shouldn't they be able to pull down enough work to keep up?

Does anyone reading here run those systems? Tell me/us what is going on?

Respectfully,
Proud member of the O.F.A. (Old Farts Association)
ID: 112790 · Rating: 0
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1854
Credit: 18,534,891
RAC: 0
Message 112791 - Posted: 12 Jun 2025, 5:02:11 UTC - in response to Message 112789.  

Or better yet just "reset" the project and it should download everything clean and start running again.
Yep.
Reset the Project, and let it re-download all new files (given your disk issues, the existing files could very well be corrupted).
Grant
Darwin NT
ID: 112791 · Rating: 0
Stevie G

Joined: 15 Dec 18
Posts: 129
Credit: 1,028,210
RAC: 477
Message 112792 - Posted: 12 Jun 2025, 6:45:33 UTC - in response to Message 112763.  
Last modified: 12 Jun 2025, 6:47:07 UTC

But there are now six Rosetta tasks stuck in Ready to Report. Two of them have been there since Sunday and four all day Monday.

Requesting updates provokes the message that "Communication is deferred" for two hours or more. Their deadline is the next day.

This is all very odd. It seems like your PC can't connect to the bakerlab servers again.
I presume you've tried a reboot over the last couple of days? It might be worth a try.
Someone who knows more about servers than me (almost everyone) might be able to suggest something. I'm useless at this kind of thing tbh. Sorry.

Six Rosetta tasks still stuck in Ready to Report, four days now... Update always results in "communication deferred"

I've rebooted several times. Other projects are running fine.

I'm thinking of re-setting the project. Maybe that will jar the system into action.

I'm not inclined to think a project reset is going to make the bakerlab server any more reachable.
Can you confirm the few lines in your event log that result in communication being deferred? I'm assuming the server isn't reachable, but just to be sure.
And re-check that your hosts file (with no extension - make doubly sure) is still as it should be.

I just don't get how you were able to contact the server to grab and return a few dozen tasks, then it becomes unreachable without something changing in between.
Anyone else with any ideas, pipe up.


I reset the project, rebooted, hit update several times per day and still get no tasks.

You remember the six completed tasks I had that were ready to send for several days? My account now reports them all "Timed out-- no response."

So I don't understand what is wrong.

S. Gaber
ID: 112792 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2380
Credit: 45,200,983
RAC: 24,025
Message 112793 - Posted: 12 Jun 2025, 10:43:48 UTC - in response to Message 112789.  

First tasks came down and everything crashed out within seconds - WCG tasks running fine though.
My latest idea I'm guessing is that it's something to do with Norton denying access to some directories, so I've just whitelisted the Boinc program folder and the data folder.
Awaiting another download to see if it worked.

Yup, that made no difference whatsoever... <sigh>

If I am clear, you have the Windows on a separate HDD. And you just replaced the Data HDD where the Boinc is living?

Have you checked the permissions on your Rosetta exe? Or better yet just "reset" the project and it should download everything clean and start running again.

I hope.

With Grant confirming your suggestion I've done this straight away.
Before sending the PC away I'd run all tasks down to nothing and set No New Tasks on all projects. I didn't want anything to start working again before I was ready.
The old data HDD was drive E and came back as drive D, so I had to change that back to E in Disk Management.
And it came back with user directories set to C, so I re-set the symbolic links of docs, downloads, music, pictures, videos etc. back to E as well.

I was very lucky my repairer sent me pictures of my directories E:\, E:\Users and E:\Users\<Name> to help ensure I wasn't forgetting anything.
I didn't ask for those, but he was double-checking what to file-copy after he discovered cloning was failing due to repeated crashes (the reason I needed the new drive) and it turned out to be a Godsend - very fortunate.
Yes, I have had further problems with permissions, which I <think> I've now resolved, but I may've just resolved the most obvious ones and there's others still lurking in the background.
I've no obvious way of knowing without a dialog box popping up - I'm not technical enough to know how to find out.
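For anyone else in this position, one way to hunt for lingering permission problems without waiting for a dialog box is simply to try the operations and see what fails. The Python sketch below is a hypothetical helper (not part of BOINC): it walks a folder, tries to list each directory, create and delete a throwaway file in it, and read one byte from each file, then reports anything that failed. It runs as whichever account starts it, which may not be the account the BOINC client runs under, and files held open by running tasks may show up as unreadable, so it's probably best run with BOINC stopped.

```python
import os
import sys
import tempfile


def find_access_problems(root):
    """Walk `root` and return (path, reason) pairs for anything the current
    account could not list, write into, or read."""
    problems = []

    def on_walk_error(err):
        # os.walk calls this when it cannot list a directory (including `root`).
        problems.append((err.filename, "cannot list directory"))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        # Probe write access: create a temporary file that is deleted on close.
        try:
            with tempfile.TemporaryFile(dir=dirpath):
                pass
        except OSError:
            problems.append((dirpath, "cannot create files"))
        # Probe read access: open each file and read a single byte.
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    fh.read(1)
            except OSError:
                problems.append((path, "cannot read file"))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_access.py <BOINC data directory>
    for path, reason in find_access_problems(sys.argv[1]):
        print(f"{reason}: {path}")
```

An empty result means nothing obvious turned up for that account; it can't rule out problems that only affect the BOINC service account.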

Moving on, resetting Rosetta was producing no reaction in my Event log for several minutes, so I took the opportunity to review the detailed Security history in Norton.
There are lots of blocked transactions, the source of which is almost completely opaque.
Specifically: "Rule IGMP Public Blocked IGMP(2) traffic with (192.168.0.1)"
Over the years, Norton has hidden more and more under the bonnet, to the point where finding out what it's doing, and why, is increasingly difficult.
I discovered this Rule IGMP is one of its default Traffic rules (and isn't 192.168.0.1 my own router?). I took the view it wasn't wise to change the rule in any way.
Trawling through other detailed settings I discovered Boinc blocked in its Sandbox section. Why or how, I don't know. I changed that to allow it.
I also ensured I'd properly whitelisted the whole C Boinc directory and E Boinc data directory.
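On whether 192.168.0.1 is the router: it sits inside 192.168.0.0/16, one of the private ranges reserved for local networks (RFC 1918), and it is a common default address for home routers, so a local origin is likely, though only the router's own settings can confirm it. IGMP is the protocol hosts and routers use to manage multicast group membership, so IGMP traffic from a router is not unusual. Python's standard ipaddress module can do this kind of classification; the helper below is only an illustration.

```python
import ipaddress


def describe_address(text):
    """Return a rough category for an IPv4 or IPv6 address string."""
    addr = ipaddress.ip_address(text)
    # Loopback is checked first because 127.0.0.0/8 also counts as
    # "private" in the ipaddress module.
    if addr.is_loopback:
        return "loopback"
    if addr.is_multicast:
        return "multicast"
    if addr.is_private:
        return "private (local network)"
    if addr.is_global:
        return "global (public internet)"
    return "other reserved"
```

For example, describe_address("192.168.0.1") gives "private (local network)", while the bakerlab address 128.95.160.156 used later in this thread comes out as "global (public internet)".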

Going back to the Event log, after 30 minutes of nothing a new Master file download succeeded. I suspect this followed my removal of BOINC from Norton's Sandbox block list.
After 90 more minutes of attempts to download Rosetta tasks, finally I got some.
And they're running without crashing out. Success!

I should point out, throughout this period, I've been receiving and successfully running WCG tasks to completion, so my PC hasn't been idle.
Why, I don't know.
Whatever problems I've since discovered and resolved in Norton should've affected WCG tasks just as much as Rosetta. But clearly they didn't.
It's a mystery I'm not going to get bogged down in. Rosetta and WCG are both running successfully and I can depart for work for 3 days without having to worry about it.
I've now set WCG back to NNT to get Rosetta back in full flow.

Thanks for letting me bounce ideas off you guys. It genuinely did help. I'd got myself bogged down without the suggestion of a new route around the problem.
ID: 112793 · Rating: 0
Tom M

Joined: 20 Jun 17
Posts: 158
Credit: 31,910,146
RAC: 104,988
Message 112794 - Posted: 12 Jun 2025, 12:46:56 UTC - in response to Message 112792.  



So I don't understand what is wrong.

S. Gaber



Me neither.

Does your hosts file look something like this?

127.0.0.1 localhost
127.0.1.1 Lynnes-Monolith
128.95.160.156 boinc-files.bakerlab.org
128.95.160.156 bwsrv1.bakerlab.org


# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
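A hosts file is just lines of "address hostname [aliases...]", with "#" starting a comment, so the two Rosetta overrides can be checked mechanically instead of by eye. The sketch below is a hypothetical helper: the expected addresses are simply the ones in the example above (not an official list from the project), and it keeps the first address seen for each name.

```python
# Addresses taken from the example hosts file above; adjust if they change.
EXPECTED = {
    "boinc-files.bakerlab.org": "128.95.160.156",
    "bwsrv1.bakerlab.org": "128.95.160.156",
}


def parse_hosts(text):
    """Map each hostname in hosts-file text to the first address listed for it."""
    mapping = {}
    for raw in text.lstrip("\ufeff").splitlines():  # tolerate a UTF-8 byte-order mark
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line, or an address with no hostname
        address, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name.lower(), address)
    return mapping


def missing_overrides(text, expected=EXPECTED):
    """Return the expected hostnames that are absent or point somewhere else."""
    mapping = parse_hosts(text)
    return sorted(host for host, ip in expected.items() if mapping.get(host) != ip)
```

Feeding it the contents of the real hosts file should return an empty list; anything it returns is a name the file doesn't override as expected (a commented-out line counts as missing).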
Proud member of the O.F.A. (Old Farts Association)
ID: 112794 · Rating: 0
Stevie G

Joined: 15 Dec 18
Posts: 129
Credit: 1,028,210
RAC: 477
Message 112795 - Posted: 12 Jun 2025, 16:58:51 UTC - in response to Message 112794.  



So I don't understand what is wrong.

S. Gaber



Me neither.

Does your hosts file look something like this?

127.0.0.1 localhost
127.0.1.1 Lynnes-Monolith
128.95.160.156 boinc-files.bakerlab.org
128.95.160.156 bwsrv1.bakerlab.org


# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters


I copied the hosts file you suggested and reset the project again.

Here's my event log:
6/12/2025 12:55:10 PM | Rosetta@home | Resetting project
6/12/2025 12:55:15 PM | Rosetta@home | Master file download succeeded
6/12/2025 12:55:20 PM | Rosetta@home | Sending scheduler request: To fetch work.
6/12/2025 12:55:20 PM | Rosetta@home | Requesting new tasks for CPU and AMD/ATI GPU
6/12/2025 12:55:21 PM | Rosetta@home | Scheduler request completed: got 0 new tasks
6/12/2025 12:55:21 PM | Rosetta@home | Server error: feeder not running
6/12/2025 12:55:21 PM | Rosetta@home | Project requested delay of 3600 seconds
ID: 112795 · Rating: 0
Profile robertmiles

Joined: 16 Jun 08
Posts: 1242
Credit: 14,421,737
RAC: 1
Message 112796 - Posted: 12 Jun 2025, 20:47:30 UTC - in response to Message 112795.  

This line usually indicates that the server you are trying to download work from is not running, so all you can do is wait for it to start running again:

6/12/2025 12:55:21 PM | Rosetta@home | Server error: feeder not running
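When comparing event logs before and after a change (a hosts edit, a reset, a reboot), it can help to boil them down to the few scheduler replies that matter here. The sketch below is a hypothetical helper that only understands the message strings quoted in this thread: it counts "feeder not running" replies, totals the tasks received, and keeps the last requested delay. It doesn't decide which explanation is right; it just makes the before/after comparison easy.

```python
import re

GOT_RE = re.compile(r"got (\d+) new tasks?")
DELAY_RE = re.compile(r"Project requested delay of (\d+) seconds")


def summarize_log(lines, project="Rosetta@home"):
    """Count key scheduler replies in BOINC event-log lines shaped like
    'timestamp | project | message'."""
    summary = {"feeder_not_running": 0, "tasks_received": 0, "last_delay": None}
    for line in lines:
        parts = [p.strip() for p in line.split("|", 2)]
        if len(parts) != 3 or parts[1] != project:
            continue  # not an event-log line for this project
        message = parts[2]
        if message == "Server error: feeder not running":
            summary["feeder_not_running"] += 1
        got = GOT_RE.search(message)
        if got:
            summary["tasks_received"] += int(got.group(1))
        delay = DELAY_RE.search(message)
        if delay:
            summary["last_delay"] = int(delay.group(1))
    return summary
```

Run on the log excerpt above it reports one "feeder not running", 0 tasks and a 3600-second delay; a log like the one Adam posts further down would show no feeder errors and a short delay instead.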
ID: 112796 · Rating: 0
Tom M

Joined: 20 Jun 17
Posts: 158
Credit: 31,910,146
RAC: 104,988
Message 112797 - Posted: 13 Jun 2025, 1:52:53 UTC - in response to Message 112795.  
Last modified: 13 Jun 2025, 1:57:00 UTC



So I don't understand what is wrong.

S. Gaber



Me neither.

Does your hosts file look something like this?

127.0.0.1 localhost
127.0.1.1 Lynnes-Monolith
128.95.160.156 boinc-files.bakerlab.org
128.95.160.156 bwsrv1.bakerlab.org


# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters


I copied the hosts file you suggested and reset the project again.

Here's my event log:
6/12/2025 12:55:10 PM | Rosetta@home | Resetting project
6/12/2025 12:55:15 PM | Rosetta@home | Master file download succeeded
6/12/2025 12:55:20 PM | Rosetta@home | Sending scheduler request: To fetch work.
6/12/2025 12:55:20 PM | Rosetta@home | Requesting new tasks for CPU and AMD/ATI GPU
6/12/2025 12:55:21 PM | Rosetta@home | Scheduler request completed: got 0 new tasks
6/12/2025 12:55:21 PM | Rosetta@home | Server error: feeder not running
6/12/2025 12:55:21 PM | Rosetta@home | Project requested delay of 3600 seconds


Now start this script from a command line window:

Windows script to keep running updates on Rosetta at Home.
From https://boinc.bakerlab.org/rosetta/show_user.php?userid=412375 aka: kotenok2000

cd /d "C:\Program Files\BOINC"
:loop
boinccmd.exe --project https://boinc.bakerlab.org/rosetta/ update

TIMEOUT /T 600 
goto loop



I have had trouble with missing back slashes when trying to post this. There is a back slash between the c: and the "Program Files". And another between "Program Files" and BOINC.

And if your Boinc lives someplace else you need to change the drive letter and path to suit.

The reason you run this script is to more reliably get downloads from Rosetta.
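For anyone who would rather use one script on both Windows and Linux, the same loop can be sketched in Python. It runs the identical boinccmd command (--project <url> update) every 600 seconds, as the batch file does. The boinccmd location is an assumption you pass in: on Windows it is usually in the BOINC install folder used above, while on Linux it is often already on the PATH. The run and sleep parameters exist only so the loop can be tested without a real BOINC client.

```python
import subprocess
import sys
import time

PROJECT_URL = "https://boinc.bakerlab.org/rosetta/"


def update_command(boinccmd="boinccmd", project_url=PROJECT_URL):
    """Build the same command the batch file runs on every pass."""
    return [boinccmd, "--project", project_url, "update"]


def poll(interval_s=600, passes=None, boinccmd="boinccmd",
         run=subprocess.run, sleep=time.sleep):
    """Ask the client to contact the project every `interval_s` seconds.
    passes=None loops forever, like the batch file's `goto loop`."""
    done = 0
    while passes is None or done < passes:
        try:
            result = run(update_command(boinccmd), capture_output=True, text=True)
        except OSError as exc:  # e.g. boinccmd not found at that path
            print("could not start boinccmd:", exc, file=sys.stderr)
        else:
            if result.returncode != 0:
                print("boinccmd failed:", result.stderr.strip(), file=sys.stderr)
        done += 1
        if passes is None or done < passes:
            sleep(interval_s)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python poll_rosetta.py "C:\Program Files\BOINC\boinccmd.exe"
    poll(boinccmd=sys.argv[1])
```

Passing the full path avoids the missing-backslash problem entirely, since nothing depends on the current directory.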
Proud member of the O.F.A. (Old Farts Association)
ID: 112797 · Rating: 0
Sid Celery

Joined: 11 Feb 08
Posts: 2380
Credit: 45,200,983
RAC: 24,025
Message 112798 - Posted: 13 Jun 2025, 2:02:07 UTC - in response to Message 112792.  

But there are now six Rosetta tasks stuck in Ready to Report. Two of them have been there since Sunday and four all day Monday.

This is all very odd. It seems like your PC can't connect to the bakerlab servers again.

Six Rosetta tasks still stuck in Ready to Report, four days now... Update always results in "communication deferred"
I'm thinking of re-setting the project. Maybe that will jar the system into action.

I'm not inclined to think a project reset is going to make the bakerlab server any more reachable.
Can you confirm the few lines in your event log that result in communication being deferred? I'm assuming the server isn't reachable, but just to be sure.
And re-check that your hosts file (with no extension - make doubly sure) is still as it should be.

I just don't get how you were able to contact the server to grab and return a few dozen tasks, then it becomes unreachable without something changing in between.
Anyone else with any ideas, pipe up.

I reset the project, rebooted, hit update several times per day and still get no tasks.

You remember the six completed tasks I had that were ready to send for several days? My account now reports them all "Timed out-- no response."

So I don't understand what is wrong.

I hear everything you're saying, but C:\Windows\System32\drivers\etc\hosts is not being read
That's 'hosts' with no extension - not .bak .old .txt .doc or anything else, just hosts
And not in any other directory - specifically the folder written above
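To take the guesswork out of which file the system actually reads, a short Python check can build the standard path and confirm that a file with exactly that name exists. This is a sketch: it assumes Windows keeps the file under %SystemRoot%\System32\drivers\etc (normally C:\Windows) and Linux uses /etc/hosts, the usual locations. Windows Explorer hides known file extensions by default, so a file shown as "hosts" can really be "hosts.txt"; checking the exact path sidesteps Explorer's view.

```python
import os
import sys
from pathlib import Path, PurePosixPath, PureWindowsPath


def hosts_path(platform=sys.platform, environ=os.environ):
    """Return the usual hosts-file location for the given platform."""
    if platform.startswith("win"):
        # %SystemRoot% is normally C:\Windows; fall back to that if it is unset.
        root = environ.get("SystemRoot", "C:\\Windows")
        return PureWindowsPath(root, "System32", "drivers", "etc", "hosts")
    return PurePosixPath("/etc/hosts")


def report_hosts_file():
    """Print the exact path checked and whether a file with that name exists."""
    path = Path(str(hosts_path()))
    if path.is_file():
        print(f"{path}: present, {path.stat().st_size} bytes")
    else:
        print(f"{path}: no file with exactly this name")


if __name__ == "__main__":
    report_hosts_file()
```

If it reports the file as missing while Explorer appears to show one, the visible file almost certainly has a hidden extension.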

For whatever reason that none of us seems to understand, Rosetta won't magically come back just by waiting
Something changed somewhere. You <must> have had it right to get tasks to come down, then it <must> have changed to stop connecting
And whatever the file is that you're editing now simply cannot be the one in that very specific folder

I know I'm writing this from a distance like I know better than you, but if you've repeatedly put the lines you've been given in the right file in the right place you simply wouldn't keep on receiving the message "Server error: feeder not running".
You might get other messages saying all sorts of things, but not that.
That's a line that says I'm not looking at the hosts file you keep editing.
ID: 112798 · Rating: 0
Adam Gajdacs (Mr. Fusion)

Joined: 26 Nov 05
Posts: 14
Credit: 3,090,462
RAC: 2
Message 112799 - Posted: 13 Jun 2025, 9:27:59 UTC
Last modified: 13 Jun 2025, 9:29:59 UTC

May I ask why adding these IP/host assignments to the hosts file is necessary? Has it been communicated somewhere as a workaround for some issue with the project? Was there a change/issue in the BOINC client itself that requires this in some cases? (I have to admit mine isn't up to date; I update it very infrequently and don't follow the news about major changes or critical fixes.) Or where is this information coming from?

Seeing as I have not received Rosetta tasks for at least half a year, I finally remembered to come to the site to take a look, but did not see anything obvious. The servers appeared to be up and tasks were available, so I checked the boards and found this one conversation about needing to edit the hosts file if the event log for Rosetta just keeps saying "feeder not running". Oddly enough, there appeared to be only a single person with this issue, and yet someone else was giving the solution as if it were public knowledge.

Before I made the edit, I checked the two hosts involved.

boinc-files.bakerlab.org resolved to 128.95.160.135
bwsrv1.bakerlab.org resolved to 2607:4000:406::160:156

Both were up and replying.
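This before-and-after check can be repeated with a few lines of Python. socket.getaddrinfo goes through the operating system's resolver, which consults the hosts file, so running it before and after an edit should show the override taking effect (for example, the IPv6 answer for bwsrv1.bakerlab.org disappearing, as observed here). The helper name is illustrative; the hostnames to check are passed on the command line.

```python
import socket
import sys


def addresses_by_family(host, port=443):
    """Resolve `host` the way most programs do and split the answers into
    IPv4 and IPv6 lists (duplicates removed, order preserved)."""
    v4, v6 = [], []
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, proto=socket.IPPROTO_TCP):
        ip = sockaddr[0]
        if family == socket.AF_INET and ip not in v4:
            v4.append(ip)
        elif family == socket.AF_INET6 and ip not in v6:
            v6.append(ip)
    return v4, v6


if __name__ == "__main__":
    # Usage: python resolve.py boinc-files.bakerlab.org bwsrv1.bakerlab.org
    for name in sys.argv[1:]:
        try:
            print(name, addresses_by_family(name))
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            print(name, "did not resolve:", exc)
```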

After making the edit, both resolved to the address specified in the hosts file and were also up and replying on that IP address. The log entries for Rosetta changed from "feeder not running" to

2025. 06. 13. 11:12:02 | Rosetta@home | Sending scheduler request: To fetch work.
2025. 06. 13. 11:12:02 | Rosetta@home | Requesting new tasks for CPU
2025. 06. 13. 11:12:03 | Rosetta@home | Scheduler request completed: got 0 new tasks
2025. 06. 13. 11:12:03 | Rosetta@home | No tasks sent
2025. 06. 13. 11:12:03 | Rosetta@home | Project requested delay of 31 seconds

which appears to be what to expect when communication with the servers is ok just no work is being sent by them for whatever normal reason.

The question remains: why is this address resolve override needed, and how is it that apparently it is only needed for an extremely low number of users? What am I missing here?
ID: 112799 · Rating: 0
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1854
Credit: 18,534,891
RAC: 0
Message 112800 - Posted: 13 Jun 2025, 11:00:37 UTC - in response to Message 112799.  

The question remains: why is this address resolve override needed, and how is it that apparently it is only needed for an extremely low number of users? What am I missing here?
Very few people use IPv6, and that is what is causing the issues, as it's broken here at Rosetta.
There are several other issues as well, such as millions of Tasks queued up but frequently 0 Ready to send, issues with the download servers (requiring their own hosts file fix if you have that issue), and issues with the Assimilators.
And the project isn't taking the slightest bit of interest in resolving any of these problems.
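If broken IPv6 is the cause, it should show up as connections to the IPv6 address failing while the IPv4 address works. The sketch below tries a plain TCP connection to every address a name resolves to and reports each one separately. It is only a reachability check: port 443 is an assumption based on the https:// project URL, and a successful connect doesn't prove the scheduler itself is healthy.

```python
import socket
import sys


def probe(host, port=443, timeout=5.0):
    """Try a TCP connection to each address `host` resolves to.
    Returns (family, address, ok, detail) tuples, one per distinct address."""
    results, seen = [], set()
    for family, type_, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, proto=socket.IPPROTO_TCP):
        if sockaddr[0] in seen:
            continue
        seen.add(sockaddr[0])
        label = "IPv6" if family == socket.AF_INET6 else "IPv4"
        try:
            with socket.socket(family, type_, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            results.append((label, sockaddr[0], True, "connected"))
        except OSError as exc:  # refused, unreachable, timed out, ...
            results.append((label, sockaddr[0], False, str(exc)))
    return results


if __name__ == "__main__":
    # Usage: python probe.py boinc.bakerlab.org
    for name in sys.argv[1:]:
        for row in probe(name):
            print(name, *row)
```

Running it with and without the hosts override in place gives a direct comparison of the two address families.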
Grant
Darwin NT
ID: 112800 · Rating: 0
Tom M

Joined: 20 Jun 17
Posts: 158
Credit: 31,910,146
RAC: 104,988
Message 112801 - Posted: 13 Jun 2025, 12:20:56 UTC - in response to Message 112798.  
Last modified: 13 Jun 2025, 12:27:12 UTC


I hear everything you're saying, but C:\Windows\System32\drivers\etc\hosts is not being read
That's 'hosts' with no extension - not .bak .old .txt .doc or anything else, just hosts
And not in any other directory - specifically the folder written above

For whatever reason that none of us seems to understand, Rosetta won't magically come back just by waiting
Something changed somewhere. You <must> have had it right to get tasks to come down, then it <must> have changed to stop connecting
And whatever the file is that you're editing now simply cannot be the one in that very specific folder

I know I'm writing this from a distance like I know better than you, but if you've repeatedly put the lines you've been given in the right file in the right place you simply wouldn't keep on receiving the message "Server error: feeder not running".
You might get other messages saying all sorts of things, but not that.
That's a line that says I'm not looking at the hosts file you keep editing.


Is it possible that he has been running Notepad without Administrator permissions, and so has not actually saved those changes? Or that there are "invisible" characters in the current file name, so it is not being recognized?

I have had to literally delete and recreate from scratch some Windows text files that had garbage in them I could not see, in order to get something working again.
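Both suspicions can be checked from a script rather than by eye. Windows Explorer hides known file extensions by default, so "hosts.txt" can look like "hosts", and a stray space or invisible character in a name doesn't show at all. The sketch below (hypothetical helpers) lists files in the etc folder whose names resemble "hosts" without being exactly that, and flags lines in a hosts file containing any non-ASCII character, since addresses and hostnames in that file are normally plain ASCII.

```python
import unicodedata
from pathlib import Path


def _visible(text):
    """Drop whitespace and invisible format/control characters (e.g. zero-width spaces)."""
    return "".join(ch for ch in text
                   if not ch.isspace() and unicodedata.category(ch) not in ("Cf", "Cc"))


def hosts_lookalikes(etc_dir):
    """Names in `etc_dir` that resemble 'hosts' but are not exactly 'hosts'.
    Case is ignored because the Windows file system ignores it."""
    return sorted(entry.name for entry in Path(etc_dir).iterdir()
                  if entry.name.lower() != "hosts"
                  and _visible(entry.name).lower().startswith("hosts"))


def non_ascii_lines(hosts_file):
    """(line number, repr of line) for every line holding a non-ASCII character,
    such as a byte-order mark or a pasted zero-width space."""
    text = Path(hosts_file).read_bytes().decode("utf-8", errors="replace")
    return [(number, repr(line)) for number, line in enumerate(text.splitlines(), 1)
            if any(ord(ch) > 127 for ch in line)]
```

Pointed at C:\Windows\System32\drivers\etc, an empty lookalike list plus no flagged lines would rule out both of the problems described above.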
Proud member of the O.F.A. (Old Farts Association)
ID: 112801 · Rating: 0



©2025 University of Washington
https://www.bakerlab.org