Ticket #17 (closed defect: worksforme)

Opened 5 years ago

Last modified 3 years ago

fahmon.exe runs at high CPU utilization

Reported by: lelliott731
Assigned to: uncle_fungus
Priority: blocker
Milestone: 2.3.2
Component: Other
Keywords:
Cc:

Description

It has been happening at random times, but on all of my XP machines. I have a few quad core machines that each run 3 single core clients and one GPU client, so I check on them every once in a while to see that the GPU client is getting its fair share of CPU resources. That is how I started noticing that fahmon.exe will be taking up 25%-40% of CPU resources. So then I started checking some of my dual core and single core systems, and all of them also randomly start having serious slowdowns in F@H speed. When I go and check, there it is: fahmon.exe running at 25% minimum, full time.

The next time I see it, I'll post a screen capture so you know I'm not crazy :-) All the machines are kept up to date with the latest Windows Updates, all have anti-virus protection, etc., so I know it's not something pretending to be fahmon.exe. I'll get back to this ticket as soon as I can.

Attachments

fahmon.jpg (135.0 kB) - added by lelliott731 on 10/27/07 14:53:23.
Screen Capture of the Task Manager in Windows XP SP2
fahmon_max.jpg (121.0 kB) - added by lelliott731 on 10/30/07 10:11:42.
From my main work desktop system
fahmon_max_2.jpg (160.0 kB) - added by lelliott731 on 10/30/07 10:12:11.
From my main work desktop system a couple hours later
fahmon-lastbuild-cpu.png (3.4 kB) - added by shello on 11/20/07 17:46:34.
FahMon 100% CPU Usage (SVN Build)
FahMon-231.zip (33.2 kB) - added by shello on 11/20/07 17:47:25.
Zip with the config directory
famon_max_v2-3-2b.jpg (159.7 kB) - added by lelliott731 on 06/17/08 14:51:30.
FahMon v2-3-2b Running at 100% CPU utilization

Change History

10/26/07 15:30:02 changed by uncle_fungus

  • status changed from new to assigned.

10/26/07 15:32:40 changed by uncle_fungus

Can you try resetting all your preferences to defaults (renaming prefs.dat is the easiest way) and see if the behaviour still occurs?

If it does, can you also try running a previous version of FahMon, like 2.2.2, to see if the problem is caused by a regression introduced in 2.3.0?

10/27/07 14:53:23 changed by lelliott731

  • attachment fahmon.jpg added.

Screen Capture of the Task Manager in Windows XP SP2

10/27/07 15:57:17 changed by lelliott731

As you can see from the attachment I just put up there, I just found it on my home machine, which is configured as follows:

  • Intel® Core™2 Quad Q6600 CPU
  • 3 copies of Folding@Home Console Client v6.00 Beta 1
  • 1 copy of Folding@Home GPU GUI Client v5.91 Beta 6
  • 2GB (2x1GB) DDR2 800MHz OCZ RAM
  • 2x500GB 7200rpm 16MB Cache SATA Hard Drives on Intel ICH8R
  • ATI X1950Pro 512MB VRAM PCIe x16
  • Windows XP SP2 with all updates
  • Latest Intel chipset & RAID drivers
  • ATI Catalyst 7.9 Drivers

In terms of the preferences I had set:

Tab 1: General

  • Enable system tray icon - checked
  • Collect .xyz files - checked
  • Auto update projects database when needed - checked
  • Always list inaccessible clients last - checked
  • Start minimized - checked
  • Show deadlines and download times in days - not checked

Tab 2: Monitoring

  • Auto reload clients - checked
  • Reload interval (minutes): 1
  • Use experimental reload system - not checked
  • Ignore asynchronous clocks - checked

Tab 3: Networking

  • Use a proxy for HTTP connections - not checked

Tab 4: Advanced

  • Use the following settings for new project downloads - checked
  • Server: fah-web.stanford.edu
  • Port: 80
  • Resource: psummary.html
  • Use a local file for project data - not checked

Tab 5: System

  • Nothing is changed from default

I have followed those instructions above and I'll report back soon.

If I remember correctly it did occur sometimes back in 2.2.2; I just didn't know how to report it, and kept forgetting about it. But as you can see now, it's seriously slowing down my machines. Having 1 core of each machine's CPU tied up for 16 hours, until I'm able to get to the machine and shut down and restart fahmon.exe, seriously reduces my ability to help the project, not to mention my points :-)

10/30/07 10:11:42 changed by lelliott731

  • attachment fahmon_max.jpg added.

From my main work desktop system

10/30/07 10:12:11 changed by lelliott731

  • attachment fahmon_max_2.jpg added.

From my main work desktop system a couple hours later

11/01/07 01:15:43 changed by lelliott731

Ok, so I did what you said, and renamed the file prefs.dat to prefs-old.dat, restarted fahmon.exe version 2.3.0 and it came up just fine. I changed the preference to have it run minimized, and that was it. Since I've done that, it has yet to do a freak out where it takes complete control over one of the cores of my Intel® Core™2 Quad Q6600 CPU. I have also since upgraded to the 2.3.1 version of fahmon.

But I have been experiencing it on another machine with almost the exact same configuration. It's at my office, so I only get to see it M-F, and if fahmon.exe goes nuts, it can waste 48 hours of processing power at a time. Here is the configuration of the machine:

  • Intel® Core™2 Extreme QX6700 CPU
  • 3 copies of Folding@Home Console Client v6.00 Beta 1
  • 1 copy of Folding@Home GPU GUI Client v5.91 Beta 6
  • 4GB (4x1GB) DDR2 800MHz OCZ RAM
  • 2x500GB 7,200rpm 16MB Cache SATA Hard Drives, 2x250GB 7,200rpm 16MB Cache SATA Hard Drives & 2x150GB 10,000rpm 16MB Cache SATA Hard Drives, all on an Intel ICH8R
  • ATI X1950XTX 512MB VRAM PCIe x16
  • Windows XP SP2 with all updates
  • Latest Intel chipset & RAID drivers
  • ATI Catalyst 7.9 Drivers

Now you can see via the above pictures that once again, version 2.3.0 of fahmon.exe has been taking up an entire core of the quad core CPU, and other things are occurring too. When I first noticed it (picture fahmon_max.jpg), fahmon.exe had been running at 100% of a single core for 9 hours and 52 minutes, had spawned 3 threads, and was taking up 21MB of RAM; when it's running normally, the most I've seen it take up is 1.5-2MB of RAM. In the second picture, fahmon_max_2.jpg, you can see that it has jumped to 5 threads and is now at 16 and a half hours of fully using one core of my quad core processor. The memory utilization has gone up big time as well, roughly doubling since the 9 hour mark to 45MB, plus it's using almost 60MB of virtual memory.

I have yet to try redoing the preferences file on this office machine; I'm going to try that tomorrow. I am also going to upgrade to 2.3.1 and see if these things continue.

Hope all this information helps get this problem resolved. This tool is really helpful and I like to use it, but until we can figure out why it's randomly doing this, I've stopped running it on most of my boxes just because I don't want them sitting there wasting time & cycles on monitoring itself.

Thanks uncle_fungus for all the time & effort you're putting into this program!!!

11/01/07 05:43:21 changed by uncle_fungus

  • priority changed from major to blocker.

Ok, the information about threads is probably important, as it sounds like one of the monitoring threads is going mental and continuously processing "something".

The memory usage in those screenshots is just stupid, as you've noted above, and I suspect it's related to the awry threads.

Normally, when not processing an update, fahmon should only be in 1 thread. When it updates clients, it spawns a thread per client, which should only exist whilst the update is being run.
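To illustrate the expected thread lifecycle described above, here is a minimal Python sketch. It is not FahMon's actual code; `update_all_clients`, `update_one` and the `timeout` parameter are hypothetical names used only for illustration. Each client gets its own short-lived thread, and any thread that outlives the update is reported as stuck.

```python
import threading


def update_all_clients(clients, update_one, timeout=30.0):
    """Run update_one(client) in one thread per client (hypothetical helper).

    Returns the clients whose update thread is still alive after joining,
    i.e. the updates that appear to be stuck.
    """
    threads = [threading.Thread(target=update_one, args=(c,)) for c in clients]
    for t in threads:
        t.start()
    # The timeout applies to each join separately, so the total wait can be
    # up to timeout * len(clients) in the worst case.
    for t in threads:
        t.join(timeout=timeout)
    return [c for c, t in zip(clients, threads) if t.is_alive()]
```

In a healthy run the returned list is empty and the process drops back to a single thread. A non-empty list corresponds to the situation in the screenshots, where extra threads are still present long after an update should have finished.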

Can you check the messages.log file? Does it contain anything remotely useful (like updates during the 100% CPU usage period)? Also, does fahmon continue to monitor the clients correctly whilst running at 100% CPU? If it does continue, it means that one (or more) of the update threads has gone mental and can't quit.

I'm going to bump this up to a blocker bug level, but ideally I need more info about what exactly is going on. I'm going to try running this for an extended period of time inside a Windows VM, because as far as I can tell, this has never happened on Linux.

Another thought: have any of the machines monitored by FahMon ever dropped off the network, even for a short period of time?

11/01/07 18:43:46 changed by lelliott731

Hey uncle_fungus, both of those machines are directly connected to a network, so they wouldn't be dropping on or off of the network.

I had no idea there was a messages.log file, I can start looking at it at both of those machines and see what it's saying.

Also, I do not know if the clients are being monitored correctly while fahmon is going nuts. I'll watch for that too, and report back to you as soon as I have both pieces of information.

Luke

11/02/07 14:18:25 changed by Bastien

I've been testing with this problem for a couple of minutes...

I see that your reload interval is set to 1 minute. Have you tried this with an interval of 5-10 minutes? I see that when FahMon is reloading all the clients at one time (I have 8 clients), it is busy for 20 seconds (in Task Manager, FahMon claims 100% of one core while the other three cores are folding).

I also have this kind of problem with a Linux script (not F@H related): when the previous run is not fully done, the next one makes my system go crazy. Adding a check to the script that the previous run has finished before starting a new one does the trick.
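The kind of check described here can be sketched as follows. This is a Python illustration only, not the script mentioned above and not FahMon code; `ReloadGuard` and `reload_fn` are hypothetical names. A reload that is triggered while the previous one is still running is skipped instead of piling up.

```python
import threading


class ReloadGuard:
    """Skips a reload if the previous one has not finished yet (hypothetical)."""

    def __init__(self, reload_fn):
        self._reload_fn = reload_fn   # assumed: callable that refreshes all clients
        self._busy = threading.Lock()
        self.skipped = 0              # number of triggers dropped because of overlap

    def trigger(self):
        # Non-blocking acquire returns False at once if a reload is in progress.
        if not self._busy.acquire(blocking=False):
            self.skipped += 1
            return False
        try:
            self._reload_fn()
            return True
        finally:
            self._busy.release()
```

With a 1 minute reload interval and a reload that takes around 20 seconds, the guard would normally never skip. It only matters when a reload runs longer than the interval, which is exactly the case where overlapping runs would otherwise stack up.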

11/02/07 14:52:41 changed by uncle_fungus

Hmm, there may be some dodgy mutex handling going on, which causes a thread trying to access a locked function call to use 100% CPU. I know there is definitely a portion of code that is missing a mutex, but I think that causes another problem rather than this one. There are a couple of fairly major intermittent bugs like this which I really need to fix. I may just do a feature freeze on 2.3.2 and work solely on code refactoring and bug fixing.
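As an illustration of how waiting on a lock can consume a whole core, the Python sketch below contrasts a polling wait with a blocking wait. It is an assumption about the general failure mode, not a reconstruction of FahMon's code; `spin_until_acquired`, `block_until_acquired` and `on_spin` are hypothetical names.

```python
import threading


def spin_until_acquired(lock, on_spin=None):
    """Busy-wait: retry a non-blocking acquire in a tight loop.

    While another thread holds the lock, this loop never sleeps, so the
    waiting thread keeps one core busy. Returns the number of failed attempts.
    """
    attempts = 0
    while not lock.acquire(blocking=False):
        attempts += 1
        if on_spin is not None:
            on_spin()  # optional hook, used here only to observe the spinning
    return attempts


def block_until_acquired(lock):
    """Blocking wait: the thread is suspended until the lock becomes free."""
    lock.acquire()
```

If the thread holding the lock never releases it, the polling version spins forever at full CPU, whereas the blocking version hangs silently at close to 0% CPU.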

11/05/07 13:01:17 changed by uncle_fungus

  • milestone set to 2.3.2.

11/14/07 18:58:27 changed by uncle_fungus

This may have been fixed in r112. A prerelease build will have to confirm this.

11/18/07 17:04:34 changed by uncle_fungus

OK, I've built some packages from the latest SVN revision to test this:

http://fahmon.net/downloads/testing/FahMon-SVNr119.zip (Win32)
http://fahmon.net/downloads/testing/FahMon-SVNr119.tar.bz2 (Linux)

11/19/07 10:03:22 changed by Bastien

I have been running the latest (Win32) revision for 12 hours now and it is looking good. Refreshing 9 clients is done within 1 second and the CPU load is a few percent (max. 5-10%).

11/20/07 17:46:00 changed by shello

I've overwritten the exe from 2.3.1 with this new build (Win32), and I'm still having this problem.

Creating a "new install" using this build solves the problem. I think the problem may be in a config file or something.

I've attached a screenshot (this is a P4 HT, so 50% means 100%) and a zip with the config files.

11/20/07 17:46:34 changed by shello

  • attachment fahmon-lastbuild-cpu.png added.

FahMon 100% CPU Usage (SVN Build)

11/20/07 17:47:25 changed by shello

  • attachment FahMon-231.zip added.

Zip with the config directory

02/08/08 15:59:29 changed by uncle_fungus

  • status changed from assigned to closed.
  • resolution set to worksforme.

I'm going to close this as "works for me" as no-one who's beta tested has reproduced the bug (or at least they haven't told me about it).

05/27/08 12:42:14 changed by lelliott731

  • status changed from closed to reopened.
  • resolution deleted.

I hate to do this to you uncle_fungus, but I'm having issues again. On one AMD Athlon64 X2 machine, in 8 days, the fahmon.exe used up almost 2 hours of processing. And on my Core 2 Extreme machine with a Radeon 3870, in 12 days it used 28 hours of processing. Unfortunately I wasn't able to keep screenshots, but I swear I'm not making this stuff up. The Core 2 Extreme machine was keeping one whole core saturated with just the fahmon.exe process, and I forget how many threads it had spawned. I am moving all my configurations down to only checking every 5 minutes, to see if that alleviates any of the issues from happening, and I'm going to keep quite a close tab on all of this, so that I can report exact information to you, with screenshots, etc.
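As a rough sanity check on the figures above, CPU time divided by wall-clock time gives the average share of one core. This assumes fahmon.exe was running for the whole period on each machine, which the report does not state; `average_core_share` is a hypothetical helper.

```python
def average_core_share(cpu_hours, wall_days):
    """Average fraction of one core used over the period (illustrative helper)."""
    return cpu_hours / (wall_days * 24)


athlon = average_core_share(2, 8)    # "almost 2 hours" in 8 days, rounded up to 2
core2 = average_core_share(28, 12)   # 28 hours in 12 days

print(f"Athlon64 X2: {athlon:.1%}")     # prints "Athlon64 X2: 1.0%"
print(f"Core 2 Extreme: {core2:.1%}")   # prints "Core 2 Extreme: 9.7%"
```

Under that assumption, an average of about 10% alongside a fully saturated core would suggest that the runaway state lasted for only part of the 12 days.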

06/10/08 15:39:03 changed by uncle_fungus

Note: additional details in this (closed) ticket #121

06/17/08 14:51:30 changed by lelliott731

  • attachment famon_max_v2-3-2b.jpg added.

FahMon v2-3-2b Running at 100% CPU utilization

06/17/08 14:54:51 changed by lelliott731

Ok, so I finally screen captured it with the current version. It had been running for approximately a week, which means it was averaging 81% of one of the CPU's resources the entire time.

The system is a Dual Socket, AMD Opteron 285 (2.6GHz X 2, 1MB L2 Cache X 2), 6GB DDR-400 RAM, and a RAID 5 array. It's running Windows Server 2003 Enterprise x86.

06/17/08 15:18:14 changed by uncle_fungus

OK, that looks interesting. It's spawned 4 threads, which means at least one of the monitoring threads has locked, possibly due to a race condition.

Is FahMon still responsive and monitoring correctly when this happens, and how many clients is it monitoring?

01/02/09 19:58:24 changed by uncle_fungus

  • status changed from reopened to closed.
  • resolution set to worksforme.