Ticket #12 (reopened defect)

Opened 4 years ago

Last modified 2 years ago

Client locations get corrupted when large numbers are monitored

Reported by: uncle_fungus
Assigned to: uncle_fungus
Priority: major
Milestone: 2.4.0
Component: Monitoring system
Keywords:
Cc:

Description

When large numbers of clients are monitored, the location of a client will occasionally get corrupted and take on the value of a line from FAHlog.txt.

I suspect this is caused by a file/memory locking issue that only becomes apparent when large numbers of clients are being updated at once.

It has never been observed whilst monitoring 5 clients, but has been reported when 60 clients are being monitored.
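To illustrate the kind of locking the suspicion above refers to, here is a minimal sketch in Python. It is not FahMon's code: the `ClientTable` class, the `update_all` helper, and the `/folding/N/` locations are all hypothetical. It only shows the general pattern of serialising concurrent updates to a shared client table with a mutex.

```python
import threading


class ClientTable:
    """Hypothetical in-memory table mapping client name -> location."""

    def __init__(self):
        self._lock = threading.Lock()
        self._locations = {}

    def set_location(self, name, location):
        # Hold the lock so only one thread mutates the table at a time.
        with self._lock:
            self._locations[name] = location

    def snapshot(self):
        # Copy under the lock so the caller gets a consistent view
        # even while other threads are still updating entries.
        with self._lock:
            return dict(self._locations)


def update_all(table, clients):
    # One thread per client, mimicking many clients being refreshed at once.
    threads = [
        threading.Thread(target=table.set_location, args=(name, location))
        for name, location in clients.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


# 60 made-up clients, matching the count at which the bug was reported.
clients = {f"client-{i}": f"/folding/{i}/" for i in range(60)}
table = ClientTable()
update_all(table, clients)
```

After `update_all` returns, `table.snapshot()` should hold exactly the 60 entries that were written. Whether the real defect is a missing lock or something else is still an open question in this ticket.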

Attachments

Fahmon Errors.png (83.1 kB) - added by WFO on 04/25/09 22:43:39.

Change History

10/19/07 19:26:04 changed by uncle_fungus

  • status changed from new to assigned.

10/25/07 06:22:16 changed by uncle_fungus

  • milestone set to 2.3.2.

10/25/07 06:22:24 changed by uncle_fungus

  • milestone deleted.

10/25/07 06:25:27 changed by uncle_fungus

  • milestone set to 2.3.2.

11/14/07 18:40:09 changed by uncle_fungus

The recent commits (r112) may fix this problem by removing the dodgy code that extracted the PRCG. Since core code from qd is now used, extraction of several pieces of data is more reliable.

02/08/08 15:58:28 changed by uncle_fungus

  • status changed from assigned to closed.
  • resolution set to worksforme.

I'm going to close this as a "works for me" as no-one who's beta tested has managed to reproduce it.

06/30/08 22:48:04 changed by flecom

  • status changed from closed to reopened.
  • resolution deleted.

06/30/08 22:48:30 changed by flecom

I'm getting this error. I'm monitoring 44 machines, and while the program is running it randomly corrupts the paths in the clientstab.txt file.

Here's my FahMon clientstab.txt after it's been running for an hour or two. These were all valid paths to begin with, obviously.

# ./config/clientstab.txt : contains the list of clients
#
# "Name"          "Location"

"HP1"    "2605 (R6, C428, G68)"
"HP2"    "./config/benchmarks.dat"
"HP3"    "\\FOLD-CC74\c\etc\folding\1\"
"HP4"    "FLECOM"
"HP5"    "\\FOLD-85AB\c\etc\folding\1\"
"HP6"    "\\FOLD-67DA\c\etc\folding\1\"
"HP7"    "\\FOLD-4DB0\c\etc\folding\1\"
"HP8"    "\\FOLD-3C76\c\etc\folding\1\"
"Workstation GPU"    "Z:\Documents and Settings\Frank.FLECOM\Application Data\Folding@home-gpu\"
"iMac"    "y:\Program Files\Folding@Home Windows SMP Client V1.01\"
"FAH-Server"    "[17:44:54] Initial: 21E9; + 1454080 bytes downloaded"
"Workstation SMP"    "Z:\Program Files\Folding@Home Windows SMP Client V1.01\"
"101-06"    "\\FOLD-8131\c\etc\folding\1\"
"101-01"    "\\FOLD-6A29\c\etc\folding\1\"
"101-02"    "MainDialog.KeepDeadClientsLast"
"101-03"    "\\FOLD-E68D\c\etc\folding\1\"
"101-05"    "\\FOLD-D1F9\c\etc\folding\1\"
"101-07"    "\\FOLD-D0C1\c\etc\folding\1\"
"101-08"    "\\192.168.11.31\c\etc\folding\1\"
"101-09"    "\\FOLD-404C\c\etc\folding\1\"
"101-10"    "\\FOLD-BAD8\c\etc\folding\1\"
"101-11"    "\\FOLD-6AB8\c\etc\folding\1\"
"101-12"    "\\FOLD-E8F5\c\etc\folding\1\"
"101-24"    "\\FOLD-A3F6\c\etc\folding\1\"
"101-25"    "\\FOLD-7583\c\etc\folding\1\"
"101-26"    "\\FOLD-9A4F\c\etc\folding\1\"
"101-27"    "\\FOLD-D4DE\c\etc\folding\1\"
"101-14"    "\\FOLD-E90D\c\etc\folding\1\"
"101-15"    "\\FOLD-E91D\c\etc\folding\1\"
"101-16"    "\\FOLD-E61A\c\etc\folding\1\"
"101-17"    "\\FOLD-DF87\c\etc\folding\1\"
"101-18"    "\\FOLD-E917\c\etc\folding\1\"
"101-19"    "\\FOLD-E8B1\c\etc\folding\1\"
"101-20"    "\\FOLD-D0B6\c\etc\folding\1\"
"101-21"    "\\FOLD-C575\c\etc\folding\1\"
"101-22"    "\\FOLD-CD10\c\etc\folding\1\"
"101-28"    "\\FOLD-6C21\c\etc\folding\1\"
"101-29"    "\\FOLD-B632\c\etc\folding\1\"
"101-30"    "\\FOLD-6CBF\c\etc\folding\1\"
"101-31"    "\\FOLD-424F\c\etc\folding\1\"
"101-04"    "\\FOLD-E93B\c\etc\folding\1\"
"745-B"    "\\FOLD-4527\c\etc\folding\1\"
"745-T"    "\\FOLD-7409\c\etc\folding\1\"
"755"    "\\FOLD-5BF1\c\etc\folding\1\"

07/01/08 08:07:45 changed by flecom

BTW, I'm using 2.3.3svn r296, but this happened with 2.3.2 also.

07/01/08 12:32:30 changed by uncle_fungus

Thanks. That's quite helpful. I can at least now see what kind of things are being inserted into the client location field. Mostly it seems to be anything stored as a string (and I can see some strange ones in there). This has to be a pointer/reference problem, although as yet I'm not sure what code is triggering it.
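As a rough aid for spotting the corrupted entries described above, here is a small Python sketch that scans clientstab.txt text and flags locations that do not look like client directories. It is not part of FahMon. The heuristic (a valid location ends in a path separator) is an assumption drawn only from the valid entries in the listing above, and `suspicious_entries` is a hypothetical helper name.

```python
import re

# Matches one entry line: "Name"    "Location" (any extra columns are ignored).
ENTRY = re.compile(r'^"([^"]*)"\s+"([^"]*)"')


def suspicious_entries(text):
    """Return (name, location) pairs whose location does not end in a path separator."""
    flagged = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' comment header.
        if not line or line.startswith("#"):
            continue
        match = ENTRY.match(line)
        if match is None:
            continue
        name, location = match.groups()
        # Assumption: every valid location is a directory ending in '\' or '/'.
        if not location.endswith(("\\", "/")):
            flagged.append((name, location))
    return flagged


# First few entries from the corrupted file posted above.
sample = r'''
# "Name"          "Location"
"HP1"    "2605 (R6, C428, G68)"
"HP2"    "./config/benchmarks.dat"
"HP3"    "\\FOLD-CC74\c\etc\folding\1\"
"HP4"    "FLECOM"
'''

for name, location in suspicious_entries(sample):
    print(f"{name}: {location}")
```

On this sample the sketch flags HP1, HP2 and HP4 and leaves HP3 alone. A corrupted value that happens to end in a separator would slip through, so this is only a first-pass check.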

04/19/09 06:14:51 changed by snapshot

I've seen this when monitoring 17 clients, but I've just had the client name change rather than the location. Using FahMon 2.3.99.1 under XP64. Clientstab.txt appended. The penultimate entry has changed from "test-5" to "25 May, 17:54".

Hope this helps.

Jonathan

# "Name" "Location" Disabled(*) VM(*)

"this-smp" "C:\Program Files (x86)\Folding@Home Windows SMP Client V1.01\"
"this-smp2" "D:\Program Files\Folding@Home Windows SMP Client V1.01\"
"8336-1" "\\Fold-8336\C\etc\folding\1\" *
"that-gpu0" "\\That\Folding\gpu1\" *
"this-gpu0" "D:\Program Files\folding\gpu0\"
"this-3" "D:\Program Files\folding\3\"
"this-4" "D:\Program Files\folding\4\"
"this-5" "D:\Program Files\folding\5\"
"test-gpu0" "\\Test\Folding\gpu0\"
"test-gpu1" "\\Test\Folding\gpu1\"
"test-smp" "\\test\fah\"
"that-fah2" "\\that\folding\fah2\" *
"this-6" "D:\Program Files\folding\6\"
"test-3" "\\Test\Folding\3\"
"test-4" "\\Test\Folding\4\" *
"25 May, 17:54" "\\Test\Folding\5\" *
"9758-1" "\\Fold-975d\C\etc\folding\1\"
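A corrupted name like the one above cannot be caught by looking at the location alone, so one option is to compare the file against a known-good backup. The following Python sketch does that. It is not part of FahMon: `parse_clientstab` and `changed_entries` are hypothetical helpers, and pairing entries by position assumes FahMon has not reordered the file.

```python
import re

# Matches one entry line: "Name" "Location" (trailing flags such as * are ignored).
ENTRY = re.compile(r'^"([^"]*)"\s+"([^"]*)"')


def parse_clientstab(text):
    """Return (name, location) pairs in file order; comment lines never match."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            entries.append(match.groups())
    return entries


def changed_entries(backup_text, current_text):
    """Pair entries by position and report those whose name or location differs.

    zip() stops at the shorter list, so added or removed entries are not reported.
    """
    backup = parse_clientstab(backup_text)
    current = parse_clientstab(current_text)
    return [
        (index, old, new)
        for index, (old, new) in enumerate(zip(backup, current), start=1)
        if old != new
    ]


# Two entries from the file above, before and after the name was overwritten.
backup = r'''
# "Name" "Location" Disabled(*) VM(*)
"test-4" "\\Test\Folding\4\" *
"test-5" "\\Test\Folding\5\" *
'''

current = r'''
# "Name" "Location" Disabled(*) VM(*)
"test-4" "\\Test\Folding\4\" *
"25 May, 17:54" "\\Test\Folding\5\" *
'''

for index, old, new in changed_entries(backup, current):
    print(f"entry {index}: {old[0]!r} -> {new[0]!r}")
```

Here only the second entry is reported, with the name going from "test-5" to "25 May, 17:54" and the location unchanged.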

04/25/09 22:42:51 changed by WFO

I'm using version 2.3.99.1. I'm not experiencing location changes; I am having client names corrupted. In the first picture, under AMD 3, note the FAHlog.txt.

[URL=http://img4.imageshack.us/my.php?image=fahmonerrors.png][IMG]http://img4.imageshack.us/img4/7086/fahmonerrors.png[/IMG][/URL]

[URL=http://img4.imageshack.us/my.php?image=fahmonerrors1.png][IMG]http://img4.imageshack.us/img4/2170/fahmonerrors1.png[/IMG][/URL]

04/25/09 22:43:39 changed by WFO

  • attachment Fahmon Errors.png added.

04/30/09 10:35:40 changed by uncle_fungus

  • milestone changed from 2.3.2 to 2.4.0.

01/18/10 14:53:41 changed by legoman666

I too have noticed this in version 2.3.99.1. However, I only have 8 clients being monitored: 4 local and 4 remote. Both the client name and the client path become corrupted.