Let me start from the last question. The file I get now is around 500 MB. It's not that all events are missing, nor necessarily the events for one variable. What I meant is that if I have 1000 trials, I only find the data for 500 of them. Sorry for the confusion. Finally, I am using MATLAB.
For example, I have a variable that's assigned at the beginning and end of each trial (start and exit states) to make data extraction easier. That gives the smaller number. Aside from this, when I check the success and failure variables used in the behavioural window, I also find the total number to be a fraction of the trials run during the experiment.
Sorry for the late reply; I didn't have access to the files. I checked the timestamps of one variable (the stim updates, for example) and found that the elapsed time between the first and last event is roughly 30 minutes. From the experiment file I attached, you can see that the streaming started around 14:40 and stopped at 16:30 (file last modified).
Below is the link for the file (let me know if it doesn't work). The variable is "afcc_onset_trial"; it's assigned to 0 after fixation of the animal and to 1 at the exit state. There are trials with fixation breaks, which aren't counted in the behavioural window during the experiment and which I can exclude from my data after extraction by looking at the variable "fix_break". https://newcastle-my.sharepoint.com/:u:/g/personal/nba41_newcastle_...
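For reference, the exclusion logic amounts to something like the following (a Python sketch of the idea rather than my actual MATLAB code; the (name, time, value) event tuples and their layout are assumptions for illustration):

```python
# Sketch: pair "afcc_onset_trial" markers (0 = start after fixation,
# 1 = exit state) into trials, then drop trials with a fixation break.
# Events are assumed to be (name, time, value) tuples in time order.

def extract_trials(events):
    trials = []
    start_time = None
    broke = False
    for name, time, value in events:
        if name == "afcc_onset_trial":
            if value == 0:                      # trial start
                start_time, broke = time, False
            elif value == 1 and start_time is not None:
                trials.append((start_time, time, broke))
                start_time = None
        elif name == "fix_break" and value and start_time is not None:
            broke = True                        # mark trial for exclusion
    return [t for t in trials if not t[2]]      # keep only clean trials
```

Comparing `len(extract_trials(events))` against the number of trials run during the session is how the shortfall shows up.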
Thanks for this. I have updated MWorks to the latest nightly version and had another session today. I split the data into two files, each about an hour long. I again looked at the stim update timestamps and found the same issue. One file ran for an hour, but I only found timestamps spanning 22 minutes; the other ran for an hour and a half, with timestamps spanning only 26 minutes.
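For clarity, the check I did amounts to the following (a Python sketch rather than my actual MATLAB script; it assumes the timestamps of one variable have already been extracted, in microseconds, the unit MWorks uses for event times):

```python
# Sketch: how much wall-clock time a variable's events actually cover.
# Assumes timestamps_us is a list of event times in microseconds.

def covered_minutes(timestamps_us):
    """Elapsed time between the first and last event, in minutes."""
    if len(timestamps_us) < 2:
        return 0.0
    return (max(timestamps_us) - min(timestamps_us)) / 60e6

# Example: a session whose events span only 22 minutes
stim_update_times = [0, 5_000_000, 1_320_000_000]  # 0 s, 5 s, 22 min
print(covered_minutes(stim_update_times))  # → 22.0
```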
Looking at the data file you shared, I see a gap of about 72 minutes between the RESERVED_TERMINATION_CODE event (the final event written to the file before it's closed) and the preceding event. This suggests that the MWorks thread responsible for writing events to the data file stopped receiving new events at some point, but did not otherwise fail (since it was still able to write the termination event). I'm trying to understand how this could be the case.
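The gap itself is easy to spot programmatically. A minimal sketch (assuming the event timestamps, in microseconds, have been read out of the file in order):

```python
# Sketch: find the largest interval between consecutive events.

def largest_gap_minutes(times_us):
    """Largest gap between consecutive timestamps, in minutes."""
    gaps = [b - a for a, b in zip(times_us, times_us[1:])]
    return max(gaps) / 60e6 if gaps else 0.0

# Example: steady events, then a 72-minute silence before the final one
times = [0, 60_000_000, 120_000_000, 120_000_000 + 72 * 60_000_000]
print(largest_gap_minutes(times))  # → 72.0
```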
A few more questions:
Are you seeing any error messages in the MWorks console while your experiments are running? (There were none in the data file.)
Are you using any custom core plugins (e.g. for I/O devices or stimuli)? If yes, have you re-compiled them against the version of MWorks you're running (always required when you upgrade MWorks)?
Thanks for the reply. I do get a warning, and it's related to the EyeLink plugin if I understand correctly; here it is:
WARNING: Scheduled task (/Users/cstawarz/Documents/mworks_buildbot/worker/build_for_macos/build/plugins/core/EyeLink1000/Eyelink.cpp:342: eye_tracker) falling behind, dropping 28 scheduled executions (priority = 2, interval = 1000, task = 0x600ed3ebcda0)
The I/O devices that I am using are the ones for EyeLink and NIDAQ; how can I recompile them?
By "custom plugins", I meant plugins that aren't included in the nightly build, which you built and/or installed yourself. I presume you're just using the EyeLink and NIDAQ plugins that came with the nightly build, correct? If so, then those are always up to date.
I asked about custom plugins, because using a plugin compiled for an old MWorks build with a newer MWorks build can lead to memory corruption. The symptoms of that are usually quite obvious (e.g. the server crashes, or the plugin fails to load), but I wanted to check.
Is your MWorks data file directory ($HOME/Documents/MWorks/Data) located on a local drive, i.e. internal or attached via USB or Thunderbolt? Or is it on a network-attached drive (presumably mapped into your home directory via a symbolic link)?
The time stamps in your event file indicate that MWServer had been running for over 13 days. Is that correct? If so, that's impressive :)
In the MWorks data directory, are there any -journal files associated with the data files that are missing events (e.g. run1.mwk2-journal for run1.mwk2)? Normally, these should be present only when the data file is open and removed when it's closed.
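If it's easier, you can check for leftovers with a couple of lines of Python (a sketch; adjust the directory if yours differs from the default):

```python
# Sketch: list any leftover .mwk2-journal files in the data directory.
import glob
import os

def find_journal_files(data_dir):
    """Return any .mwk2-journal files still present in data_dir."""
    return sorted(glob.glob(os.path.join(data_dir, "*.mwk2-journal")))

# Example: check the default MWorks data directory
print(find_journal_files(os.path.expanduser("~/Documents/MWorks/Data")))
```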
It's the directory that you described; we didn't change the default one. Instead, we copy the files elsewhere afterwards. As the computer is always running, we don't really have a reason for disconnecting the server. I will try closing the connection and seeing what happens; maybe some funny business is causing this issue :).
I saw the journal file you described, and indeed it disappears once the connection is closed. I have also attached the output of the terminal command.
I mostly asked as a sanity check. Also, while there's no reason that you shouldn't be able to run MWServer for that long, I'd never seen it done before, hence my interest.
In any case, I don't think this issue could be related to the length of time MWServer has been running, because you just reproduced it with the latest nightly build. One question: Did you observe the data loss on the first run after launching MWServer? Or did it only begin after several runs?
Thanks for that. The only thing that catches my eye is that your iMac is running macOS 10.14 (i.e. 10.14.0) rather than 10.14.4. Apple's scant release notes don't mention any bug fixes that seem relevant to this problem. However, just to cover all the bases, could you try upgrading to 10.14.4?
Also, have you observed the data loss problem on any other machines?
On my end, I tried running a long data collection on my Mac Pro, using a slightly modified version of the RSVPDemo experiment that ships with MWorks. I ran for over an hour, ultimately generating over 800MB of data. During the run, the MWK2 file steadily grew in size, and later analysis didn't reveal any missing events. This is only one data point, of course, but it at least establishes that things work as intended some of the time.
As a next step, I'm going to prepare a special build of MWorks that does detailed logging of what's happening in the data-file writing code. I'll then run this on all of my test machines, and I'll ask you to install and run it on your iMac. Hopefully this will provide some insight into the issue.
On my end, I'm about an hour into running this build on my three test Macs, and I haven't seen any problems so far.
I just stopped this test. After running for over four hours, all three Macs had generated MWK2 files over 2GB in size. Right up to the point when I stopped the experiment, the event files were steadily growing. The logs indicated no problems whatsoever -- just a steady stream of events successfully written.
Sorry for the late reply. I ran the experiment on Friday using the MWorks version you sent. It ran for over an hour and a half; even though the file size is larger than before, I think the issue is still there. I checked the stim updates, and their timestamps span only about 30 minutes. I have included a link to the file for you to check. Additionally, I would like to know if I can get the time on the experiment computer from a system variable.
Thanks, but what I really need is the special log file: /tmp/mwserver_event_file_log.txt. It's overwritten every time you open an event file, so unless you haven't run any experiments since you generated the last faulty data file, I'll need you to generate another one. Sorry for not being clearer.
Thanks for sending the log file. It has shed considerable light on the problem.
It appears that the data-file writing code isn't losing track of any events, and it keeps on writing them to disk right up until the point that you close the data file. However, at some point in the experiment, the rate at which events are written falls drastically below the rate at which events are generated, producing an enormous backlog (e.g. 30 minutes or more) of events. When you tell MWorks to close the data file, the data-file writing code stops as soon as possible, and any events in the backlog are never written to disk.
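To make the failure mode concrete, here's a toy simulation (my own illustration, not MWorks code) of a writer that drains its queue more slowly than events arrive, with a close that stops immediately and abandons the backlog:

```python
# Toy model: events pile up faster than a slow writer can drain them,
# and closing the file discards whatever is still queued.
import queue

def simulate(events_generated, writes_per_tick, ticks_before_close):
    """Return (written, lost) counts for a slow single-threaded writer."""
    backlog = queue.Queue()
    for i in range(events_generated):
        backlog.put(i)                       # events pile up in the queue
    written = 0
    for _ in range(ticks_before_close):
        for _ in range(writes_per_tick):     # writer drains too slowly
            if backlog.empty():
                break
            backlog.get()
            written += 1
    # close() stops as soon as possible; whatever remains is never written
    return written, backlog.qsize()

print(simulate(1000, 10, 20))  # → (200, 800)
```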
So there are two problems here:
1. At some point, something is causing the rate at which events are written to the event file to plummet.
2. MWorks isn't handling the backlog of unwritten events produced by problem 1.
Problem 2 is certainly fixable on my end. I'm working on that now, and I'll get an updated build to you ASAP.
Problem 1 is a bit of a mystery. I can think of three explanations why a backlog of events might be building up:
1. The experiment is simply generating way more events per unit time than can be written to disk in that time.
2. The CPU is overloaded, so the data-file writing code isn't running as often as needed.
3. The actual writes to disk are taking an unreasonably long time.
Looking at your data file, I see no support for explanation 1. Your experiment seems to generate a perfectly reasonable number of events per unit time.
I also find explanation 2 dubious. If the CPU were really that overloaded, you would probably be noticing it in myriad ways (e.g. long application opening times, slow user interfaces). Unless that has been your experience, I'm inclined to rule this out, too.
That leaves us with explanation 3. If disk I/O is taking an exceptionally long time, then I'd expect that either
other processes are also doing a lot of disk I/O, causing a lot of contention and slowing things down for everyone, or
there's a physical problem with the disk.
You can get a sense of the system's disk I/O activity by opening the Activity Monitor application and switching to the "Disk" tab. If you see huge numbers by "Data read/sec" or "Data written/sec", then that may be the underlying issue.
On the other hand, if the issue is with the health of the disk, then I'm not sure how best to diagnose that. I suppose you could try running Disk Utility's "First Aid" procedure on it, as that may find some problems.
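If you'd like a rough number for the disk's sustained write speed, a crude check like the following would do (a sketch, not a proper benchmark; the fsync calls are there to force the data to actually hit the disk):

```python
# Rough sanity check of sustained disk write throughput.
import os
import tempfile
import time

def write_throughput_mb_per_s(total_mb=64, chunk_mb=1):
    """Time synchronous writes of total_mb of zeros; return MB/s."""
    chunk = b"\0" * (chunk_mb * 1024 * 1024)
    with tempfile.NamedTemporaryFile() as f:
        start = time.monotonic()
        for _ in range(total_mb // chunk_mb):
            f.write(chunk)
            f.flush()
            os.fsync(f.fileno())             # force the write to disk
        elapsed = time.monotonic() - start
    return total_mb / elapsed
```

A healthy local drive should report tens to hundreds of MB/s; single-digit numbers would point toward explanation 3.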
As I said, I'm working on a new MWorks build for you to try. In this build, I'll include some additional logging that will hopefully clarify why the event backlog is happening. I'll let you know as soon as I have it ready.
Thank you for your patience and assistance in debugging this issue!
As with the previous build, this one logs details of the event-file writing process to the file /tmp/mwserver_event_file_log.txt. However, the log includes more details than before.
With this build, your event files should not be missing any events. (If they are, then I've done something wrong.) However, there should still be an event backlog. The evidence of this will be in the server console. Upon closing the event file, two messages will be logged to the console. For example:
00:00:18: Closing data file...
00:00:18: Closed data file: /Users/cstawarz/Documents/MWorks/Data/test.mwk2
The crucial point is that the first message is logged before the backlog is cleared, while the second message is logged afterward. Under normal circumstances, there should be no elapsed time between the two messages (as shown above). However, in the presence of an event backlog, there will be a substantial amount of time between them (e.g. 30 minutes or more, depending on how long the experiment ran and when the backlog started).
Therefore, be sure to wait for both messages before closing MWServer. If you don't, you will lose backlogged events (and may corrupt the event file). You may have to wait a long time, so be patient.
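If you want to quantify the backlog, the delay is just the difference between the two console timestamps. A trivial helper (my own illustration, not part of MWorks):

```python
# Sketch: minutes elapsed between the two console messages, given their
# HH:MM:SS timestamp prefixes.

def backlog_drain_minutes(closing_ts, closed_ts):
    """Minutes between 'Closing data file...' and 'Closed data file'."""
    def to_seconds(ts):
        h, m, s = map(int, ts.split(":"))
        return h * 3600 + m * 60 + s
    return (to_seconds(closed_ts) - to_seconds(closing_ts)) / 60

print(backlog_drain_minutes("00:00:18", "00:30:18"))  # → 30.0
```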
Please install this build on the machine that's had data loss and try running your experiment again. If you observe an event backlog (signaled by a long delay between the two messages as described above), please send me the special log file and, if possible, the event file.