ehFeedback error

llglick

09 Jul, 2021 08:16 PM

Hi Chris,
We've been sporadically getting the attached error (ehFeedback error, see stdout), and shortly afterward MWorks crashes.
Do you have thoughts on what might be causing it? Anything we can do other than restarting the computer/labjack?
Thanks,
Lindsey.

  1. 31 Posted by mhisted on 12 Jan, 2022 02:10 PM

    I’ll ask if anyone has seen this on my end.

  2. 32 Posted by llglick on 31 Jan, 2022 06:39 PM

    Hi Chris,

    Do you have any new thoughts on this error?

    We've also been getting this (or a similar) error when using a different experiment. It only happens sporadically, and unlike the other experiment I haven't figured out the conditions that cause it. This experiment only requires triggers sent from a pulse generator to run, so it will potentially be easier to replicate.

    I've attached a screenshot of the error in the console window and the .xml that I was using.

    Thanks,
    Lindsey

  3. Support Staff 33 Posted by Christopher Stawarz on 01 Feb, 2022 03:04 PM

    Hi Lindsey,

    Apologies for not following up on this sooner.

    This looks like a different issue from the one we fixed previously. The old issue led to errors when the plugin attempted to receive a response from the device. Based on the messages written to the server console and stdout, it looks like the new issue is due to a failure to send data to the device. Specifically, a call to LJUSB_Write is failing, which causes ehFeedback to fail, which causes ljU6ReadPorts to fail, which causes readLeverDI to call stopDeviceIO, which ultimately leads to more LJUSB_Write failures and more error messages. It seems like, after the first write failure, all subsequent write attempts also fail.
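
To make the cascade concrete, here is a toy model of it in C. The USB transport is replaced by a flag, and apart from the ehFeedback/ljU6ReadPorts names everything is invented for illustration; it just shows how one dead link turns every later call in the chain into a failure:

```c
#include <assert.h>
#include <stdbool.h>

static bool transport_dead = false;  /* stands in for the broken USB link */

/* stand-in for LJUSB_Write: once the link dies, every write fails */
static int fake_LJUSB_Write(const unsigned char *buf, int count) {
    (void)buf;
    if (transport_dead)
        return 0;                    /* 0 bytes written = failure */
    return count;
}

/* ehFeedback fails whenever the underlying write fails */
static bool fake_ehFeedback(void) {
    unsigned char cmd[16] = {0};
    return fake_LJUSB_Write(cmd, (int)sizeof cmd) == (int)sizeof cmd;
}

/* ljU6ReadPorts fails whenever ehFeedback fails, and so on up the chain */
static bool fake_ljU6ReadPorts(void) {
    return fake_ehFeedback();
}
```

In this model, once `transport_dead` flips, every subsequent call at every level fails, which matches the flood of repeated error messages after the first write failure.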

    At the moment, I don't have any theories as to why writes to the device start failing. I'm going to try running the experiments you sent me, in the hope that I can reproduce the problem.

    On your end, can you try running the failing experiment with LJ_DEBUG set to 1? To do this, edit line 47 in labjackusb.c to look like this:

    #define LJ_DEBUG 1
    

    then rebuild and reinstall your plugin. This will lead to a lot more messages on stderr. When the write failure happens, I'm expecting you'll see a message like

    LJUSB_DoTransfer: Transfer timed out. Returning.
    

    or

    libusb error: LIBUSB_ERROR_IO
    

    or some other message starting with "libusb error". If you can capture those and any subsequent stderr messages and send them to me, they may provide additional insight into what's going wrong.

    Thanks,
    Chris

  4. 34 Posted by llglick on 01 Feb, 2022 05:14 PM

    Hi Chris,
    So, now I have a crash with a different error and a different experiment.
    Here are the logs from the terminal server and the console.
    Lindsey.

  5. Support Staff 35 Posted by Christopher Stawarz on 01 Feb, 2022 09:39 PM

    Hi Lindsey,

    Thanks for sending the logs. It looks like the error is once again occurring in readLeverDI. However, there are two differences:

    1. The failure happens in LJUSB_Read, instead of LJUSB_Write (so it's similar to the old issue in that regard).
    2. The connection seems to recover, because the subsequent attempts to shut down the device don't produce further errors.

    The stderr output contains exactly two error messages, both in response to the same LJUSB_Read failure:

    LJUSB_DoTransfer: Transfer timed out. Returning.
    ehFeedback error : response buffer is too small
    

    So the read is timing out (after the default timeout interval of 1 second). ehFeedback neither detects nor handles the timeout and just treats it like a read failure ("response buffer is too small"). Given that subsequent attempts to communicate with the hardware succeed, it seems like retrying the read would probably succeed.
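
A sketch of the kind of retry that might work (hypothetical; the real reads go through LJUSB_Read, which is faked here with a read that times out once and then recovers, matching the behavior described above):

```c
#include <assert.h>
#include <string.h>

/* Fake read that hits one spurious timeout and then succeeds. */
static int timeouts_left = 1;

static int fake_read(unsigned char *buf, int count) {
    if (timeouts_left > 0) {
        timeouts_left--;
        return -1;                     /* timeout reported as failure */
    }
    memset(buf, 0, (size_t)count);     /* pretend we got a response */
    return count;
}

/* Hypothetical retry wrapper: a timed-out read is attempted again a
   bounded number of times before being reported as a hard failure. */
static int read_with_retry(unsigned char *buf, int count, int max_tries) {
    for (int i = 0; i < max_tries; i++) {
        int n = fake_read(buf, count);
        if (n == count)
            return n;
    }
    return -1;
}
```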

    None of this explains why the read is timing out in the first place. One second is likely much longer than any normal sort of USB I/O delay, so it seems like something must be going wrong. The old issue was due to multiple, overlapping attempts by the plugin to communicate with the device. I know I went through the plugin code looking for other spots where that might happen, but maybe I missed something. I'll take another look.
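
For context, the usual guard against overlapping transactions is to serialize all device I/O behind one lock. A minimal, purely illustrative sketch (none of this is the plugin's actual code):

```c
#include <assert.h>
#include <pthread.h>

/* Serialize every command/response exchange under a single mutex, so no
   two threads can interleave their USB I/O with the device. */
static pthread_mutex_t device_lock = PTHREAD_MUTEX_INITIALIZER;
static int in_transaction = 0;    /* nonzero here would mean interleaved I/O */
static int overlap_detected = 0;

static void do_transaction(void) {
    pthread_mutex_lock(&device_lock);
    if (in_transaction)
        overlap_detected = 1;     /* would indicate overlapping transactions */
    in_transaction = 1;
    /* ... the LJUSB_Write + LJUSB_Read exchange would go here ... */
    in_transaction = 0;
    pthread_mutex_unlock(&device_lock);
}
```

With every caller funneled through `do_transaction`, a write and its matching read can never be split up by another thread's traffic.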

    FYI, I ran your VisStimRet experiment for several hours, using a simulated counter input, and I never saw any errors.

    Cheers,
    Chris

  6. 36 Posted by llglick on 01 Feb, 2022 09:52 PM

    Hi Chris,
    Thanks for the update.
    I am not surprised you haven't been able to replicate the error on
    VisStimRet yet; it has only happened twice so far. We have now created a
    desktop command to open the Server through Terminal, so I'll be able to
    send you full readouts of future errors.
    Do you have any other suggestions moving forward?
    Thanks,
    Lindsey.

  7. 37 Posted by llglick on 02 Feb, 2022 07:57 PM

    Hi Chris,
    I had another crash today. I've attached the error logs and the mwel code
    for this experiment. It runs like VisStimRet (with pulses needed to drive
    transitions between task states).
    Lindsey.

  8. Support Staff 38 Posted by Christopher Stawarz on 02 Feb, 2022 09:49 PM

    Hi Lindsey,

    Thanks for sending the additional logs. This looks like another occurrence of the new issue (LJUSB_Write failure). Instead of a timeout, the first write failure produces this error message:

    libusb error: LIBUSB_ERROR_IO
    

    and the subsequent write failures produce:

    libusb error: LIBUSB_ERROR_PIPE
    

    Looking at the libusb source code, it seems like LIBUSB_ERROR_IO could indicate a lot of different failure modes, but LIBUSB_ERROR_PIPE mostly seems to indicate that the "endpoint halted", which probably means the connection is dead, more or less. This supports the idea that communication with the device has failed for some reason, but it doesn't tell us whether that's due to a hardware failure or a software issue.
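
For reference, here is one plausible way to sort those errors into recovery strategies. The numeric values are copied from libusb.h so the sketch compiles without libusb installed; the policy itself (retry on timeout, libusb_clear_halt on a stalled pipe, reconnect otherwise) is my guess at reasonable handling, not what the plugin actually does:

```c
#include <assert.h>

/* Numeric values as defined in libusb.h (copied here so this sketch
   compiles without libusb). */
enum {
    LIBUSB_ERROR_IO      = -1,
    LIBUSB_ERROR_TIMEOUT = -7,
    LIBUSB_ERROR_PIPE    = -9
};

/* Hypothetical recovery policy: a timeout is worth retrying; a stalled
   pipe needs libusb_clear_halt() on the endpoint before further I/O;
   anything else is safest treated as a dead connection that must be
   closed and reopened. */
enum usb_recovery { USB_RETRY, USB_CLEAR_HALT, USB_RECONNECT };

static enum usb_recovery classify_usb_error(int err) {
    switch (err) {
    case LIBUSB_ERROR_TIMEOUT: return USB_RETRY;
    case LIBUSB_ERROR_PIPE:    return USB_CLEAR_HALT;
    default:                   return USB_RECONNECT;
    }
}
```

libusb_clear_halt() is libusb's documented call for clearing a halted endpoint, though if the device itself has stopped responding, it won't help.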

    I went through your plugin code again today. I found one spot where multiple, overlapping attempts to communicate with the device could occur. However, it was in the initialize() method, which gets called exactly once when the experiment is loaded. It seems unlikely that an error there could lead to a failure much later in your experiment, but I guess it's possible. I'll send you a pull request with the fix, but I don't expect it will fix this issue.

    As for what to do next: At this point, I think we have all the info we're going to get from setting LJ_DEBUG to 1, so you can set it back to 0. If it's a software issue, and I can reproduce it while running in the debugger, then I might be able to figure out what's happening. So far, though, I can't reproduce it at all. Maybe it's worth sending the terminal error messages to the LabJack folks to see if they have any insight?

    Chris

  9. 39 Posted by llglick on 02 Feb, 2022 10:19 PM

    Hi Chris,
    I have reverted LJ_DEBUG to 0, and I just pinged LabJack and will see
    what they say.
    The initial read error was very closely associated with updating to the
    most recent version of MWorks. The write errors are more recent, but could
    also be associated with the update given that they are much rarer. Is there
    anything that changed in MWorks that could explain this?
    Lindsey.

  10. Support Staff 40 Posted by Christopher Stawarz on 02 Feb, 2022 10:20 PM

    At this point, I think we have all the info we're going to get from setting LJ_DEBUG to 1, so you can set it back to 0.

    Actually, before you do that, there's one more thing you could try. If you launch MWServer from Terminal with the following command:

    LIBUSB_DEBUG=3 /Applications/MWServer.app/Contents/MacOS/MWServer
    

    then, in addition to the LabJack debugging messages, libusb errors and warnings should be written to stderr. Maybe you can try running that way until you get another failure, then send me the output (which hopefully will have even more specific info about what failed)?

    Thanks,
    Chris

  11. 41 Posted by llglick on 02 Feb, 2022 10:23 PM

    OK, I can try that.
    However, is it possible that the extra overhead of the debugger is
    triggering some of these crashes?
    Lindsey.

  12. Support Staff 42 Posted by Christopher Stawarz on 02 Feb, 2022 10:32 PM

    The initial read error was very closely associated with updating to the most recent version of MWorks. The write errors are more recent, but could also be associated with the update given that they are much rarer. Is there anything that changed in MWorks that could explain this?

    Hmm. MWorks 0.10 added the DATAPixx device, which also uses libusb. I see that libusb creates some "worker" threads when MWServer launches, even before an experiment is loaded. I guess it's possible that having two plugins both embedding libusb could lead to conflicts.

    If you want to test this, remove the DATAPixx plugin. It's at

    /Library/Application Support/MWorks/Plugins/Core/DATAPixx.bundle
    

    Then restart MWServer.

    That's the only connection I can think of between the inner workings of your LabJack plugin and anything else in MWorks.

    However, is it possible that the extra overhead of the debugger is triggering some of these crashes?

    You aren't running a debugger. You're just producing a lot of output on stderr. I can't imagine how that could cause USB I/O failures.

    Chris

  13. 43 Posted by llglick on 02 Feb, 2022 10:36 PM

    OK, I'll try deleting the DATAPixx.bundle.
    Thanks,
    Lindsey.

  14. Support Staff 44 Posted by Christopher Stawarz on 03 Feb, 2022 02:13 PM

    Hi Lindsey,

    I submitted a pull request with the changes I mentioned. I haven't seen any failures while running twoStim_2P_Frames.

    Chris

  15. 45 Posted by llglick on 04 Feb, 2022 10:02 PM

    Hi Chris,
    Your changes in the pull request seem to have solved the original
    ehFeedback error (the read failure).
    We'll see if it also solves the write failure issues (deleting the DATAPixx
    bundle did not).
    Thanks,
    Lindsey.

  16. 46 Posted by llglick on 06 Feb, 2022 06:23 PM

    Hi Chris,
    Unfortunately, it did not solve the write error with VisStimRet.
    I've attached the error logs.
    Lindsey.

  17. Support Staff 47 Posted by Christopher Stawarz on 08 Feb, 2022 10:20 PM

    Hi Lindsey,

    Your changes in the pull request seem to have solved the original ehFeedback error (the read failure).

    I'm surprised but pleased to hear it!

    We'll see if it also solves the write failure issues (deleting the DATAPixx bundle did not).

    OK. If nothing else, it's good to know that using libusb in multiple plugins isn't an issue.

    Unfortunately, it did not solve the write error with VisStimRet. I've attached the error logs.

    Thanks for sending those, as they do include a little new information. Here's the relevant portion of the stderr log:

    Calling LJUSB_SetupTransfer with count = 16 and operation = 0.
    Calling LJUSB_DoTransfer with endpoint = 0x1, count = 16, and isBulk = 1.
    libusb: warning [darwin_transfer_status] transfer error: device not responding (value = 0xe00002ed)
    libusb error: LIBUSB_ERROR_IO
    ehFeedback error : write failed
    Calling LJUSB_SetupTransfer with count = 26 and operation = 0.
    Calling LJUSB_DoTransfer with endpoint = 0x1, count = 26, and isBulk = 1.
    libusb: error [submit_bulk_transfer] bulk transfer failed (dir = Out): pipe is stalled (code = 0xe000404f)
    libusb error: LIBUSB_ERROR_PIPE
    Error : getCalibrationInfo write failed
    Calling LJUSB_SetupTransfer with count = 10 and operation = 0.
    Calling LJUSB_DoTransfer with endpoint = 0x1, count = 10, and isBulk = 1.
    libusb: error [submit_bulk_transfer] bulk transfer failed (dir = Out): pipe is stalled (code = 0xe000404f)
    libusb error: LIBUSB_ERROR_PIPE
    ehFeedback error : write failed
    Calling LJUSB_SetupTransfer with count = 10 and operation = 0.
    Calling LJUSB_DoTransfer with endpoint = 0x1, count = 10, and isBulk = 1.
    libusb: error [submit_bulk_transfer] bulk transfer failed (dir = Out): pipe is stalled (code = 0xe000404f)
    libusb error: LIBUSB_ERROR_PIPE
    ehFeedback error : write failed
    

    It looks like the initial write failure produces a message from libusb saying "device not responding", whereas subsequent attempts generate "pipe is stalled". This aligns with my previous description of this failure mode: Communication with the device fails for some reason and doesn't recover.

    "Device not responding" certainly makes it sound like USB I/O with the device has broken. I don't see any reason to think the plugin has done something wrong: Everything appears to be running smoothly, then the connection up and dies. On the other hand, I can think of many possibilities elsewhere, such as

    • Hardware or firmware issue on the U6 itself
    • Faulty USB cable
    • Faulty USB hub (if you're using one)
    • Bug in libusb
    • Bug in some OS-level USB driver

    Unfortunately, I don't see any evidence to suggest that any of them is the cause, and I still can't reproduce the issue myself.

    Are you seeing this issue with more than one LabJack? If so, that makes a hardware fault seem much less likely (though not impossible). Are you seeing it on multiple Macs? If it only happens on one machine, then it's possible there's a fault in its hardware (or in the macOS version it's running).

    Chris

  18. 48 Posted by llglick on 08 Feb, 2022 10:57 PM

    Hi Chris,
    We are just seeing this issue on one computer, because we only have one
    version of this microscope.
    LabJack also suggested it might be a hardware issue. I tried swapping out
    the USB cable; that didn't help. I just tried moving to a new port on the
    USB hub, but will now also try swapping out the hub. Any suggestions for
    how to test for bugs in libusb or the driver?
    One new observation: today it crashed repeatedly after very few trials
    (this is unusual; it usually happens more rarely). After one crash we
    tried just stopping and starting MWorks without closing out the session,
    and it just proceeded as if nothing had happened. Maybe that re-initiates
    the connection? Also, we then did restart MWorks, and it didn't crash
    again (today) after that.
    Lindsey.

  19. Support Staff 49 Posted by Christopher Stawarz on 10 Feb, 2022 05:46 PM

    Hi Lindsey,

    Any suggestions for how to test for bugs in libusb or the driver?

    Probably the best thing would be to use the U6 with a Windows machine, since that would involve completely different USB driver code and different OS-specific code in libusb. But obviously you can't run your MWorks experiment under Windows, so that's not really an option.

    Apart from that, someone with deep knowledge of libusb, the macOS USB driver stack, and/or the low-level USB I/O performed by the U6 might be able to assess whether things are working correctly on the software side. But I don't think I'd be able to figure it out.

    After one crash we tried just stopping and starting MWorks without closing out the session, and it just proceeded as if nothing had happened. Maybe that re-initiates the connection?

    Interesting. Starting the device (i.e. calling startDeviceIO) will close and re-open the USB connection, but only if the first attempt to configure the device fails. If this happens, the MWServer console should show a warning saying:

    LJU6 found dead, restarting.  (bug if not on MWServer restart)
    

    Is that what you see?
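
For anyone following along, the restart path described above amounts to something like this toy model (all names invented; the real logic lives in the plugin's startDeviceIO):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: the device is unreachable on the first configuration
   attempt, and closing/reopening the USB connection clears the fault. */
static bool link_wedged = true;

static bool fake_configure(void) { return !link_wedged; }

static void fake_close_and_reopen(void) { link_wedged = false; }

/* Mirrors the behavior described above: if the first configuration
   attempt fails, warn ("LJU6 found dead, restarting."), cycle the USB
   connection, and try once more. */
static bool start_device_io(void) {
    if (fake_configure())
        return true;
    fake_close_and_reopen();
    return fake_configure();
}
```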

    Chris

  20. 50 Posted by mhisted on 10 Feb, 2022 05:57 PM

    > LJU6 found dead, restarting. (bug if not on MWServer restart)

    Back in the day, when I wrote the code that emits this message, I did this restart because the LabJack could occasionally become unreachable sometime between MWorks stop and MWorks restart. At some point, this stopped being a problem, so we rarely see this message now.

    At the time, if I remember correctly, I patched libusb to do some kind of deep USB restart that was possible with stock libusb on Linux but not on Mac. The Mac version of the libusb restart/reset call didn't do much. Again, this was a long time ago, so I don't have a clear memory. Depending on what's going wrong, it might be possible to do "stronger" resets of the Mac USB stack from MWorks.

  21. 51 Posted by llglick on 15 Feb, 2022 12:48 AM

    Hi Chris,
    I think the problem might have been a faulty USB hub. We swapped it out on
    Thursday, and haven't had a crash since. Will let you know if it comes up
    again, but hopefully the issue is closed (and we've identified what that
    failure mode looks like).
    Thanks,
    Lindsey.

  22. Support Staff 52 Posted by Christopher Stawarz on 15 Feb, 2022 02:32 PM

    OK, thanks for the update.

    Chris

  23. Christopher Stawarz closed this discussion on 08 Mar, 2022 02:57 PM.
