-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prestonr feature add wkc driver saves #110
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work, this feature will definitely help diagnose and debug issues. checkout my review comments and feel free to reach out for more discussion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments
@@ -226,6 +228,62 @@ bool jsd_init(jsd_t* self, const char* ifname, uint8_t enable_autorecovery) { | |||
return true; | |||
} | |||
|
|||
bool jsd_all_slaves_operational(jsd_t* self) { | |||
int slave; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reduce the scope of slave
here:
for(int slave=1, slave <= *self.->ecx_context.slavecount; slave++) {
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also weird that they're using a signed integer to refer to a count, but this comes from SOEM
src/jsd.c
Outdated
else { | ||
MSG("Some slaves were not operational."); | ||
if (self->ecx_context.ecaterror) { | ||
MSG("We experienced an ECAT error. When this occurs, error information aught to be saved. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aught is not the correct word here. Who is "we"? Please update error message as:
"An ECAT error occurred; the error list is displayed below:"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified
src/jsd.c
Outdated
MSG("Some slaves were not operational."); | ||
if (self->ecx_context.ecaterror) { | ||
MSG("We experienced an ECAT error. When this occurs, error information aught to be saved. " | ||
"Errors in error list displayed below:\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add a line break here? doesnt this add an extra line of whitespace?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, the MSG
macro already adds a newline, so no need to add one manually
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified this
src/jsd.c
Outdated
if (self->ecx_context.ecaterror) { | ||
MSG("We experienced an ECAT error. When this occurs, error information aught to be saved. " | ||
"Errors in error list displayed below:\n"); | ||
while(self->ecx_context.ecaterror) MSG("%s\n", ecx_elist2string(&self->ecx_context)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a timeout or counter to avoid the (unlikely) possibility of polling indefinitely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Welp, I was running this with srl-gcat and I ended up with an infinite loop, I had to ctrl+\
to exit.
And FWIW, my infinite loop was just spamming blank lines, (i.e ecx_elist2string(&self->ecx_context) == ""
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh unpacking this some more self->ecx_context.ecaterror
is a bool, so unless we are going to clear that error flag, this is meant to "block" the application, and not really a way to unfurl via ecx_elist2string
all the possible errors in the error list.
If there's only one error, we would just print it over and over. I am not sure if this is the intent/outcome we want
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this is a bug that Will and I found with SOEM. Even though the elist is empty, ecaterror still states there is an error. The ecaterror flag aught to be cleared once the list is emptied.
So @kwehage fear of an infinite loop is valid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ecaterror flag aught to be cleared once the list is emptied.
It's more low level, but do you want to consider this as the conditional?
https://github.com/nasa-jpl/jsd/pull/112/files#diff-ec8bc2dbf8b89c6e78c73f1adf5732f1dff61d11f7cf4f22dd01241cb7cb39e3R285
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kwehage I just added a counter that once a certain max count is reached, it terminates the while loop
src/jsd.c
Outdated
MSG("We experienced an ECAT error. When this occurs, error information aught to be saved. " | ||
"Errors in error list displayed below:\n"); | ||
while(self->ecx_context.ecaterror) MSG("%s\n", ecx_elist2string(&self->ecx_context)); | ||
MSG("Went through all errors in the elist stack.\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you call it an "error list" above, call it an "error list" here also. Is it a stack or a list? Better to change to something like:
"Finished reporting errors in error list" or just remove this line altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a list of errors implemented as a stack. I'll change the message, however
if (self->ecx_context.slavelist[slave].state != EC_STATE_OPERATIONAL) { | ||
all_slaves_operational = false; | ||
if (self->ecx_context.slavelist[slave].state == | ||
(EC_STATE_SAFE_OP + EC_STATE_ERROR)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wondered about this weird syntax, but I see it comes from SOEM, so all is good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah Dennis commented on this too. If they're both one-hot pieces of data, an addition does the same as a bitwise OR
Added a function that can be called by fastcat to confirm if jsd devices are operational at time of bad working counter. This will allow one to differentiate between situations where the CPU is overloaded and if there is simply poor physical connections to a device.
I have attached three pictures:
The first being where the entire JSD device is turned off (corresponds to poor power to the jsd bus).
The second where a single device at the end of the bus was unplugged (corresponds to a single bad device connection).
The third being where the timeout for jsd_read was set to a very small number to force the underlying SOEM function to time out (corresponds to no faulty JSD connections).
TODO:
Need to update the jsd tests to avoid test failures.