Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prestonr feature add wkc driver saves #110

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

preston-rogers
Copy link
Contributor

@preston-rogers preston-rogers commented Jul 31, 2024

Added a function that can be called by fastcat to confirm if jsd devices are operational at time of bad working counter. This will allow one to differentiate between situations where the CPU is overloaded and if there is simply poor physical connections to a device.

I have attached three pictures:
The first being where the entire JSD device is turned off (corresponds to poor power to the jsd bus).
The second where a single device at the end of the bus was unplugged (corresponds to a single bad device connection).
The third being where the timeout for jsd_read was set to a very small number to force the underlying SOEM function to time out (corresponds to no faulty JSD connections).

image
image
image

TODO:
Need to update the jsd tests to avoid test failures.

Copy link
Collaborator

@alex-brinkman alex-brinkman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work, this feature will definitely help diagnose and debug issues. checkout my review comments and feel free to reach out for more discussion

CMakeLists.txt Outdated Show resolved Hide resolved
src/jsd.c Outdated Show resolved Hide resolved
Copy link
Contributor

@kwehage kwehage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments

@@ -226,6 +228,62 @@ bool jsd_init(jsd_t* self, const char* ifname, uint8_t enable_autorecovery) {
return true;
}

bool jsd_all_slaves_operational(jsd_t* self) {
int slave;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduce the scope of slave here:

for(int slave=1, slave <= *self.->ecx_context.slavecount; slave++) {
   ...
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also weird that they're using a signed integer to refer to a count, but this comes from SOEM

src/jsd.c Outdated
else {
MSG("Some slaves were not operational.");
if (self->ecx_context.ecaterror) {
MSG("We experienced an ECAT error. When this occurs, error information aught to be saved. "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aught is not the correct word here. Who is "we"? Please update error message as:
"An ECAT error occurred; the error list is displayed below:"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified

src/jsd.c Outdated
MSG("Some slaves were not operational.");
if (self->ecx_context.ecaterror) {
MSG("We experienced an ECAT error. When this occurs, error information aught to be saved. "
"Errors in error list displayed below:\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add a line break here? doesnt this add an extra line of whitespace?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, the MSG macro already adds a newline, so no need to add one manually

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified this

src/jsd.c Outdated
if (self->ecx_context.ecaterror) {
MSG("We experienced an ECAT error. When this occurs, error information aught to be saved. "
"Errors in error list displayed below:\n");
while(self->ecx_context.ecaterror) MSG("%s\n", ecx_elist2string(&self->ecx_context));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a timeout or counter to avoid the (unlikely) possibility of polling indefinitely.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Welp, I was running this with srl-gcat and I ended up with an infinite loop, I had to ctrl+\ to exit.

And FWIW, my infinite loop was just spamming blank lines, (i.e ecx_elist2string(&self->ecx_context) == "")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh unpacking this some more self->ecx_context.ecaterror is a bool, so unless we are going to clear that error flag, this is meant to "block" the application, and not really a way to unfurl via ecx_elist2string all the possible errors in the error list.
If there's only one error, we would just print it over and over. I am not sure if this is the intent/outcome we want

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this is a bug that Will and I found with SOEM. Even though the elist is empty, ecaterror still states there is an error. The ecaterror flag aught to be cleared once the list is emptied.
So @kwehage fear of an infinite loop is valid

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ecaterror flag aught to be cleared once the list is emptied.

It's more low level, but do you want to consider this as the conditional?
https://github.com/nasa-jpl/jsd/pull/112/files#diff-ec8bc2dbf8b89c6e78c73f1adf5732f1dff61d11f7cf4f22dd01241cb7cb39e3R285

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kwehage I just added a counter that once a certain max count is reached, it terminates the while loop

src/jsd.c Outdated
MSG("We experienced an ECAT error. When this occurs, error information aught to be saved. "
"Errors in error list displayed below:\n");
while(self->ecx_context.ecaterror) MSG("%s\n", ecx_elist2string(&self->ecx_context));
MSG("Went through all errors in the elist stack.\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you call it an "error list" above, call it an "error list" here also. Is it a stack or a list? Better to change to something like:
"Finished reporting errors in error list" or just remove this line altogether.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a list of errors implemented as a stack. I'll change the message, however

if (self->ecx_context.slavelist[slave].state != EC_STATE_OPERATIONAL) {
all_slaves_operational = false;
if (self->ecx_context.slavelist[slave].state ==
(EC_STATE_SAFE_OP + EC_STATE_ERROR)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wondered about this weird syntax, but I see it comes from SOEM, so all is good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah Dennis commented on this too. If they're both one-hot pieces of data, an addition does the same as a bitwise OR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants