last_run_summary.yaml when agent can't retrieve catalog from remote server #32

Open
edrude opened this issue Feb 16, 2017 · 1 comment

Comments

@edrude
Contributor

edrude commented Feb 16, 2017

Hopefully I'm not jumping the gun here. I will probably do some additional digging, but wanted to report what I'm seeing.

I'm using puppet agent 4.9.2

The plugin is able to parse last_run_summary.yaml if my puppet run is a success, but consider the case where a puppet run results in the following.

"Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Evaluation Error: Error while evaluating a Resource Statement, Could not find declared class...."

In that case my last_run_summary.yaml looks something like this.

version:
  config:
  puppet: 4.9.2
time:
  last_run: 1487266691

This means that parsing of last_run_summary.yaml fails around lines 312-322 of the plugin, resulting in "UNKNOWN: last_run_summary.yaml not found, not readable or incomplete".

It seems that if a catalog can't be retrieved from the puppet master, the resulting last_run_summary.yaml isn't valid for the purposes of this nagios plugin. I'm not sure how unique this case is, but I think it would be preferable to identify this case and report that the agent was unable to retrieve its catalog on the last puppet run, instead of reporting UNKNOWN.
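
One possible way to tell this case apart is sketched below in Python. It is only an illustration of the idea, not the plugin's actual code: the function names, the returned messages, and the choice of "resources" and "events" as the sections that mark a complete summary are all assumptions. It also assumes that a summary holding only the "version" and "time" sections means no catalog was applied, which matches the file shown above but may have other causes as well.

```python
import re

# Sections assumed to be present after a run that applied a catalog.
# This set is an assumption for the sketch, not taken from the plugin.
REQUIRED_SECTIONS = {"resources", "events"}


def top_level_keys(text):
    """Return the unindented mapping keys found in the YAML text."""
    return set(re.findall(r"^([A-Za-z_]\w*):", text, flags=re.MULTILINE))


def classify_summary(text):
    """Map the contents of last_run_summary.yaml to a (status, message) pair."""
    keys = top_level_keys(text)
    if not keys:
        return ("UNKNOWN", "last_run_summary.yaml is empty or unreadable")
    if REQUIRED_SECTIONS <= keys:
        # Summary looks complete; the usual checks can run on it.
        return ("OK", "summary is complete")
    if {"version", "time"} <= keys:
        # Only the stub sections were written: the run started, but no
        # catalog was applied (the situation described in this issue).
        return ("CRITICAL", "last run did not apply a catalog")
    return ("UNKNOWN", "last_run_summary.yaml is incomplete")
```

Calling classify_summary() on the file contents shown above would return the CRITICAL pair, while an empty file would still be reported as UNKNOWN. The sketch only looks at top-level keys with a regular expression so that it does not need a YAML library; a real check might prefer a proper parser.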

You should be able to replicate this case by removing a module from your puppet environment that contains classes used in your manifests.

@aswen
Owner

aswen commented Feb 17, 2017

@erude1 Thanks for bringing this up. I agree with you that a "could not retrieve catalog" error is a reason for a CRITICAL alert rather than UNKNOWN.
The reason I did it this way is that I had no simple way of finding out what the error was. I haven't dug into this yet.
I have to find time to come up with a solution to improve on this. In the meantime I'm open to suggestions!
