Monday, October 19, 2009

Nagios: check_http, using the --invert-regex option

Sometimes you want to check that something is running or working correctly, and you work out tests for that. Other times you want to know when something is broken and throwing error messages. This post is about the latter. A proper HTTP 200 code is great and all, but what if the page is just showing "Too Many Connections" instead of your home page? My old check_http command for this server used to look like, well, check_http. It didn't check anything specific about the page, just that the server was returning a 200 code.

Today, however, I knew I needed something more in-depth. Our database server lost its local network connection but was still available over the public IP, which is what I test against. Once we redirected the SQL requests to the public IP address of the server, everything started working again, until we ran across "Too Many Connections". The database server kept all of the "local" connections open, and thus we ate up the rest.

So, how to test for this scenario? After reading through the man pages of check_http I saw this little gem: "--invert-regex Return CRITICAL if found, OK if not". This, I knew, was exactly what I was looking for! If it sees our error messages it will go critical! Now to put this gem into practice. Here is where the man pages fall short. There is no explanation of HOW to use it, just that it exists. I tried what seemed obvious to me, "check_http -H hostname.com -w 3 -c 5 --invert-regex 'Some string'", but that didn't work. OK, let's try "check_http -H hostname.com -w 3 -c 5 --invert-regex='Some string'". Nope, that errored out with "option `--invert-regex' doesn't allow an argument".

Third time's the charm, right?
"check_http -H hostname.com -w 3 -c 5 -r 'Some string' --invert-regex"
# HTTP OK HTTP/1.1 200 OK - 0.355 second response time |time=0.354966s;3.000000;5.000000;0.000000 size=12975B;;;0
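
The error message from the second attempt suggests why only this form works: --invert-regex appears to be a plain on/off switch, so the pattern has to come from -r and the switch only flips how a match is interpreted. Here is a rough sketch of that logic in shell, which can be handy for trying a pattern against a page before putting it into Nagios. The function name invert_regex_status is made up for illustration and is not part of the plugin, and grep -E only approximates the plugin's own regex matching.

```shell
# Hypothetical helper, not part of Nagios or check_http.
# Reads a page body on stdin and mimics the idea of "-r PATTERN --invert-regex":
# report CRITICAL (exit 2) if the pattern is found, OK (exit 0) if it is not.
invert_regex_status() {
    pattern=$1
    # Redirect instead of using -q so grep always reads the whole input.
    if grep -E -- "$pattern" >/dev/null; then
        echo "CRITICAL - pattern found"
        return 2
    fi
    echo "OK - pattern not found"
    return 0
}

# Example (assumes curl is installed and hostname.com is your server):
#   curl -s http://hostname.com/ | invert_regex_status 'Too Many Connections'
```

The exit codes follow the usual Nagios plugin convention of 0 for OK and 2 for CRITICAL, so the helper's result can be compared directly with what check_http reports for the same page.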

Yes, as it turns out, the third time is the charm. That got me thinking some more. How can I ensure that the page is rendering correctly and, if it isn't, make the check fail in a specific way?

"check_http -H hostname.com -w 3 -c 5 -r 'Some string I want in my page' -r 'Some string I do not want to see' --invert-regex"

You can add more than one -r to the check_http command and it will require all of them to be present for the test to pass, and if one of them fails then it will go critical! Perfect!
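
To sanity-check a pair of patterns like this outside of Nagios, a small shell sketch can help. It evaluates the wanted and unwanted patterns separately against a page body, so it does not depend on how a single check_http invocation combines several -r options. The function name page_status and its messages are hypothetical, and grep -E is only an approximation of the plugin's regex matching.

```shell
# Hypothetical helper, not part of Nagios or check_http.
# Reads a page body on stdin. WANTED must match and UNWANTED must not;
# anything else is reported as CRITICAL (exit 2).
page_status() {
    wanted=$1
    unwanted=$2
    body=$(cat)   # keep the body so it can be tested twice

    if printf '%s\n' "$body" | grep -E -- "$unwanted" >/dev/null; then
        echo "CRITICAL - unwanted pattern found"
        return 2
    fi
    if ! printf '%s\n' "$body" | grep -E -- "$wanted" >/dev/null; then
        echo "CRITICAL - wanted pattern missing"
        return 2
    fi
    echo "OK - page looks healthy"
    return 0
}

# Example (assumes curl is installed and hostname.com is your server):
#   curl -s http://hostname.com/ | page_status 'Welcome' 'Too Many Connections'
```

If both conditions matter to you, it may be worth confirming against your plugin version that each pattern is really honored, for example by pointing the check at a page that lacks the wanted string. check_http also documents a -s, --string option for content that must be present, which could be tried alongside -r with --invert-regex.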

If you have any more insight into using the check_http command in Nagios, I want to hear about it. We are always running into new failure scenarios that we didn't anticipate, and I want to know about them before one of my users tells me.