Friday, October 16, 2009

Web service monitoring w/ Nagios and JSON

I'm using Nagios to act as a watchdog for my network and the various services that live on it. Nagios does the job pretty well. It lets me know when there's a problem, when things are back to normal and generally keeps an eye on things for me.

The checks that Nagios performs are done through a series of check commands. These commands are your typical Unix-style programs, with the exception that they produce a single line of text describing the state of the item being checked, and their exit value lets Nagios know what's going on.

So for instance, to check the health of the web service on the localhost:

peter@sybil:~$ /usr/lib/nagios/plugins/check_http -H localhost
HTTP OK HTTP/1.1 200 OK - 361 bytes in 0.001 seconds |time=0.001021s;;;0.000000 size=361B;;;0
peter@sybil:~$ echo $?
0
peter@sybil:~$

The single line of text that is displayed follows a specific format. It starts with a prefix naming what's being tested, HTTP. Next is the status, OK, which can be OK, WARNING, CRITICAL or UNKNOWN. Everything after the status is eye candy that provides details specific to the test being done. Nagios doesn't really care about it, but it does provide important details when looking into problems that may be occurring.

Writing these check programs in Python is pretty straightforward.
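
As a rough illustration of that convention, here's a minimal sketch of a check plugin. It isn't one of the plugins used below: the pid file path and the PIDFILE prefix are made up, and the only parts Nagios cares about are the single line of output and the exit code.

#! /usr/bin/env python

"""
Minimal sketch of a Nagios check plugin. The item being checked is a
made-up example; only the one-line output and the exit code follow
the real plugin conventions.
"""

import os
import sys

# Nagios status codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

path = '/var/run/myapp.pid'    # hypothetical item being checked

if os.path.exists(path):
    print 'PIDFILE OK - %s present' % path
    sys.exit(OK)
else:
    print 'PIDFILE CRITICAL - %s missing' % path
    sys.exit(CRITICAL)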

I recently had a situation where our ISP moved our web servers from one physical machine to another. This caused credit card processing for our online store to fail. The payment provider uses the IP address of the server as part of the authentication process when submitting credit cards for processing. Since the server changed, the IP address changed. Things went around in circles for a while until we figured out the problem and gave the new IP address to the payment provider.

I thought it would be a good additional Nagios check for the store web site to monitor the IP address of the physical server. Unfortunately, the ISP doesn't provide access to the IP address. But they do provide access to the hostname.

To get the hostname, I added a simple CGI program that determines the hostname and then packages it up as a JSON data structure.

#!/usr/bin/env python

"""
Bundle the hostname up as a JSON data structure.

Copyright (c) 2009 Peter Kropf. All rights reserved.
"""

import cgi
import popen2
import sys
sys.path.insert(1, '/home/crucible/tools/lib/python2.4/site-packages')
sys.path.insert(1, '/home/crucible/tools/lib/python2.4/site-packages/simplejson-2.0.9-py2.4-linux-x86_64.egg')

import simplejson as json

field = cgi.FieldStorage()

# Emit the CGI header; the trailing newline plus print's own newline
# gives the blank line that ends the header block.
print "Content-Type: application/json\n"

# Ask the OS for the hostname. popen2 is old-style, but it's what's
# available in the ISP's restricted Python 2.4 environment.
r, w, e = popen2.popen3('hostname')
host = r.readline()
r.close()
w.close()
e.close()

fields = {'hostname': host.split('\n')[0]}

print json.dumps(fields)

There are a couple of things to note. Since the ISP provides a very restrictive environment, I have to add the location of the simplejson module to sys.path before it can be imported. It's a bit annoying, but it does work.

On the Nagios server side, I created a new check program called check_json. It takes the name of a field, the expected value and the URI from which to pull the JSON data.

#! /usr/bin/env python

"""
Nagios plugin to check a value returned from a uri in json format.

Copyright (c) 2009 Peter Kropf. All rights reserved.

Example:

Compare the "hostname" field in the json structure returned from
http://store.example.com/hostname.py against a known value.

./check_json hostname buenosaires http://store.example.com/hostname.py
"""


import urllib2
import simplejson
import sys
from optparse import OptionParser

prefix = 'JSON'

class nagios:
    ok = (0, 'OK')
    warning = (1, 'WARNING')
    critical = (2, 'CRITICAL')
    unknown = (3, 'UNKNOWN')


def exit(status, message):
    print prefix + ' ' + status[1] + ' - ' + message
    sys.exit(status[0])


parser = OptionParser(usage='usage: %prog field_name expected_value uri')
options, args = parser.parse_args()


if len(args) < 3:
    exit(nagios.unknown, 'missing command line arguments')

field = args[0]
value = args[1]
uri = args[2]

try:
    j = simplejson.load(urllib2.urlopen(uri))
except urllib2.URLError, ex:
    exit(nagios.unknown, 'invalid uri')

if field not in j:
    exit(nagios.unknown, 'field: ' + field + ' not present')

if j[field] != value:
    exit(nagios.critical, j[field] + ' != ' + value)

exit(nagios.ok, j[field] + ' == ' + value)


Some checking is done to ensure that the JSON data can be retrieved, that the needed field is in the data and then that the field's value matches what's expected.

These examples show the basic testing that's done and the return values:

peter@sybil:~$ /usr/lib/nagios/plugins/check_json hostname buenosaires http://store.thecrucible.org/hostname.py
JSON OK - buenosaires == buenosaires
peter@sybil:~$ echo $?
0
peter@sybil:~$ /usr/lib/nagios/plugins/check_json hostname buenosaires http://store.thecrucible.org/hostname.p
JSON UNKNOWN - invalid uri
peter@sybil:~$ echo $?
3
peter@sybil:~$ /usr/lib/nagios/plugins/check_json hostname buenosairs http://store.thecrucible.org/hostname.py
JSON CRITICAL - buenosaires != buenosairs
peter@sybil:~$ echo $?
2
peter@sybil:~$ /usr/lib/nagios/plugins/check_json ostname buenosaires http://store.thecrucible.org/hostname.py
JSON UNKNOWN - field: ostname not present
peter@sybil:~$ echo $?
3
peter@sybil:~$

Once the Nagios server is configured with the new command, the hostname of the server can be monitored, which will hopefully ease any problems the next time things change at the ISP.
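
For reference, the Nagios-side wiring looks something like this. The command definition matches how the plugin is called above, while the host name, service description and use of the stock generic-service template are placeholders that depend on the local Nagios setup:

define command {
    command_name    check_json
    command_line    /usr/lib/nagios/plugins/check_json $ARG1$ $ARG2$ $ARG3$
}

define service {
    use                     generic-service
    host_name               store
    service_description     Store server hostname
    check_command           check_json!hostname!buenosaires!http://store.thecrucible.org/hostname.py
}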

More details on Nagios can be found at http://nagios.org and on developing check programs at http://nagiosplug.sourceforge.net/developer-guidelines.html.

Monday, June 01, 2009

Running External Django Scripts

Django is pretty good at creating a database-driven website. The documentation is clear and the tutorials show how to use the framework to create web-based applications. But one part that I wish was a bit more straightforward is running scripts outside the web server. The issue is that Django code expects to have a certain environment configured and set up for the framework. With this in place, you can perform tasks like polling an IMAP server for incoming email messages or monitoring a directory for new files or whatever else needs to be done. There are several posts online to help you get the environment set up here, here and here. But some of them seem not to work correctly because of the changes to Django for the 1.0 release or other reasons.

I have a fairly straightforward example of how to set up the Django environment and allow the rest of your code to access the Django framework for your web application. It's remarkably simple.

Suppose that I've created a Django project in my tmp directory called demo_scripts and within that project, I create an app called someapp.

peter@fog:~/tmp> django-admin-2.5.py startproject demo_scripts
peter@fog:~/tmp> cd demo_scripts/
peter@fog:~/tmp/demo_scripts> django-admin-2.5.py startapp someapp
peter@fog:~/tmp/demo_scripts>


I create a model in someapp that looks like:

from django.db import models

class Foo(models.Model):
    name = models.CharField(max_length=21,
                            unique=True,
                            help_text="Name of the foo.")

    def __unicode__(self):
        return self.name

    class Meta:
        ordering = ('name',)


Next step is to sync the database:

peter@fog:~/tmp/demo_scripts> ./manage.py syncdb
Creating table auth_permission
Creating table auth_group
Creating table auth_user
Creating table auth_message
Creating table django_content_type
Creating table django_session
Creating table django_site
Creating table someapp_foo

You just installed Django's auth system, which means you don't have any superusers defined.
Would you like to create one now? (yes/no): yes
Username (Leave blank to use 'peter'):
E-mail address: pkropf@gmail.com
Password:
Password (again):
Superuser created successfully.
Installing index for auth.Permission model
Installing index for auth.Message model
peter@fog:~/tmp/demo_scripts>


And add some initial data to the database:

peter@fog:~/tmp/demo_scripts> ./manage.py shell
Python 2.5.4 (r254:67916, May 1 2009, 17:14:50)
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from someapp.models import Foo
>>> Foo(name='A Foo').save()
>>> Foo(name='Another Foo').save()
>>>
peter@fog:~/tmp/demo_scripts>


Now we can write a standalone script to do something with the data model. For simplicity's sake, I'll just print out all the Foo objects. The script is going to live in a new directory called scripts. Here's the source:

#! /usr/bin/env python
#coding:utf-8

import sys
import os
import datetime

sys.path.insert(0, os.path.expanduser('~/tmp/demo_scripts'))
os.environ['DJANGO_SETTINGS_MODULE'] = 'settings'

from someapp.models import *
print Foo.objects.all()


When I run the script, it prints the array of the two Foo objects that I previously created:

peter@fog:~/tmp/demo_scripts> ./scripts/show_foo.py 
[<Foo: A Foo>, <Foo: Another Foo>]
peter@fog:~/tmp/demo_scripts>


Lines 8 and 9 are the critical lines in the script code. The first adds the project directory to the Python system path so that the settings module can be found. The second tells the Django code which module to import to determine the project settings.
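
An alternative sketch that does the same environment setup through Django's own setup_environ helper, which is available in the 1.0-era releases; it assumes the same project path and scripts directory as above:

#! /usr/bin/env python

import sys
import os

# Make the project importable, then let Django configure itself
# from the project's settings module.
sys.path.insert(0, os.path.expanduser('~/tmp/demo_scripts'))

import settings
from django.core.management import setup_environ
setup_environ(settings)

from someapp.models import Foo
print Foo.objects.all()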

Wednesday, May 27, 2009

Django Google Apps Authentication

Django has an excellent user management and authentication system built into the framework. With it you can easily create users that can be authenticated against the website. But there are times when you just need to authenticate against a different system. In the case of an app I recently developed, I originally wanted to authenticate against an OS X Server. The OpenDirectory service on OS X Server is an LDAP server; under the hood you'll find slapd from OpenLDAP running. So it should be pretty straightforward to create an authentication module that uses Python's LDAP module. And this article from the Carthage WebDev site shows you how to do it.

After I got the ldap_auth.py module working on my site, I realized the site would be better served if the authentication happened against Google Apps. Since Google Apps is currently being used by the organization for email, calendaring and sharing documents, everyone already has an account there. And with the ldap_auth.py module from Carthage WebDev as a starting point, I thought it would be pretty simple to provide a google_auth.py module.

To get started, I had to install gdata. The installation instructions found on the Google Apps APIs page were pretty easy to follow. Specifically, I had to install the Provisioning API.

On a side note, I'm using Python 2.5 as installed via MacPorts. Before I could use the gdata APIs, I had to install py25-socket-ssl.

The APIs are pretty well documented via the examples from the Python Developer's Guide. Here's how I'm authenticating a Django project with users on Google Apps.

To start, there are three configuration variables that I added to the Django project's settings.py module:

# Google Apps Settings
GAPPS_DOMAIN = 'your_domain.com'
GAPPS_USERNAME = 'name_of_an_admin_user'
GAPPS_PASSWORD = 'admin_users_password'

These will allow the module to authenticate against Google Apps and ask for specific details about the user.

Here's the code for google_auth.py:


import logging
from django.contrib.auth.models import User
from django.conf import settings
from gdata.apps.service import AppsService, AppsForYourDomainException
from gdata.docs.service import DocsService
from gdata.service import BadAuthentication


logging.debug('GoogleAppsBackend')


class GoogleAppsBackend:
    """ Authenticate against Google Apps """

    def authenticate(self, username=None, password=None):
        logging.debug('GoogleAppsBackend.authenticate: %s - %s' % (username, '*' * len(password)))
        admin_email = '%s@%s' % (settings.GAPPS_USERNAME, settings.GAPPS_DOMAIN)
        email = '%s@%s' % (username, settings.GAPPS_DOMAIN)

        try:
            # Check the user's password by logging in to Google Docs as them
            logging.debug('GoogleAppsBackend.authenticate: gdocs')
            gdocs = DocsService()
            gdocs.email = email
            gdocs.password = password
            gdocs.ProgrammaticLogin()

            # Get the user object via the Provisioning API
            logging.debug('GoogleAppsBackend.authenticate: gapps')
            gapps = AppsService(email=admin_email, password=settings.GAPPS_PASSWORD, domain=settings.GAPPS_DOMAIN)
            gapps.ProgrammaticLogin()
            guser = gapps.RetrieveUser(username)

            logging.debug('GoogleAppsBackend.authenticate: user - %s' % username)
            user, created = User.objects.get_or_create(username=username)

            if created:
                logging.debug('GoogleAppsBackend.authenticate: created')
                user.email = email
                user.last_name = guser.name.family_name
                user.first_name = guser.name.given_name
                user.is_active = not guser.login.suspended == 'true'
                user.is_superuser = guser.login.admin == 'true'
                user.is_staff = True
                user.save()

        except BadAuthentication:
            logging.debug('GoogleAppsBackend.authenticate: BadAuthentication')
            return None

        except AppsForYourDomainException:
            logging.debug('GoogleAppsBackend.authenticate: AppsForYourDomainException')
            return None

        return user

    def get_user(self, user_id):
        user = None
        try:
            logging.debug('GoogleAppsBackend.get_user')
            user = User.objects.get(pk=user_id)

        except User.DoesNotExist:
            logging.debug('GoogleAppsBackend.get_user - DoesNotExist')
            return None

        return user

It was pretty easy to write and debug this code using the ldap_auth.py module as a working example.
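
One piece worth spelling out is that Django only consults a custom backend once it's listed in AUTHENTICATION_BACKENDS in settings.py. A minimal sketch, assuming google_auth.py sits at the top of the project (adjust the path to wherever the module actually lives):

AUTHENTICATION_BACKENDS = (
    'google_auth.GoogleAppsBackend',               # try Google Apps first
    'django.contrib.auth.backends.ModelBackend',   # fall back to local Django accounts
)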

One downside to this code is that newly created users in the Django auth database don't have any rights. So if the Django project expects to change what it displays based on the rights a user has, the account will have to be manually modified via the Django admin interface. Not too bad, but annoying.
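
One way to soften that, if it fits the site, is to drop newly created accounts into a pre-defined Django group whose permissions are managed once in the admin. A sketch of the extra lines for the if created: block, assuming a group named gapps-users has already been created:

from django.contrib.auth.models import Group

# Inside the if created: block of GoogleAppsBackend.authenticate():
# attach the new account to a pre-created group so it starts with a
# sensible set of permissions instead of none.
try:
    user.groups.add(Group.objects.get(name='gapps-users'))
    user.save()
except Group.DoesNotExist:
    logging.debug('GoogleAppsBackend.authenticate: gapps-users group missing')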

Sunday, January 18, 2009

SSH and OS X

This has been driving me nuts for the past several months, but I hadn't made the time to figure out the problem. Basically, the only account that could be used to ssh into our OS X server was the admin account. The admin account lives in the traditional Unix /etc/passwd database. Any account that was created via Workgroup Manager, like mine (that is, one that lives in Open Directory, OS X's LDAP authentication database), wouldn't work. As I said, this has been driving me nuts and I finally spent some time digging through the man pages, configuration files and log files to figure out what was going on.

It seems that a previous sysadmin had added the AllowUsers keyword to the sshd configuration file, /etc/sshd_config. The AllowUsers line lists the users who are allowed to connect via ssh. And wouldn't you know, my account wasn't listed.

I got to this point by reading through the /var/log/secure.log file to see what OS X was recording as the problem with connecting. There was one line in particular that stood out:

Jan 18 15:13:09 xyzzy sshd[4656]: User peter from 192.168.1.154 not allowed because not listed in AllowUsers

AllowUsers? That's strange. I don't remember anywhere else in OS X that uses a convention like this to control the environment. But a quick search on Google showed that this is a keyword used in the sshd configuration file. I added my account name to the list and was able to ssh in without any problem. Oh yeah, life is good!

One cool side note: sshd didn't have to be restarted; new connections picked up the changed configuration file right away. Makes it very easy to test configuration changes.

But modifying the /etc/sshd_config file every time I need to allow ssh access to someone isn't an easy way to manage account privileges on OS X. Looking a bit more at the sshd_config man page shows that there's also an AllowGroups option. So I removed the AllowUsers line and replaced it with:

AllowGroups ssh

Then, using the standard Workgroup Manager, I added a new group called ssh and put the various accounts that need ssh access into the group. Now any account that needs ssh access can easily be added to (or removed from) the ssh group and sshd will automatically grant (or deny) it access.

Yea!