Sunday, January 23, 2011

Kinect, Python and Pushing Through

There's a tangent project I'm working on that involves robotics, Arduinos, a Kinect and fire. It's a small robot called Exuro with a pair of stainless steel eyes that are meant to track a person coming up to a donation box and, when they make a donation, set off a small poofer. The idea is to track people using a Kinect and have the eyes move as if they're watching the closest person to the donation box. Working with an Arduino to control external systems is pretty straightforward for me; it's something I've done before. But pulling sensor data from something like a Kinect and interpreting it is something I've never done. It's rather intimidating. Processing video data at something like 30 frames per second is not something I'm used to doing. But it sounds like fun!

There's an open source driver for accessing the Kinect called libfreenect, available from openkinect.org. Included are wrappers for using the library from Python, which is most definitely my preferred programming language. That works.
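Just to sanity-check that the bindings are working, a quick smoke test (my own snippet, not one of the included demos) is to grab a single depth frame through the synchronous wrapper:

```python
import freenect

# Grab one depth frame through the synchronous wrapper. depth comes back
# as a 480x640 NumPy array of raw 11-bit values (0-2047).
depth, timestamp = freenect.sync_get_depth()
print(depth.shape, depth.min(), depth.max())
```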

Getting libfreenect to build on an Ubuntu 10.10 system was pretty straightforward. Just follow the instructions in the README.asciidoc file. Getting the Python wrappers to work took a bit more effort. cython is used to create the bindings between libfreenect and Python. Unfortunately, the version currently included with Ubuntu 10.10 isn't up to the task. Once I removed the Ubuntu package and installed cython from the latest source, the Python bindings built and worked just fine. I'm sure the fine folks maintaining Ubuntu will make a newer version available at some point, I'm just not willing to put this project on hold till they do ;-)

There are a few demo files included with the wrapper so you can start to play with the interface, the library and the Kinect data. Two of them, demo_cv_sync.py and demo_cv_thresh_sweep.py, make for a good demonstration. The first opens two windows and shows a live video feed of the RGB camera in one and the depth camera in the other. The other shows video from the depth camera but sweeps through the data, showing what's seen at different depths. These are really interesting demos to help wrap your head around what's available from the Kinect.
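For a rough idea of what those demos do, here's a minimal sketch along the same lines. It's my own, not the demo code itself, and it uses the newer cv2 interface rather than the old cv module the demos are written against:

```python
import cv2
import numpy as np
import freenect

# Show the RGB camera and a crude visualization of the depth camera,
# roughly what demo_cv_sync.py does.
while True:
    rgb, _ = freenect.sync_get_video()    # 480x640x3, RGB byte order
    depth, _ = freenect.sync_get_depth()  # 480x640, raw 11-bit values

    cv2.imshow('RGB', cv2.cvtColor(rgb, cv2.COLOR_RGB2BGR))
    # Shift the 11-bit depth values down to 8 bits so imshow can draw them.
    cv2.imshow('Depth', (depth >> 3).astype(np.uint8))

    if cv2.waitKey(10) == 27:  # Esc to quit
        break
```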

I got to wondering about the depth data and whether there wasn't a way to combine the two demos so you could slide through the depth manually and see what's there. The result is demo_cv_threshold.py. It lets you slide to any depth to see what's there, then contract or expand the range to see what's around that depth. Here's a sample video showing my hand pushing through a virtual wall:



The depth slider sets the focal point for what data to display, and the threshold provides a +/- tolerance for how much data to display. A depth of 535 and a threshold of 0 would show just the data at 535, while a depth of 535 and a threshold of 6 would show the data from 529 through 541.
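At its core that's just a mask over the depth array. A minimal sketch of the idea (hypothetical names, not the actual code in demo_cv_threshold.py) looks something like this:

```python
import numpy as np

def depth_mask(depth, center=535, threshold=6):
    # Keep only pixels whose raw depth value falls within
    # [center - threshold, center + threshold]; everything else goes black.
    near, far = center - threshold, center + threshold
    mask = np.logical_and(depth >= near, depth <= far)
    return np.where(mask, 255, 0).astype(np.uint8)
```

Wire something like that to two trackbars and feed the result to the same display loop as above, and you get the "pushing through a wall" effect in the video.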

It's an interesting application to play with to gain a basic understanding of the data being returned and possible ways to use it. I've submitted a pull request on GitHub to the maintainers of libfreenect to see if they're willing to include it in the next release. Here's hoping that they will.

There's a lot more work I need to do for this project. The next steps will be to find the closest person in the data stream and calculate their location in the real world relative to where the Kinect sits. And I have almost no idea how to go about doing that. Time to read up on numpy and opencv...
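Purely as a sketch of where I might start (none of this is from libfreenect itself, and the camera constants are commonly cited approximations rather than calibrated values for my unit), finding the nearest pixel in a depth frame and back-projecting it through a pinhole model would look roughly like this:

```python
import numpy as np

# Approximate intrinsics for the Kinect depth camera; placeholders only.
FX, FY = 594.21, 591.04   # focal lengths in pixels
CX, CY = 339.5, 242.7     # optical center in pixels

def raw_to_meters(raw):
    # A widely quoted approximation for converting raw 11-bit depth to meters.
    return 1.0 / (raw * -0.0030711016 + 3.3309495161)

def nearest_point(depth):
    # Treat raw value 2047 as "no reading", then take the single nearest
    # pixel. A real version would need to find the nearest *person* (a blob),
    # not just one pixel, but this is the general shape of the problem.
    valid = np.where(depth < 2047, depth, 2047)
    v, u = np.unravel_index(np.argmin(valid), valid.shape)
    z = raw_to_meters(depth[v, u])
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    return x, y, z  # meters, relative to the depth camera
```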