Thursday, May 5, 2011

Summary: From OpenGL and NITE to OpenNI

When we first began this project, we knew nothing about the Kinect.  So, to get started, we followed the instructions on this website to configure our computers to take in data from the Kinect and render it into a visual representation.  This visual representation relied on OpenGL, GLUT, and NITE.  It gave us pretty pictures (like you saw in our earlier posts) that would show the environment, as well as draw a skeleton on individuals.  This was great, and we thought it was exactly what we needed.  We were able to modify this code so that, when an individual was recognized, it would print the x, y, and z coordinates of the center of mass of each person being tracked (as seen again below).
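
For anyone curious, here is a rough sketch of what that modification boils down to, using the OpenNI 1.x C++ wrapper rather than the raw C API.  The function name and the hard-coded user limit are ours, and all the context setup and error handling is left out, so treat it as an illustration rather than our exact code:

// Sketch: print the center of mass of every user currently being tracked.
// Assumes an OpenNI context and user generator have already been set up
// (as in the sample code we started from); error handling is omitted.
#include <XnCppWrapper.h>
#include <cstdio>

void PrintUserCenters(xn::UserGenerator& userGen)
{
    XnUserID users[15];
    XnUInt16 nUsers = 15;
    userGen.GetUsers(users, nUsers);       // IDs of everyone currently tracked

    for (XnUInt16 i = 0; i < nUsers; ++i)
    {
        XnPoint3D com;
        userGen.GetCoM(users[i], com);     // center of mass, in real-world mm
        printf("User %u: x=%.0f  y=%.0f  z=%.0f\n",
               users[i], com.X, com.Y, com.Z);
    }
}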



Again, this was perfect, and we thought our project was almost done before it even started.  Then, we tried to put everything onto the BeagleBoard.

Well, to make a long story short, the program that used OpenGL, GLUT, and NITE wouldn't work on the BeagleBoard.  The first problem was that when the program ran, OpenGL had to open a new window (in other words, you would run the program from the terminal, and a second window would pop open).  It was in this window that all the image rendering happened, creating the visual representation of what the Kinect was seeing (in the above image, the terminal with the x, y, and z coordinates is on the left, and the rendering window is on the right).  Now, the BeagleBoard, when running Ubuntu, is essentially headless: all you can open are terminal sessions, not graphical windows.  Thus, if you tried to run the code, it would crash the BeagleBoard.  So, we figured we would just need to change the code so that the extra window wouldn't open.  We used the divide-and-conquer technique here: Tim worked on modifying the code so that OpenGL wouldn't open that new window, and Matt worked on configuring NITE to run on the BeagleBoard.

Eventually, after wasting many hours learning the inner workings of GLUT and NITE respectively, we realized we weren't going to get this program to run on the BeagleBoard.  The OpenGL (GLUT) main loop will not run without opening a window.  You would think this isn't a problem, since OpenGL just handles the image rendering.  However, we discovered that the OpenGL code also drove some of the skeleton tracking, and thus, without it, the program was useless.  Right around the same time, we realized NITE would not run on the BeagleBoard.  You see, NITE is built for the x86 platform, and thus is not compatible with the ARM processor on the BeagleBoard.  When it comes to the "trial and error" process, we certainly were trying, and we sure had a lot of error.

So, we scrapped NITE and went back to our old friend OpenNI.  OpenNI came with a C program that would return the z-position (depth) of the pixel at the exact center of the Kinect's field of vision (the Kinect's depth resolution is 640x480).  So, we took this code and ran with it.  We took the z-data for that center pixel and translated it into robot commands.  A person would stand directly in front of the robot, in the dead center of its view, and the z-data for that person would be processed.  If the person was greater than a certain distance away, the robot would move toward them.  If the person was less than a certain distance away, the robot would move away from them.  And, if the person was in the "sweet spot," the robot would remain still.
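
To give a feel for it, here is a stripped-down sketch of that decision logic.  The distance thresholds and the command helpers below are placeholders we made up for illustration; the depth read itself follows the style of the OpenNI 1.x sample we started from:

// Sketch: turn the center-pixel depth into a simple move decision.
#include <XnCppWrapper.h>
#include <cstdio>

// Placeholder command helpers standing in for the robot-command layer.
static void move_forward()  { printf("CMD: forward\n");  }
static void move_backward() { printf("CMD: backward\n"); }
static void hold_position() { printf("CMD: hold\n");     }

static const XnDepthPixel NEAR_MM = 900;   // assumed "too close" threshold (~3 ft)
static const XnDepthPixel FAR_MM  = 1500;  // assumed "too far" threshold (~5 ft)

void ReactToCenterDepth(xn::DepthGenerator& depth)
{
    xn::DepthMetaData md;
    depth.GetMetaData(md);

    // Depth (in mm) of the pixel at the exact center of the 640x480 frame.
    XnDepthPixel z = md(md.XRes() / 2, md.YRes() / 2);

    if (z == 0)
        return;                    // zero means "no valid reading" for that pixel

    if (z > FAR_MM)
        move_forward();            // person too far away: close the gap
    else if (z < NEAR_MM)
        move_backward();           // person too close: back away
    else
        hold_position();           // in the sweet spot: stay still
}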

This code worked well, but it was very basic.  We added to it extensively, writing code to process the z-data and to translate that data into robot commands, which were then transmitted over Ethernet to the robot.  However, we thought that if we stopped there, we wouldn't really be earning our pay.  So, we decided to go a bit further.
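
The transmission side is nothing exotic; conceptually it is just a small socket send.  The sketch below uses UDP with a made-up address, port, and message format, since we're not detailing the robot's actual command protocol here:

// Sketch: pushing a command string to the robot over Ethernet via UDP.
// The host, port, and message format are assumptions for illustration.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstring>
#include <cstdio>

int send_robot_command(const char* host, int port, const char* cmd)
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) { perror("socket"); return -1; }

    sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port   = htons(port);
    inet_pton(AF_INET, host, &addr.sin_addr);

    ssize_t sent = sendto(sock, cmd, strlen(cmd), 0,
                          (sockaddr*)&addr, sizeof(addr));
    close(sock);
    return sent < 0 ? -1 : 0;
}

// Example (hypothetical address and command): send_robot_command("192.168.1.50", 9000, "FWD 0.2");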

We created code that would search for a person, no matter where they were in the field of vision, and recognize them.  Once the person was seen, the robot would turn toward them until they were directly in front of it, while simultaneously moving closer to or farther from them.

To do this, we created our own code based on the "middle of the screen z-data" code.  We broke the Kinect's field of vision into a grid.  The code cycles through the grid cells and gets the z-data for whatever is in each cell; this z-data corresponds to the distance between the robot and whatever is in that part of the frame.  Our initial code is only for an obstacle-free environment, so we work under the assumption that whatever is closest to the robot is the human target.  So, the code cycles through the cells and stores the z-value for each.  Whichever z-value is the smallest is determined to be the target, and the robot turns to get that target into its center of vision (in the y plane).  As the robot turns, it also moves closer to or farther from the target, until the robot is about four feet away (in the z plane).
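
Here is a simplified sketch of that grid-scan logic.  The grid dimensions, the per-cell sampling (one pixel at the center of each cell), and the printed commands are illustrative choices, not our exact implementation:

// Sketch: divide the 640x480 depth frame into a grid, sample each cell,
// and treat the closest cell as the person to track.
#include <cstdint>
#include <cstdio>
#include <limits>

const int WIDTH = 640, HEIGHT = 480;
const int GRID_COLS = 8, GRID_ROWS = 6;          // assumed grid layout
const uint16_t TARGET_MM   = 1220;               // ~4 ft standoff distance
const uint16_t DEADBAND_MM = 150;                // tolerance around the standoff

void track_closest(const uint16_t* depth /* row-major depth map, mm, 0 = no reading */)
{
    int bestCol = -1;
    uint16_t bestZ = std::numeric_limits<uint16_t>::max();

    // Sample the center pixel of every grid cell and remember the closest one.
    for (int r = 0; r < GRID_ROWS; ++r) {
        for (int c = 0; c < GRID_COLS; ++c) {
            int x = c * (WIDTH / GRID_COLS)  + (WIDTH / GRID_COLS) / 2;
            int y = r * (HEIGHT / GRID_ROWS) + (HEIGHT / GRID_ROWS) / 2;
            uint16_t z = depth[y * WIDTH + x];
            if (z != 0 && z < bestZ) { bestZ = z; bestCol = c; }
        }
    }
    if (bestCol < 0)
        return;                                  // nothing in view

    // Turn until the closest cell sits in the middle columns of the grid...
    if (bestCol < GRID_COLS / 2 - 1)
        printf("CMD: turn left\n");
    else if (bestCol > GRID_COLS / 2)
        printf("CMD: turn right\n");

    // ...while simultaneously closing to (or backing off to) about four feet.
    if (bestZ > TARGET_MM + DEADBAND_MM)
        printf("CMD: forward\n");
    else if (bestZ < TARGET_MM - DEADBAND_MM)
        printf("CMD: backward\n");
}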

And there you have it.  Future work for this code will include improving the robot's reaction time and adding PD control.