Kinect Data Stream Formats

The depth and video streams have similar formats. One primary difference is that the depth stream is monochromatic (each pixel carries only an intensity value) while the video stream is 32-bit RGBA. A second difference is resolution: the video stream has twice the linear resolution (four times as many pixels) as the depth stream.
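
In concrete terms, the per-frame buffer sizes work out as follows (a quick sketch assuming the default 640×480 video and 320×240 depth modes):

int videoBytes = 640 * 480 * 4; // 32-bit RGBA: 1,228,800 bytes per frame
int depthBytes = 320 * 240 * 2; // 16-bit depth:  153,600 bytes per frame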

For video or depth, the event handler’s argument list provides access to the image data:

void nui_DepthFrameReady(object sender, ImageFrameReadyEventArgs e)
{
    PlanarImage Image = e.ImageFrame.Image;

    // Now manipulate the data
}

The video stream is the simplest data stream. The 640×480 PlanarImage can be viewed immediately in a WPF Image control (named image below):

PlanarImage Image = e.ImageFrame.Image;
image.Source = BitmapSource.Create(
    Image.Width, Image.Height,           // 640 x 480
    96, 96,                              // DPI
    PixelFormats.Bgr32, null,
    Image.Bits,
    Image.Width * Image.BytesPerPixel);  // stride in bytes

or the image data can be manipulated before display. The Coding4Fun toolkit has an extension method that streamlines this to a single line of code: e.ImageFrame.ToBitmapSource().
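
In a video frame handler, that one-liner looks something like this (a sketch assuming the same WPF Image control named image and a reference to the Coding4Fun.Kinect.Wpf assembly):

using Coding4Fun.Kinect.Wpf; // Coding4Fun extension methods

void nui_ColorFrameReady(object sender, ImageFrameReadyEventArgs e)
{
    // One line replaces the manual BitmapSource.Create call above
    image.Source = e.ImageFrame.ToBitmapSource();
}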

As mentioned in an earlier post, the developer has the option of using only the depth stream, or the depth stream along with skeletal tracking information. The two resulting 320×240 streams have slightly different formats. With the UseDepth option (no player index), the code above or the Coding4Fun simplification works in the same way, the only difference being a monochrome intensity stream rather than RGBA.

With the DepthAndPlayerIndex option, the depth data stream is encoded with additional information: an index which serves as a tracking number for skeletons. A tracking value of 0 attached to a pixel means it does not belong to a tracked individual; a tracking value of 1 is for skeleton number 1, and so on. The Kinect can actively track up to 2 individuals at a time, or it can passively track up to 4. The encoding is that the 3 least significant bits of each 16-bit pixel contain the tracking number, while the remaining 13 bits contain the depth information.

The intensity value is not a greyscale byte; it is actually the distance in mm from the Kinect sensor to the object detected at that pixel. To convert it to greyscale, a mapping function can be used. Putting all this together yields:

// Treat the 16-bit depth image as an array of bytes; every 2 bytes
// contain sensor + skeleton tracking information
PlanarImage Image = e.ImageFrame.Image;
byte[] depthFrame16 = Image.Bits;

// Start with pixel (x, y) in the image
int x = 160;
int y = 120; // right in the middle of the image, let's say

int index = (y * 320 + x) * 2;
int trackingNum = depthFrame16[index] & 0x07; // 3 least significant bits
int mmDistance = (depthFrame16[index + 1] << 5) | (depthFrame16[index] >> 3);

// Convert to greyscale (nearer objects appear brighter)
byte intensity = (byte)(255 - (255 * mmDistance / 0x0fff));
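
The same mapping can be applied to every pixel to build a displayable frame. A minimal sketch, assuming the DepthAndPlayerIndex layout described above and the same WPF Image control named image:

byte[] greyscale = new byte[320 * 240];

for (int i16 = 0, i8 = 0; i16 < depthFrame16.Length; i16 += 2, i8++)
{
    int mm = (depthFrame16[i16 + 1] << 5) | (depthFrame16[i16] >> 3);
    greyscale[i8] = (byte)(255 - (255 * mm / 0x0fff));
}

// One byte per pixel, so the stride is simply the width
image.Source = BitmapSource.Create(
    320, 240, 96, 96, PixelFormats.Gray8, null, greyscale, 320);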

Getting Data from the Kinect

The Kinect contains a VGA color camera, a depth camera (i.e., a 3D sensor), and a microphone array. The depth camera does most of the heavy lifting since it can be used to track objects in 3D space. The microphone array allows for directional sound sensing and can be used in conjunction with the depth camera to track sounds in 3D space.

The basic functionality of the Microsoft SDK is to

  1. Give access to the video camera stream
  2. Give access to the depth camera stream
  3. Give access to skeleton capture data
  4. Give access to the audio stream

A series of helpful tutorials is available at Channel 9 covering skeleton tracking, audio recording, and speech recognition. An introductory tutorial covers adding references to the Kinect libraries in your Visual Studio project and starting the Kinect runtime. In addition, more complete code walkthroughs for the SDK examples are available.

Each of the four basic functions above is made available by the Microsoft Kinect SDK through events, allowing access to the data through event handlers. The main events are called VideoFrameReady, DepthFrameReady, and SkeletonFrameReady. The audio stream is not event-driven, per se; since it is essentially a continuous data stream, it is handled by its own thread. For speech recognition, a speech engine may use the stream and generate events each time speech is recognized or processed.

To gain access to these events after initialization of the Kinect, the developer is free to “mix-and-match” among various RuntimeOptions:

nui = new Runtime();

try
{
    nui.Initialize(RuntimeOptions.UseDepthAndPlayerIndex |
                   RuntimeOptions.UseSkeletalTracking |
                   RuntimeOptions.UseColor);
}
catch (InvalidOperationException)
{
    System.Windows.MessageBox.Show("Initialization failed.");
    return;
}

try
{
    nui.VideoStream.Open(ImageStreamType.Video, 2,
                         ImageResolution.Resolution640x480, ImageType.Color);
    nui.DepthStream.Open(ImageStreamType.Depth, 2,
                         ImageResolution.Resolution320x240, ImageType.DepthAndPlayerIndex);
}
catch (InvalidOperationException)
{
    System.Windows.MessageBox.Show("Failed to open stream. Please make sure to specify a supported image type and resolution.");
    return;
}

Not shown in the above example is a final option, RuntimeOptions.UseDepth.
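
For a depth-only application, initialization might look like this (a sketch; the stream is opened with ImageType.Depth since no player index is requested):

nui.Initialize(RuntimeOptions.UseDepth);
nui.DepthStream.Open(ImageStreamType.Depth, 2,
                     ImageResolution.Resolution320x240, ImageType.Depth);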

A simple way to develop apps using the Kinect is to add custom code to these event handlers as needed. A more sophisticated approach is to create wrappers for the event handlers, which allows code re-use and more extensive customization, as sketched below.
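
A minimal sketch of the wrapper idea (the class here is hypothetical, not part of the SDK):

using System;
using System.Collections.Generic;
using Microsoft.Research.Kinect.Nui;

// Hypothetical dispatcher: any number of frame processors can be registered
// against a single DepthFrameReady subscription, so processing code is
// reusable across applications.
class DepthFrameDispatcher
{
    private readonly List<Action<PlanarImage>> processors =
        new List<Action<PlanarImage>>();

    public DepthFrameDispatcher(Runtime nui)
    {
        nui.DepthFrameReady += (sender, e) =>
        {
            foreach (var process in processors)
                process(e.ImageFrame.Image);
        };
    }

    public void Register(Action<PlanarImage> processor)
    {
        processors.Add(processor);
    }
}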

The standard approach to creating event handlers would be:

nui.DepthFrameReady += new EventHandler<ImageFrameReadyEventArgs>(nui_DepthFrameReady);
nui.SkeletonFrameReady += new EventHandler<SkeletonFrameReadyEventArgs>(nui_SkeletonFrameReady);
nui.VideoFrameReady += new EventHandler<ImageFrameReadyEventArgs>(nui_ColorFrameReady);

while for the separate audio thread, one defines a function to start the audio stream (shown here for a speech recognition application):

protected void startSpeech()
{
    // source is a KinectAudioSource declared as a class field
    source = new KinectAudioSource();
    source.FeatureMode = true;
    source.AutomaticGainControl = false; // turn this off for speech recognition
    source.SystemMode = SystemMode.OptibeamArrayOnly;
    source.MicArrayMode = MicArrayMode.MicArrayAdaptiveBeam;

    Stream s = source.Start();

    // now the Stream object can be used with a speech recognition engine
}

and then a thread is launched with this function:

using Microsoft.Research.Kinect.Audio;
using System.Threading;

namespace KinectSpeech
{
    class kinectSpeech
    {
        private const string RecognizerId = "SR_MS_en-US_Kinect_10.0";

        public kinectSpeech(MainWindow caller)
        {
            // any local setup stuff we want to do
            var speechThread = new Thread(new ThreadStart(startSpeech));
            speechThread.Start();
        }

        // startSpeech() from the listing above belongs in this class
    }
}
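
From there, the Stream returned by source.Start() can be wired to a recognition engine. A minimal sketch modeled on the SDK's speech sample (the vocabulary here is a placeholder; this method would live alongside startSpeech in the class above):

using System;
using System.IO;
using System.Linq;
using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Recognition;

void recognize(Stream s)
{
    // Find the Kinect-specific recognizer installed with the SDK
    var ri = SpeechRecognitionEngine.InstalledRecognizers()
                 .First(r => r.Id == RecognizerId);
    var sre = new SpeechRecognitionEngine(ri.Id);

    // Placeholder vocabulary; substitute your own commands
    var words = new Choices("red", "green", "blue");
    var gb = new GrammarBuilder { Culture = ri.Culture };
    gb.Append(words);
    sre.LoadGrammar(new Grammar(gb));

    sre.SpeechRecognized += (sender, args) =>
        Console.WriteLine("Recognized: " + args.Result.Text);

    // The Kinect array delivers 16 kHz, 16-bit mono PCM
    sre.SetInputToAudioStream(s, new SpeechAudioFormatInfo(
        EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
    sre.RecognizeAsync(RecognizeMode.Multiple);
}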

Once the handlers and/or thread is set up, the sky’s the limit!

Getting Started with Kinect Development in Windows

To get started developing for the Kinect in MS Windows, a few downloads are necessary.

If you don’t have Visual Studio already, you can download the free version, Visual Studio Express for Windows 7. You will also need a current version of the .NET Framework. The Kinect SDK itself is available from Microsoft Research. It requires Windows 7, the above downloads (or the commercial version of Visual Studio), and a few additional prerequisites to fully use its functionality.

The Microsoft SDK is closed-source and licensed only for academic and non-commercial use.

After installing this software, you should be able to plug the Kinect into a USB port (the requirements state it needs its own dedicated USB hub to function; your mileage may vary if you have multiple USB devices active along with it) and run either the Skeletal Viewer or the Shape Game sample. The SDK also includes the source code for these samples. Visual Basic examples are also available for download. A few useful utilities are available in the Coding4Fun Kinect library, which can be added to a Visual Studio project using NuGet or as a direct download.

 

If all goes well, you’ll run the Skeletal Viewer and see something like:

[Image: sample skeleton capture. "I am now a yellow skeleton."]

To test the audio drivers, run the Shape Game and see if any error messages appear on the screen.
