The depth and video streams have similar formats. The first main difference is that the depth stream is monochromatic (it contains only an intensity value), while the video stream is 32-bit RGBA. The second main difference is resolution: the video stream has twice the resolution of the depth stream in each dimension (640×480 versus 320×240), so it has 4 times as many pixels.
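As a quick check on these numbers, the sketch below works out pixel counts and raw buffer sizes for one frame of each stream. It assumes 4 bytes per video pixel (32-bit color) and 2 bytes per depth pixel (the 16-bit depth format used later in this post); the class and member names are hypothetical.

```csharp
using System;

// Hypothetical constants for the two streams described above.
// Assumption: 4 bytes per video pixel, 2 bytes per depth pixel.
public static class FrameSizes
{
    public const int VideoWidth = 640, VideoHeight = 480, VideoBytesPerPixel = 4;
    public const int DepthWidth = 320, DepthHeight = 240, DepthBytesPerPixel = 2;

    // Pixel counts: 2x the resolution per dimension gives 4x the pixels.
    public static int VideoPixels => VideoWidth * VideoHeight;   // 307,200
    public static int DepthPixels => DepthWidth * DepthHeight;   // 76,800

    // Size of one frame's raw buffer in bytes.
    public static int VideoFrameBytes => VideoPixels * VideoBytesPerPixel; // 1,228,800
    public static int DepthFrameBytes => DepthPixels * DepthBytesPerPixel; // 153,600
}
```

So a single video frame carries 8 times as many bytes as a depth frame: 4 times the pixels at twice the bytes per pixel.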
For both video and depth, the event handler's arguments provide access to the image data:
void nui_DepthFrameReady(object sender, ImageFrameReadyEventArgs e)
{
    PlanarImage Image = e.ImageFrame.Image;
    // Now manipulate the data
}
The video stream is the simplest data stream. The 640×480 PlanarImage can be displayed immediately in a WPF Image control:
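PlanarImage Image = e.ImageFrame.Image;
// "image" (lowercase) is the WPF Image control; "Image" is the PlanarImage.
// Arguments: width, height, 96 dpi x/y, pixel format, no palette, pixel data, stride (bytes per row)
image.Source = BitmapSource.Create(
    Image.Width, Image.Height, 96, 96, PixelFormats.Bgr32, null,
    Image.Bits, Image.Width * Image.BytesPerPixel);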
Alternatively, the image data can be manipulated before display. For the unmodified case, the Coding4Fun toolkit has a method that reduces the conversion to a single line of code: e.ImageFrame.ToBitmapSource().
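As one illustration of manipulating the video data before display, the sketch below converts a 32-bit frame to greyscale in place. It assumes the buffer follows the PixelFormats.Bgr32 layout used in the BitmapSource.Create call (4 bytes per pixel in blue, green, red, unused order, rows packed with no padding); the class and method names are hypothetical, and the BT.601 luma weights are a common choice rather than anything the Kinect SDK prescribes.

```csharp
using System;

// Hypothetical helper for editing a raw Bgr32 video frame before display.
// Assumption: 4 bytes per pixel (B, G, R, unused), rows packed with no padding.
public static class VideoFrameFilters
{
    public const int BytesPerPixel = 4;

    // Replace each pixel's color with its luma so the frame displays as grey
    // while remaining a valid Bgr32 buffer.
    public static void ToGreyscaleInPlace(byte[] bits)
    {
        if (bits.Length % BytesPerPixel != 0)
            throw new ArgumentException("Buffer length must be a multiple of 4.", nameof(bits));

        for (int i = 0; i < bits.Length; i += BytesPerPixel)
        {
            byte b = bits[i], g = bits[i + 1], r = bits[i + 2];
            // Integer approximation of 0.299 R + 0.587 G + 0.114 B (ITU-R BT.601).
            byte y = (byte)((299 * r + 587 * g + 114 * b) / 1000);
            bits[i] = bits[i + 1] = bits[i + 2] = y; // the 4th byte is left untouched
        }
    }

    // Byte offset of pixel (x, y) in a frame of the given width.
    public static int PixelOffset(int x, int y, int width) => (y * width + x) * BytesPerPixel;
}
```

In the event handler, one could copy Image.Bits into a new array, call ToGreyscaleInPlace on the copy, and pass the copy to BitmapSource.Create in place of Image.Bits. Working on a copy avoids depending on whether the SDK reuses its frame buffer.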
As mentioned in an earlier post, the developer has the option of using only the depth stream, or the depth stream along with skeletal tracking information. The two resulting 320×240 streams have slightly different formats. With the UseDepth option (without a player index), the code above or the Coding4Fun simplification works similarly, except that each pixel holds a monochrome depth value rather than an RGBA color.
With the UseDepthAndPlayerIndex option, the depth data stream is encoded with additional information: an index that serves as a tracking number for skeletons. A tracking value of 0 attached to a pixel means it does not belong to a tracked individual, a tracking value of 1 is for skeleton number 1, and so on. The Kinect can actively track up to 2 individuals at a time, or it can passively track up to 4. In this encoding each pixel is a 16-bit value: the 3 least significant bits contain the tracking number, and the remaining 13 bits contain the depth information.
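The bit layout described in the paragraph above can be sketched as a pair of encode/decode helpers. This assumes each 16-bit pixel is stored low byte first in the byte buffer, which matches the byte order implied by the per-pixel code later in this post; the DepthPixel class and its methods are hypothetical, not part of the Kinect SDK.

```csharp
using System;

// Hypothetical helpers for the DepthAndPlayerIndex pixel layout:
// bits 0-2 = player (tracking) index, bits 3-15 = depth in millimetres.
// Assumption: pixels are stored little-endian (low byte first).
public static class DepthPixel
{
    public const int PlayerIndexBits = 3;
    public const int PlayerIndexMask = (1 << PlayerIndexBits) - 1; // 0x07

    // Split a raw 16-bit pixel into its player index and depth.
    public static (int playerIndex, int depthMm) Decode(ushort raw) =>
        (raw & PlayerIndexMask, raw >> PlayerIndexBits);

    // Pack a player index and depth into a raw pixel (handy for test data).
    public static ushort Encode(int playerIndex, int depthMm)
    {
        if (playerIndex < 0 || playerIndex > PlayerIndexMask)
            throw new ArgumentOutOfRangeException(nameof(playerIndex));
        if (depthMm < 0 || depthMm > (ushort.MaxValue >> PlayerIndexBits))
            throw new ArgumentOutOfRangeException(nameof(depthMm));
        return (ushort)((depthMm << PlayerIndexBits) | playerIndex);
    }

    // Read pixel number pixelIndex from a buffer with 2 bytes per pixel.
    public static ushort ReadRaw(byte[] bits, int pixelIndex) =>
        (ushort)(bits[2 * pixelIndex] | (bits[2 * pixelIndex + 1] << 8));
}
```

Shifting the whole 16-bit value right by 3 gives the same result as combining the two bytes with separate shifts, e.g. (high << 5) | (low >> 3); the 16-bit form is simply easier to read.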
The intensity value is not a greyscale byte. It is actually the distance in millimetres from the Kinect sensor to the object detected at that pixel. To display it as greyscale, a mapping function is needed, for example a linear scale that makes near objects bright and far objects dark. Putting all this together yields:
// treat the 16-bit image as an array of bytes; every 2 bytes contain
// the depth + skeleton tracking information for one pixel
PlanarImage Image = e.ImageFrame.Image;
byte[] depthFrame16 = Image.Bits;

// start with pixel x, y in the image (here, right in the middle)
int x = 160;
int y = 120;
int index = (y * Image.Width + x) * 2;   // 2 bytes per pixel

int trackingNum = depthFrame16[index] & 0x07;   // 3 least significant bits
int mmDistance = (depthFrame16[index + 1] << 5) | (depthFrame16[index] >> 3);   // remaining 13 bits

// convert to greyscale (near = bright, far = dark); clamp to 0x0fff (4095 mm) so the byte cast cannot wrap
byte intensity = (byte)(255 - (255 * Math.Min(mmDistance, 0x0fff) / 0x0fff));
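The per-pixel code above extends naturally to a whole frame. The sketch below is a hypothetical helper (DepthToGreyscale is not an SDK class) that produces one greyscale byte per pixel using the same linear mapping over 0 to 0x0fff mm, clamping larger distances so the byte cast cannot wrap around. As an optional extra, pixels with a non-zero tracking number can be forced to white so tracked people stand out. It assumes the DepthAndPlayerIndex layout with 2 bytes per pixel, low byte first.

```csharp
using System;

// Hypothetical whole-frame depth-to-greyscale conversion.
// Assumptions: 2 bytes per pixel, low byte first, 3-bit player index in the
// low bits, 13-bit depth in mm above it, display range 0..0x0FFF mm.
public static class DepthToGreyscale
{
    public const int MaxDisplayMm = 0x0FFF; // 4095 mm

    // Linear map: 0 mm -> 255 (bright), MaxDisplayMm or farther -> 0 (dark).
    public static byte MapDepth(int mm)
    {
        int clamped = Math.Min(Math.Max(mm, 0), MaxDisplayMm);
        return (byte)(255 - (255 * clamped / MaxDisplayMm));
    }

    // Returns width * height greyscale bytes, one per pixel.
    public static byte[] ConvertFrame(byte[] depthBits, int width, int height, bool highlightPlayers)
    {
        if (depthBits.Length < width * height * 2)
            throw new ArgumentException("Buffer too small for the given size.", nameof(depthBits));

        var grey = new byte[width * height];
        for (int p = 0; p < width * height; p++)
        {
            int raw = depthBits[2 * p] | (depthBits[2 * p + 1] << 8);
            int player = raw & 0x07; // tracking number
            int mm = raw >> 3;       // distance in millimetres
            grey[p] = (highlightPlayers && player != 0) ? (byte)255 : MapDepth(mm);
        }
        return grey;
    }
}
```

The resulting array has one byte per pixel, so in WPF it could be shown with BitmapSource.Create using PixelFormats.Gray8 and a stride equal to the frame width.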
