SOLVED ofxKinectV2 on windows with libfreenect2


#1

I am trying to work with @bakercp's fork of ofxKinectV2 on Windows with VS2017 (https://github.com/bakercp/ofxKinectV2). I have read through many other threads on this, mostly about Theodore Watson's ofxKinectV2. I could get the original repo working, but I want to take advantage of the later version of libfreenect2 and the point cloud (which I could never get working myself). I have taken the following steps and think I am getting close.

I have downloaded the release lib of libfreenect2 from GitHub (https://github.com/OpenKinect/libfreenect2/releases)
and added the lib to my project in
properties->linker->input->additional dependencies
and added the path to the lib in
properties->linker->general->additional library directories

I installed the NVIDIA GPU Computing Toolkit and added OpenCL.lib to
properties->linker->input->additional dependencies
and added the path to the lib in
properties->linker->general->additional library directories
and added the OpenCL (and CUDA) include folder to
properties->C/C++->General->additional include directories

The example compiles, but I get two linker errors I don’t understand.

Error unresolved external symbol "public: __cdecl libfreenect2::OpenCLKdePacketPipeline::OpenCLKdePacketPipeline(int)" (??0OpenCLKdePacketPipeline@libfreenect2@@QEAA@H@Z) example C:\Users\Fred\Documents\openFrameworks\addons\ofxKinectV2\example C:\Users\Fred\Documents\openFrameworks\addons\ofxKinectV2\example\ofProtonect.obj 1

and

Error unresolved external symbol "public: __cdecl libfreenect2::OpenCLPacketPipeline::OpenCLPacketPipeline(int)" (??0OpenCLPacketPipeline@libfreenect2@@QEAA@H@Z) example C:\Users\Fred\Documents\openFrameworks\addons\ofxKinectV2\example C:\Users\Fred\Documents\openFrameworks\addons\ofxKinectV2\example\ofProtonect.obj 1

Ideally I would like to use CUDA instead of OpenCL for its performance advantage, but OpenCL would be fine if I can get it to compile.

I have a further question related to CUDA. I see in the config.h file a lot of defines that switch the image decoding and hardware acceleration methods. I don’t really get the structure or how I should change them. I thought I should just use a pre-processor definition in the project properties but this gives redefinition warnings and I am not sure if this is how it should work.

For example, I have added the cuda.lib and include paths and added a pre-processor definition in
properties->C/C++->Preprocessor->preprocessor definitions LIBFREENECT2_WITH_CUDA_SUPPORT

and in
properties->C/C++->Preprocessor->undefine preprocessor definitions LIBFREENECT2_WITH_OPENCL_SUPPORT;LIBFREENECT2_WITH_OPENGL_SUPPORT

When I do this it seems correct but then I get the following errors:

'CUDAKDEPacketPipeline': is not a member of 'libfreenect2'	
'CUDAPacketPipeline': is not a member of 'libfreenect2'

syntax error: identifier 'CUDAKDEPacketPipeline'	
syntax error: identifier 'CUDAPacketPipeline'

It would be great to get an understanding of how I can solve this.

Cheers

Fred


#2

OK, so I have a clue, and it would be great to have some confirmation. I think the freenect2.lib on the libfreenect2 releases page is not compiled with OpenCL or CUDA support, so I am guessing I need to compile it from scratch with these enabled. I am able to run on Windows with the CPU packet pipeline, but needless to say this is very slow (3 fps or so).
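
For anyone wanting to check the same thing, here is a minimal sketch that reports which pipelines the config.h on the include path advertises. The helper name advertisedPipelines is made up for illustration and is not part of libfreenect2 or ofxKinectV2; the three macro names are the ones mentioned in the first post. It only inspects the header, so it cannot prove that the .lib itself contains the matching symbols.

```cpp
#include <string>
#include <vector>

// Include the real config.h only when it is on the include path, so this
// sketch still compiles on a machine without libfreenect2.
#if defined(__has_include)
#  if __has_include(<libfreenect2/config.h>)
#    include <libfreenect2/config.h>
#  endif
#endif

// Hypothetical helper (not a libfreenect2 or ofxKinectV2 function):
// lists the pipelines whose feature macros are defined at compile time.
std::vector<std::string> advertisedPipelines()
{
    std::vector<std::string> names;
    names.push_back("CPU"); // the CPU pipeline is not behind a feature macro
#ifdef LIBFREENECT2_WITH_OPENGL_SUPPORT
    names.push_back("OpenGL");
#endif
#ifdef LIBFREENECT2_WITH_OPENCL_SUPPORT
    names.push_back("OpenCL");
#endif
#ifdef LIBFREENECT2_WITH_CUDA_SUPPORT
    names.push_back("CUDA");
#endif
    return names;
}
```

If the header lists a pipeline that the prebuilt lib was not compiled with, unresolved external symbol errors like the two above would be the expected result. Forcing the macros through the project properties changes only what the header declares, not what the lib contains.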


#3

Solution: First I had to recompile freenect2.lib. This was a bit of a pain, but the instructions on the libfreenect2 GitHub page are pretty clear. The only strange things I had to do were:

generate the libturbojpeg lib from the dll, I found good clear instructions here:

Then for some reason the build script for libusb did not work so I had to open the project in visual studio and build it manually, no big deal.

Then, following the instructions, I used CMake (the GUI version, as I was making too many mistakes) to generate project files for libfreenect2. It did not find any of the libturbojpeg or libusb files automatically, but this can be fixed by pointing CMake to the correct files.

The main settings are obviously enabling CUDA and OpenCL support. I did not use the Intel OpenCL implementation; instead I used the NVIDIA one, so this will only work if you have an NVIDIA card (obviously this is true for CUDA as well). Also turn on build shared libs; I also selected use static CUDA runtime.

Next, building the example with the new lib, I got stuck a little. @bakercp, there were a few errors in your ofProtonect class. Your fork does not have issues enabled, so I could not post them on the GitHub page, but here they are:

Lines 30 and 31 of ofProtonect are missing some commas and should be:

        ,CUDA,
        CUDAKDE

also lines 74 to 79 of ofProtonect.cpp should be:

        case PacketPipelineType::CUDA:
            pipeline = new libfreenect2::CudaPacketPipeline(deviceId);
            break;
        case PacketPipelineType::CUDAKDE:
            pipeline = new libfreenect2::CudaKdePacketPipeline(deviceId);
            break;

instead of:


        case PacketPipelineType::CUDA:
            pipeline = new libfreenect2::CUDAPacketPipeline(deviceId);
            break;
        case PacketPipelineType::CUDAKDE:
            pipeline = new libfreenect2::CUDAKDEPacketPipeline(deviceId);
            break;
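
To put both fixes in one place, here is a small self-contained sketch. The enum below is a stand-in for the one in ofProtonect: its member list is an assumption, and pipelineClassName is a made-up helper, not code from the addon. It only shows where the commas go and which spelling libfreenect2 uses for each pipeline class.

```cpp
#include <string>

// Stand-in for the enum in ofProtonect; the member list is assumed.
enum class PacketPipelineType
{
    CPU,
    OPENGL,
    OPENCL,
    OPENCLKDE,
    CUDA,      // every enumerator except the last needs a trailing comma
    CUDAKDE
};

// Hypothetical helper: the libfreenect2 class name matching each type.
// Note the mixed-case "Cuda" and "Kde" in the real class names.
inline std::string pipelineClassName(PacketPipelineType type)
{
    switch (type)
    {
        case PacketPipelineType::CPU:       return "libfreenect2::CpuPacketPipeline";
        case PacketPipelineType::OPENGL:    return "libfreenect2::OpenGLPacketPipeline";
        case PacketPipelineType::OPENCL:    return "libfreenect2::OpenCLPacketPipeline";
        case PacketPipelineType::OPENCLKDE: return "libfreenect2::OpenCLKdePacketPipeline";
        case PacketPipelineType::CUDA:      return "libfreenect2::CudaPacketPipeline";
        case PacketPipelineType::CUDAKDE:   return "libfreenect2::CudaKdePacketPipeline";
    }
    return "unknown";
}
```

The enumerator names can stay all caps; only the class names after libfreenect2:: have to match the library headers exactly.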

After these changes, as I said in the initial post, I had to add freenect2.lib to the project and copy the DLLs produced by building libfreenect2 with examples enabled (they are in the Protonect example build folder) to the build folder of the ofxKinectV2 example.

Last, and I don’t know if this is needed, I copied the newly generated config.h and exports.h files to the include folder of the ofxKinectV2 addon.

I now have all flavors of packet processing working, and the OpenCL and CUDA versions are immensely more efficient. I need to do some more testing to see the advantages of CUDA over OpenCL, but I was persisting with CUDA due to the benchmarks on this page:

From this it is clear that the best results are achieved with VAAPI and CUDA (VAAPI is Linux only), but I think I can make do with Windows and CUDA.
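
For the further testing of CUDA against OpenCL, a rough way to compare pipelines is to count how many new depth frames arrive per second. Below is a minimal sketch of a frame rate meter; FpsMeter is a made-up helper class, not part of ofxKinectV2 or libfreenect2, and it takes timestamps in seconds so it does not depend on any particular clock.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical helper: average frame rate over the most recent timestamps.
class FpsMeter
{
public:
    // window = how many recent frame timestamps to average over (minimum 2).
    explicit FpsMeter(std::size_t window = 60)
        : window_(window < 2 ? 2 : window) {}

    // Call once per new frame with the current time in seconds.
    void tick(double nowSeconds)
    {
        stamps_.push_back(nowSeconds);
        if (stamps_.size() > window_) stamps_.pop_front();
    }

    // Frames per second over the window; 0 until two frames have arrived.
    double fps() const
    {
        if (stamps_.size() < 2) return 0.0;
        const double elapsed = stamps_.back() - stamps_.front();
        if (elapsed <= 0.0) return 0.0;
        // N timestamps span N - 1 frame intervals.
        return static_cast<double>(stamps_.size() - 1) / elapsed;
    }

private:
    std::size_t window_;
    std::deque<double> stamps_;
};
```

In an openFrameworks app this could be fed with ofGetElapsedTimef() each time the Kinect delivers a new frame, once per pipeline type, with everything else unchanged.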