Ray Tracer implementation


#21

Well, solution number one is not practicable as send accepts just one argument :frowning:


#22

setColor is super slow anyway, try using iterators instead, this:

    auto startAtTime = ofGetElapsedTimeMillis();
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            glm::vec3 P;
            glm::vec3 w;
            // Find the ray through (x, y) and the center of projection
            camera.getPrimaryRay(float(x) + 0.5f, float(y) + 0.5f, width, height, P, w);
            renderPixels.setColor(x, y, L_i(ofxRTRay(P, w)));
        }
}

to:

    auto startAtTime = ofGetElapsedTimeMillis();
    for (auto & line: renderPixels.getLines()) {
        for (auto & pixel: line.getPixels) {
            glm::vec3 P;
            glm::vec3 w;
            // Find the ray through (x, y) and the center of projection
            camera.getPrimaryRay(float(x) + 0.5f, float(y) + 0.5f, width, height, P, w);
            auto color = L_i(ofxRTRay(P, w));
            pixel[0] = color.r;
            pixel[1] = color.g;
            pixel[2] = color.b;
        }
}

also for what you are doing something like tbb or openmp would be much more appropiated than trying to manually divide the task. with any of those libraries the above would turn into something like:

    auto startAtTime = ofGetElapsedTimeMillis();
    parallel_for (auto & line: renderPixels.getLines()) {
        for (auto & pixel: line.getPixels) {
            glm::vec3 P;
            glm::vec3 w;
            // Find the ray through (x, y) and the center of projection
            camera.getPrimaryRay(float(x) + 0.5f, float(y) + 0.5f, width, height, P, w);
            auto color = L_i(ofxRTRay(P, w));
            pixel[0] = color.r;
            pixel[1] = color.g;
            pixel[2] = color.b;
        }
}

which would automatically assign every cpu core to process a line of the pixels in parallel


#23

Thanks, yes I’m trying to use TBB but for now I can not even include it. Do you have some example about how to declare its dependency in the header file?

Also, this loop is not working:

    for (auto & line: renderPixels.getLines()) {
        for (auto & pixel: line.getPixel()) {

Member function ‘getPixel’ not viable: ‘this’ argument has type ‘const ofPixels_::Line’, but function is not marked const

It sounds like getPixel returns a const value that you can not modify?


#24

try:

    for (auto line: renderPixels.getLines()) {
        for (auto pixel: line.getPixel()) {

The pixels and line classes contain references to the original pointers inside so getting them by copy is fine


#25

This expects an argument, that I assume was the index of the pixel in the line

    auto startAtTime = ofGetElapsedTimeMillis();
    int x =0;
    int y = 0;
    for (auto line: renderPixels.getLines()) {
        for (auto pixel: line.getPixel(y)) {
            glm::vec3 P;
            glm::vec3 w;
            // Find the ray through (x, y) and the center of projection
            camera.getPrimaryRay(float(x) + 0.5f, float(y) + 0.5f, width, height, P, w);
            auto color = L_i(ofxRTRay(P, w));
            pixel[0] = color.r;
            pixel[1] = color.g;
            pixel[2] = color.b;
            y++;
        }
        x++;
    }

but also this is not working:
Invalid range expression of type 'ofPixels_<unsigned char>::Pixel'; no viable 'begin' function available


#26

sorry getPixels not getPixel


#27

Now the it is faster, but my rendered image is empty:

void ofxRayTracer::traceImage(const ofxRTPinholeCamera& camera, ofRectangle& rectangle, shared_ptr<ofImage>& image){
    const int width = int(rectangle.getWidth());
    const int height = int(rectangle.getHeight());

    ofPixels renderPixels;
    renderPixels.allocate(width, height, OF_IMAGE_COLOR_ALPHA);

    auto startAtTime = ofGetElapsedTimeMillis();
    int x =0;
    int y = 0;
    for (auto line: renderPixels.getLines()) {
        for (auto pixel: line.getPixels()) {
            glm::vec3 P;
            glm::vec3 w;
            // Find the ray through (x, y) and the center of projection
            camera.getPrimaryRay(float(x) + 0.5f, float(y) + 0.5f, width, height, P, w);
            auto color = L_i(ofxRTRay(P, w));
            pixel[0] = color.r;
            pixel[1] = color.g;
            pixel[2] = color.b;
            x++;
        }
        y++;
    }

    image->setFromPixels(renderPixels);
    displayTime(ofGetElapsedTimeMillis() - startAtTime);
}

I assume it is because renderPixels was not changed, because i was not accessing it by reference?

getLines is returning horizontal lines or vertical lines? I can not found it in the doc or here https://github.com/openframeworks/openFrameworks/blob/master/libs/openFrameworks/graphics/ofPixels.cpp


#28

mmh that should work, as i said, lines and pixels contain pointers to the original memory so it’s fine to use them by copy.

horizontal lines, that’s how it’s laid out in memory so it’s the fastest to access, it’s the same order you had before


#29

i think the problem is you have an alpha channel so you need to also do:

 pixel[3] = color.a;

and there’s no need to allocate an ofPixels, just use the image internal pixels and update it at the end:

    for (auto line: image->getPixels().getLines()) {
        for (auto pixel: line.getPixels()) {
...
    image->update();

that way you avoid an allocation and a copy


#30

I was not resetting x to 0.

    for (auto line: renderPixels.getLines()) {
        for (auto pixel: line.getPixels()) {
            glm::vec3 P;
            glm::vec3 w;
            cout << x << endl;
            cout << y << endl;
            // Find the ray through (x, y) and the center of projection
            camera.getPrimaryRay(float(x) + 0.5f, float(y) + 0.5f, width, height, P, w);
            auto color = L_i(ofxRTRay(P, w));
            pixel[0] = color.r;
            pixel[1] = color.g;
            pixel[2] = color.b;
            x++;
        }
        y++;
        x = 0;
    }

But the render image is now “broken”.


#31

Yes, you’re right, alpha channel was shifting everything. Anyway, performance are not better, with a 640x400 resolution, the original loop took 17.16 seconds

        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                glm::vec3 P;
                glm::vec3 w;
                // Find the ray through (x, y) and the center of projection
                camera.getPrimaryRay(float(x) + 0.5f, float(y) + 0.5f, width, height, P, w);
                renderPixels.setColor(x, y, L_i(ofxRTRay(P, w)));
            }
        }

the getPixels one 17.836

        int x = 0;
        int y = 0;
        for (auto line: renderPixels.getLines()) {
            for (auto pixel: line.getPixels()) {
                glm::vec3 P;
                glm::vec3 w;

                // Find the ray through (x, y) and the center of projection
                camera.getPrimaryRay(float(x) + 0.5f, float(y) + 0.5f, width, height, P, w);
                auto color = L_i(ofxRTRay(P, w));
                pixel[0] = color.r;
                pixel[1] = color.g;
                pixel[2] = color.b;
                pixel[3] = color.a;
                x++;
            }
            y++;
            x = 0;
        }

Anyway, I will keep the sceond one while trying to use tbb.


#32

yeah with a process so time consuming the difference will be negligible. btw are you running in release mode?


#33

No, just debug mode.
I’ve added the image->getPixels().getLines() improvement you were suggesting, to remove another allocation. I’ve also removed the alpha pixel, and I’ve allocated the image just with OF_IMAGE_COLOR.
I think now I need just to have a working installation of tbb and see what happens.


#34

use release it’ll be much faster, you might need debug if something crashes or you need to add breakpoints… but to test performance you need to do so in release, somethings like ofPixels iterators only really make a difference in release


#35

I’ve finally get parallel_for.
I’ve included the header /usr/local/lib/tbb in the xcode project, now I can #include "parallel_for.h" and use parallel_for.

I’m trying to understand the syntax, what you’ve suggested:

tbb::parallel_for(auto line : image->getPixels().getLines()){

It is not working, tbb expects a const blocked_range<size_t>& that I do not know what it could be in this case, but I think I should first define a container with all the pixel and then have an iterator, or define an index, that goes through them.

https://software.intel.com/en-us/blogs/2009/08/03/parallel_for-is-easier-with-lambdas-intel-threading-building-blocks

Do you have any hints?


#36
parallel_for( blocked_range<size_t>(0,pixels.getHeight()),
	    [=](const blocked_range<size_t>& r) {
	      for(auto line: pixels.getLines(r.begin(), r.end() - r.begin()){
                  auto y = line.getLineNum();
                  auto x = 0;
                  for (auto pixel: line.getPixels()) {
                    glm::vec3 P;
                    glm::vec3 w;

                    // Find the ray through (x, y) and the center of projection
                    camera.getPrimaryRay(float(x) + 0.5f, float(y) + 0.5f, width, height, P, w);
                    auto color = L_i(ofxRTRay(P, w));
                    pixel[0] = color.r;
                    pixel[1] = color.g;
                    pixel[2] = color.b;
                    pixel[3] = color.a;
                    x++;
                }
            }
         }
);

something like that should work. the way it works is:

parallel_for( blocked_range<size_t>(0,pixels.getHeight()),

inits the parallel for from 0…height which will call the passed lamda function passing the range it has to process.

in the lambda blocked_range is a size_t range where begin is the first line you have to process and end the last one + 1

take into account that x and y are now used by different threads so you can’t declare them outside of the for loop as before


#37

Wow. I see the problem with x and y init.
This line is actually giving me a problem:

for(auto line: pixels.getLines(r.begin(), r.end() - r.begin())) {

The error is No matching member function for call to 'getLines'

I’ve read the getLines signature, and I’ve tried to pass an init value, and the size, like this, but it is not working:

for(auto line: pixels.getLines(0, r.size())){

But is giving me the same error.Also, worth to mention that I’m initializing the pixels variable before parallel_for, with auto pixels = image->getPixels();

Also, in the loop the code completition is not suggesting me the method getLines

But I’ve it before the loop


#38

And of course also this is not working

    ofPixels_ <unsigned char>pixels = image->getPixels();
    parallel_for( tbb::blocked_range<size_t>(0,pixels.getHeight()),
                 [=](const tbb::blocked_range<size_t>& r) {
                     for(auto &line: pixels.getConstLines(r.begin(), r.end() - r.begin())) {

because when I try to access the pixel pixel[1] = color.g; it gives me an error because it is a const value.


#39

Thank you @roymacdonald and @arturo for your help in this issue, I’m back again with this problem with multithreading that is still not solved.
This is what I’ve tried. As getLines or getConstLines are returning const values, I’ve thought that it could make sense to have an empty ofPixel object, that I could fulfil during the loop. I’ve declared ofPixel renderPixel in my header and, in the loop, I’ve changed the code that set the color for each pixel:

 for (auto &pixel: line.getPixels()) {
     glm::vec3 P;
     glm::vec3 w;

     // Find the ray through (x, y) and the center of projection
     camera.getPrimaryRay(float(x) + 0.5f, float(y) + 0.5f, width, height, P, w);
     auto color = L_i(ofxRTRay(P, w));
     renderPixels.setColor(x,y,color);
//                             pixel[0] = color.r;
//                             pixel[1] = color.g;
//                             pixel[2] = color.b;
//                             pixel[3] = color.a;
     x++;
 }

But his does not work because (I think) the thread access the renderPixel object at the same time and there is some concurrency issue:

I’m now stuck with this problem, any suggestion or attempt is really appreciated.
The current status of the project is in the github repository https://github.com/edap/ofxRayTracer


#40

if you use no reference in the for loop you should get an object you can also write to