Recently we were given an unoptimized raytracer so we could practise making it multithreaded and try out some optimization techniques. While working on this project, I spent a lot of time trying to understand what multithreading is and how it works.
During the classes I asked a lot of questions to further my understanding. I feel like I have a good grasp of the concept now, which let me implement it in the raytracer and actually understand what is happening.
I used OpenMP for the multithreading. When I first got the project, it took ~64 seconds to render. By the end, I got it down to ~11 seconds. I would like to take this further in the future, but at the moment I am focusing on other projects.
One problem I ran into was that OpenMP has to be enabled before it does anything. I was adding OpenMP code without seeing any effect. That was a red flag, so I researched it a bit and found out that OpenMP support has to be switched on in the Visual Studio project settings. Once that was done, the program ran much faster.
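A quick way to catch this kind of problem is to check the `_OPENMP` macro, which compilers define only when OpenMP support is switched on (in Visual Studio this is the "Open MP Support" option under C/C++ > Language, i.e. `/openmp`; in GCC or Clang it is `-fopenmp`). The helper below is a minimal sketch and not code from the raytracer; the name `openmpStatus` is made up for illustration.

```cpp
#include <string>

// Hypothetical helper: reports whether this file was compiled with OpenMP on.
// _OPENMP is defined by the compiler only when OpenMP support is enabled;
// when it is off, "#pragma omp" lines are silently ignored.
std::string openmpStatus()
{
#ifdef _OPENMP
    // The macro's value encodes the release date (yyyymm) of the supported spec.
    return "OpenMP enabled, version macro " + std::to_string(_OPENMP);
#else
    return "OpenMP disabled: #pragma omp lines are being ignored";
#endif
}
```

Printing the returned string once at startup would have shown straight away that the pragmas were being ignored.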
I used OpenMP to find out how many threads it would use by default, which usually matches the number of logical processors in the computer, and based everything else on that. It only takes one function call:
unsigned int nProcessors = omp_get_max_threads();
With that number in hand, I told OpenMP to create 4 times as many threads as the computer had cores. I did that with:
omp_set_num_threads(nProcessors * 4);
I arrived at that number by trial and error. Using twice as many threads as processors was much faster than using the same number. I kept increasing the count, and performance peaked at around 4 times the number of processors. Anything more and the render slowed down again because of the extra time spent switching between threads.
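The trial-and-error approach above could be automated roughly like this. It is only a sketch under some assumptions: `dummyWork` is a made-up stand-in for the render loop, `logicalProcessors` and `timeWithMultiplier` are hypothetical helper names, and `omp_get_num_procs()` is used here because it asks for the processor count directly. The `#ifdef _OPENMP` guards let it still compile (single-threaded) when OpenMP is disabled.

```cpp
#include <chrono>
#include <cmath>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of logical processors OpenMP reports, or 1 when OpenMP is disabled.
int logicalProcessors()
{
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    return 1;
#endif
}

// Stand-in workload (not the raytracer): n independent square roots, summed.
// n is assumed to be non-negative.
double dummyWork(int n)
{
    std::vector<double> out(n);
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        out[i] = std::sqrt(static_cast<double>(i));

    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += out[i];
    return sum;
}

// Requests `multiplier` threads per processor, runs the workload once,
// and returns the elapsed wall-clock time in seconds.
double timeWithMultiplier(int multiplier, int n)
{
    if (multiplier < 1)
        multiplier = 1; // never request zero threads
#ifdef _OPENMP
    omp_set_num_threads(logicalProcessors() * multiplier);
#endif
    auto start = std::chrono::steady_clock::now();
    volatile double sink = dummyWork(n); // volatile keeps the call from being optimized away
    (void)sink;
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

Calling `timeWithMultiplier` for multipliers 1, 2, 4 and 8 and comparing the results would repeat the experiment described above. The best multiplier depends on the machine and the workload, so the 4x figure should not be expected to carry over everywhere.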
The next step was to actually use the threads. To do so, in front of each loop whose iterations can run in parallel, I added:
#pragma omp parallel for
This splits the for loop's iterations into chunks and hands a chunk to each thread. For example, take a for loop like this:
for(int i = 0; i < 100; i++)
If there are four threads, the workload is shared out evenly: the first thread computes iterations 0-24, the second does 25-49, and so on. The threads then all run in parallel, making the program much faster. At this point, the render took ~16 seconds.
In the way of optimisation, I didn't really do too much. The main thing I did was think about how shadows were handled. I realised that when testing intersections to work out whether a point is in shadow, a single hit is enough: as soon as the ray intersects one object, we know the point is shadowed. So rather than looping through all the remaining checks, I just returned from the function at the first hit. When I compared the images rendered by the old and new versions, there were no differences.
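The idea can be sketched as follows. This is not the raytracer's actual code: the `Vec3`, `Ray` and `Sphere` types and the `hitsSphere` and `inShadow` functions are hypothetical, and the scene is assumed to contain only spheres.

```cpp
#include <cmath>
#include <vector>

// Hypothetical minimal types, for illustration only.
struct Vec3 { double x, y, z; };
struct Ray { Vec3 origin; Vec3 dir; };        // dir is assumed to be normalized
struct Sphere { Vec3 center; double radius; };

static double dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// True if the ray hits the sphere at a distance between a small epsilon and maxDist.
bool hitsSphere(const Ray& ray, const Sphere& s, double maxDist)
{
    const double eps = 1e-6; // avoids re-hitting the surface the ray starts on
    Vec3 oc{ray.origin.x - s.center.x, ray.origin.y - s.center.y, ray.origin.z - s.center.z};
    double b = dot(oc, ray.dir);
    double c = dot(oc, oc) - s.radius * s.radius;
    double disc = b * b - c;
    if (disc < 0.0)
        return false;                 // the ray misses the sphere entirely
    double root = std::sqrt(disc);
    double t = -b - root;             // nearer intersection
    if (t < eps)
        t = -b + root;                // origin is inside the sphere, use the far one
    return t > eps && t < maxDist;
}

// Shadow test with early exit: any single blocker between the surface point
// and the light (maxDist away) is enough, so stop at the first hit.
bool inShadow(const Ray& toLight, const std::vector<Sphere>& scene, double maxDist)
{
    for (const Sphere& s : scene)
        if (hitsSphere(toLight, s, maxDist))
            return true;              // no need to test the remaining objects
    return false;
}
```

This shortcut only suits shadow rays. A normal camera ray still needs the nearest hit, so it has to test every object, whereas a shadow ray only asks whether anything at all blocks the light.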
Overall, I feel like this project has given me a much better understanding of how threads work. I managed to do a little optimization apart from the threading, and got the render down to ~11 seconds from the original ~64 seconds.
Link to raytracer: https://github.com/Silcoish/studio3raytracer