Parallelizing the ray-tracing rate calculation is straightforward on shared-memory architectures. Because ballistic transport is assumed, particle trajectories are independent of each other, so the total number of particles can simply be distributed over multiple CPUs. In practice, this amounts to parallelizing the loop over particles, which is easily done with OpenMP [23]. To avoid simultaneous write accesses to the surface rates, each CPU accumulates into its own array, and the individual arrays are summed at the end.
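The scheme described above can be illustrated with a minimal C++/OpenMP sketch. It is not the implementation used in this work: the function names `trace_particle` and `compute_rates`, the uniform particle `weight`, and the hash that stands in for the actual ray tracing are all assumptions made for illustration. Only the structure matters here, namely a parallel loop over particles, one private rate array per thread, and a final summation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for tracing one particle: returns the index of the
// surface element that the particle hits. A real tracer would follow the
// ballistic trajectory through the geometry instead of hashing the ID.
inline std::size_t trace_particle(std::uint64_t particle_id,
                                  std::size_t num_elements) {
    std::uint64_t x =
        particle_id * 6364136223846793005ULL + 1442695040888963407ULL;
    x ^= x >> 33;
    return static_cast<std::size_t>(x % num_elements);
}

// Accumulates surface rates for num_particles independent particles.
// The parameter `weight` (contribution per particle) is an assumption.
std::vector<double> compute_rates(std::int64_t num_particles,
                                  std::size_t num_elements,
                                  double weight) {
    std::vector<double> rates(num_elements, 0.0);

    #pragma omp parallel
    {
        // Private per-thread array: no concurrent writes to shared data.
        std::vector<double> local(num_elements, 0.0);

        // Particles are independent, so the loop can be split across threads.
        #pragma omp for schedule(static)
        for (std::int64_t p = 0; p < num_particles; ++p) {
            std::size_t hit =
                trace_particle(static_cast<std::uint64_t>(p), num_elements);
            local[hit] += weight;
        }

        // Final summation of the per-thread arrays, one thread at a time.
        #pragma omp critical
        {
            for (std::size_t i = 0; i < num_elements; ++i) {
                rates[i] += local[i];
            }
        }
    }
    return rates;
}
```

Compiled without OpenMP support, the pragmas are ignored and the same code runs serially. The extra memory grows with the number of threads times the number of surface elements, which is the price paid for avoiding atomic updates inside the particle loop.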