Particles README
---------------------------------

DISCLAIMER and CREDIT:
This code is offered as is, without warranty. The base particle code
originally came from the class site for UC-Berkeley's CS 267. It has been
extensively modified since, including the creation of the GPU codes. (UC has
since also independently developed a CUDA version.) The modifications to the
base code were performed by Dan Ernst and Brandon Holt while both were at
UW-Eau Claire.

OVERVIEW
---------------------------------
This particle simulation is an example of a simple N-Body problem. It
simulates a number of particles that all interact through a simplistic
short-range repulsive force that falls off with the square of the distance
(similar to a repulsive electrostatic force).

Executables usage: (./particles_cpu or ./particles_cuda)
  options:
    -n #         (number of particles)
    -o filename  (name of file to output position data to every time step)

  example call:
    ./particles_cuda -n 10000 -o out.txt

These output files can be visualized with the included visualizer programs.
While you do not want the -o option on for performance testing, it is an
excellent way to check your code for basic correctness issues. (Remember:
getting the wrong answer quickly isn't helpful!)

Both executables print to stdout the time it took to calculate all the
forces in each time step.

Each directory includes a Makefile that recompiles the version within that
directory (serial or CUDA) every time it is called. You can build both
executables by calling "make", or each one individually by calling
"make cuda" or "make cpu" respectively.

LAB
---------------------------------
As each particle feels a force from every other particle, the complexity is
O(n^2) per time step. Luckily for us, though, the problem is highly
parallelizable, as each body's acceleration can be calculated independently
of the others within a time step.

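
To make the O(n^2) structure concrete, here is a minimal host-side sketch of
an all-pairs force pass. It is not the lab's actual code: the Particle
layout, the CUTOFF and MIN_R constants, and the unit-mass, unit-strength
force law are all assumptions chosen for illustration. The point to notice
is that iteration i of the outer loop only writes to particle i, which is
why each particle can be handed to its own thread.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical 2D particle layout; the lab's real struct may differ.
struct Particle {
    float x, y;    // position
    float ax, ay;  // acceleration accumulated during this time step
};

// Assumed constants, for illustration only.
constexpr float CUTOFF = 0.01f;    // particles farther apart than this do not interact
constexpr float MIN_R  = 1.0e-4f;  // clamp so very close pairs do not blow up

// Add the repulsion that neighbor q exerts on particle p.
// Magnitude 1/r^2 along the unit vector from q to p, i.e. (dx, dy) / r^3.
static void apply_force(Particle& p, const Particle& q) {
    float dx = p.x - q.x;
    float dy = p.y - q.y;
    float r2 = dx * dx + dy * dy;
    if (r2 > CUTOFF * CUTOFF) return;           // out of range: no force
    if (r2 < MIN_R * MIN_R) r2 = MIN_R * MIN_R; // avoid dividing by ~0
    float r = std::sqrt(r2);
    float coef = 1.0f / (r2 * r);
    p.ax += coef * dx;
    p.ay += coef * dy;
}

// All-pairs pass: n iterations of an n-step inner loop, hence O(n^2).
// Iteration i reads every position but writes only p[i].ax and p[i].ay,
// so the outer iterations are independent of one another.
void compute_forces(Particle* p, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        p[i].ax = 0.0f;
        p[i].ay = 0.0f;
        for (std::size_t j = 0; j < n; ++j) {
            if (j != i) apply_force(p[i], p[j]);
        }
    }
}
```

Calling compute_forces(p, n) once per time step, followed by a position
update, gives the serial shape of the simulation under these assumptions.
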
For this exercise we will focus on just calculating the forces on the GPU and
will leave the rest of the calculations up to the CPU. A CUDA kernel has
already been written for you which calculates all the forces for all the
particles, but it has been implemented with no regard for efficiency. Your
task will be to rewrite the kernel using what you've learned about the GPU
architecture and CUDA optimization techniques.

Here are some places you might want to start:

- Take a look at where in memory things are, and consider ways that you can
  take advantage of faster kinds of memory. Specifically, you could look
  into using shared memory. Each thread calculates the forces on its own
  particle due to every other particle, so each one always needs to know its
  own particle's position, but the other particles' positions could be
  loaded into shared memory in chunks (or tiles if you want to think about
  it that way).

- Try tweaking the number of threads and the number of blocks. Currently the
  number of threads per block is set to 256, but it might be faster to use
  more or fewer.

- You may be able to reduce the number of floating point operations needed
  to calculate the forces. Try compiling with the "-use_fast_math" option to
  use the optimized CUDA math functions, and look into "rsqrtf()" in
  particular.
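
The three hints above can be modeled on the host before touching the kernel.
The sketch below is plain C++, not CUDA, and it is not the lab's solution:
the Particle layout, TILE, CUTOFF, MIN_R, blocks_for() and rsqrt_host() are
hypothetical names introduced here, and the force law (unit mass, unit
strength) is an assumption. It shows the loop shape of tiling, the ceiling
division that picks a block count for a given thread count, and how a
reciprocal square root replaces a square root plus a divide.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 2D particle layout; the lab's real struct may differ.
struct Particle {
    float x, y;    // position
    float ax, ay;  // acceleration accumulated during this time step
};

constexpr std::size_t TILE = 256;  // mirrors the current 256 threads per block
constexpr float CUTOFF = 0.01f;    // assumed interaction radius
constexpr float MIN_R  = 1.0e-4f;  // assumed minimum-distance clamp

// Blocks needed so that blocks * threads_per_block >= n (ceiling division).
inline std::size_t blocks_for(std::size_t n, std::size_t threads_per_block) {
    return (n + threads_per_block - 1) / threads_per_block;
}

// Host stand-in for CUDA's rsqrtf(), which returns 1/sqrt(v) on the device.
inline float rsqrt_host(float v) { return 1.0f / std::sqrt(v); }

void compute_forces_tiled(std::vector<Particle>& p) {
    const std::size_t n = p.size();
    for (Particle& q : p) { q.ax = 0.0f; q.ay = 0.0f; }

    // In a kernel these two arrays would be __shared__, one copy per block.
    float tile_x[TILE];
    float tile_y[TILE];

    for (std::size_t base = 0; base < n; base += TILE) {
        const std::size_t count = (n - base < TILE) ? (n - base) : TILE;

        // Load phase: in a kernel each thread copies one entry from global
        // memory, and the block then waits at __syncthreads().
        for (std::size_t k = 0; k < count; ++k) {
            tile_x[k] = p[base + k].x;
            tile_y[k] = p[base + k].y;
        }

        // Compute phase: each particle i (one per thread in a kernel) reads
        // only the fast tile copy of the other particles' positions.
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t k = 0; k < count; ++k) {
                if (base + k == i) continue;  // skip self-interaction
                float dx = p[i].x - tile_x[k];
                float dy = p[i].y - tile_y[k];
                float r2 = dx * dx + dy * dy;
                if (r2 > CUTOFF * CUTOFF) continue;
                if (r2 < MIN_R * MIN_R) r2 = MIN_R * MIN_R;
                float inv_r = rsqrt_host(r2);         // 1/r
                float coef = inv_r * inv_r * inv_r;   // 1/r^3, no divide needed
                p[i].ax += coef * dx;
                p[i].ay += coef * dy;
            }
        }
    }
}
```

With the example call's 10000 particles, blocks_for(10000, 256) gives 40
blocks, so the last block has some idle threads that a kernel must guard
against with a bounds check. On the device, a second __syncthreads() is
typically needed after the compute phase so no thread overwrites the tile
while others are still reading it.
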