During the presentation, I skimmed over a few tips for improving the performance of parallel-processing (parfor) loops. In today's post I plan to expand on these tips, as well as provide a few others that, for lack of space and time, I did not mention in the presentation. The overall effect can be dramatic: the performance (speed) difference between sub-optimal and optimized parfor'ed code can be up to a full order of magnitude, depending on the specific situation.

Before diving into the technical details, let me say that MathWorks has extensive documentation on PCT. In today's post I will try not to reiterate the official tips, but rather those that I have not found mentioned elsewhere and/or are not well known (my apologies in advance if I missed an official mention of one or more of the following). Naturally, to use any of today's tips, you need to have MathWorks' Parallel Computing Toolbox (PCT). Furthermore, I limit myself only to parfor in this post: much can be said about spmd, GPU and other parallel constructs, but not today.

The first tip is to not use the default number of workers created by parpool (or matlabpool in R2013a or earlier). By default, Matlab creates as many workers as there are logical CPU cores. On Intel CPUs, the OS reports two logical cores per physical core due to hyper-threading, for a total of 4 workers on a dual-core machine. However, in many situations hyper-threading does not improve the performance of a program and may even degrade it (I deliberately wish to avoid the heated debate over this: you can find endless discussions about it online and decide for yourself). Coupled with the non-negligible overhead of starting, coordinating and communicating with twice as many Matlab instances (workers are headless Matlab processes, after all), we reach the conclusion that in many cases it may actually be better to use only as many workers as there are physical (not logical) cores. I know the documentation and configuration panel seem to imply that parpool uses the number of physical cores by default, but in my tests I have seen otherwise (namely, logical cores).
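As a minimal sketch of how this could be done (assuming the default local cluster profile), one can query the number of physical cores via feature('numcores') and size the pool accordingly, rather than accepting the default worker count:

   numPhysicalCores = feature('numcores');   % physical cores detected by Matlab
   pool = gcp('nocreate');                   % current pool, or [] if none is open
   if isempty(pool)
       parpool(numPhysicalCores);            % open a pool with one worker per physical core
   elseif pool.NumWorkers ~= numPhysicalCores
       delete(pool);                         % close the existing (oversized) pool
       parpool(numPhysicalCores);            % reopen with the reduced worker count
   end

On R2013a or earlier, the equivalent call would be matlabpool(numPhysicalCores). As always, it is worth benchmarking both pool sizes on your own code, since the optimal worker count depends on the specific workload.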