3. CCM3.6 Internals
    3.4 Multitasking Strategy


3.4.2 Distributed Memory Multitasking

The one-dimensional latitudinal data decomposition employed in the shared-memory multitasked code is also used in the distributed-memory MPI implementation. This approach obviously limits the parallelism that can be exploited (e.g., a maximum of 64 processes at a horizontal resolution of T42), but it was judged to be the only viable way to support both types of multitasking from a single set of source code. One payoff of this approach is that nearly all routines which perform gridpoint-space computations are identical regardless of whether the target architecture is shared-memory or distributed-memory.
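
As an illustration of this decomposition, the following sketch (written in Python rather than the model's Fortran, with hypothetical names such as latitude_blocks, nlat, and npes) shows how a set of Gaussian latitudes might be divided into contiguous blocks, one per process. The actual CCM code additionally pairs latitudes hemispherically, which is not shown here.

    # Illustrative sketch of a one-dimensional latitudinal decomposition:
    # each process owns a contiguous block of Gaussian latitudes.
    def latitude_blocks(nlat, npes):
        """Return the (first, last) latitude indices owned by each process."""
        if npes > nlat:
            raise ValueError("cannot use more processes than latitudes")
        base, extra = divmod(nlat, npes)
        blocks, start = [], 0
        for p in range(npes):
            count = base + (1 if p < extra else 0)
            blocks.append((start, start + count - 1))
            start += count
        return blocks

    # T42 has 64 Gaussian latitudes, so at most 64 processes can be used.
    print(latitude_blocks(64, 8))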

The spectral dynamics in the distributed-memory code are parallelized over Fourier wavenumber "m" rather than over diagonals of "m-n" wavenumber space (see Figure 3.6), since the spectral transform technique requires summations over total wavenumber "n". These sums are best done on-processor: each processor performs the sum over "n" because those arrays are stored locally. Since the loops that perform spectral-space computations therefore look completely different in the distributed-memory code, some source files contain two subroutines (separated by #ifdef's), one for each implementation. The shared-memory code was not rewritten to use this same data ordering (which would have eliminated a large number of #ifdef's) because storing the data along diagonals of "m-n" wavenumber space generates much longer inner loops, and hence much higher vector performance on a PVP machine.
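
The following sketch (again Python, with a simple round-robin assignment of wavenumbers that is purely illustrative and not the actual CCM scheme) shows why partitioning spectral space by Fourier wavenumber "m" keeps the sums over total wavenumber "n" on-processor: in a triangular truncation, each column "m" holds all coefficients with n = m,...,T, so a process that owns "m" needs no communication to complete the sum.

    T = 42                      # triangular truncation, e.g. T42
    npes = 4

    # Assign each Fourier wavenumber m to a process (round-robin here;
    # the actual CCM assignment differs and is not reproduced).
    owner = {m: m % npes for m in range(T + 1)}

    # A process owning wavenumber m holds every coefficient (m, n) for
    # n = m..T, so the spectral-transform sum over n needs no communication.
    def local_sum_over_n(m, coef):
        return sum(coef[(m, n)] for n in range(m, T + 1))

    # Example with unit coefficients: the local sum for m is just T - m + 1.
    coef = {(m, n): 1.0 for m in range(T + 1) for n in range(m, T + 1)}
    print(owner[5], local_sum_over_n(5, coef))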

Due to hemispheric symmetry considerations in the spectral dynamics, the number of processors used by the distributed-memory code ($NPES) must be even. Load-balance considerations dictate that a value of $NPES which divides evenly into the total number of Gaussian latitudes works best, though this is not mandatory.
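
These constraints on $NPES can be summarized in a small illustrative check (Python; the function name check_npes is hypothetical), assuming the 64 Gaussian latitudes of a T42 grid:

    def check_npes(npes, nlat=64):
        """Validate a candidate $NPES against the rules described above."""
        if npes % 2 != 0:
            raise ValueError("$NPES must be even (hemispheric symmetry)")
        if npes > nlat:
            raise ValueError("$NPES cannot exceed the number of Gaussian latitudes")
        if nlat % npes != 0:
            print("warning: %d latitudes do not divide evenly among %d processes;"
                  " some load imbalance is likely" % (nlat, npes))
        return True

    check_npes(16)   # 64/16 = 4 latitudes per process: balanced
    check_npes(12)   # legal, but not evenly divisible: warns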


Sub Sections

    3.4.2.1 Distributed Memory Management
    3.4.2.2 Distributed Memory I/O




