The spectral dynamics in the distributed-memory code are parallelized over Fourier wavenumber "m" rather than over diagonals of "m-n" wavenumber space (see Figure 3.6), because the spectral transform technique requires summations over total wavenumber "n". These sums are best done on-processor: each processor owns all values of "n" for its subset of "m", so the arrays involved are stored locally. Because the loops that perform spectral-space computations consequently look completely different in the distributed-memory code, some source files contain two subroutines, one for each implementation, separated by #ifdef's. The shared-memory code was not rewritten to use the same data ordering (which would have eliminated a large number of #ifdef's) because storing the data along diagonals of "m-n" wavenumber space yields much longer inner loops, and hence much higher vector performance, on a PVP machine.
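The contrast between the two loop orderings can be sketched as follows. This is a hypothetical, simplified illustration: the array names, the SPMD macro, and the toy truncation are assumptions for the sketch, not taken from the model source, and the file would be compiled with the C preprocessor enabled (e.g. gfortran -cpp).

    ! loop_orderings.F90 -- hypothetical sketch; names such as coef, psum,
    ! and SPMD are illustrative, not taken from the model source.
    program loop_orderings
       implicit none
       integer, parameter :: nmax = 4       ! toy truncation wavenumber
       integer :: m, n, d
       real :: coef(0:nmax, 0:nmax)         ! spectral coefficients coef(n,m), n >= m
       real :: psum(0:nmax)                 ! per-m sums over total wavenumber n

       coef = 1.0
       psum = 0.0

    #ifdef SPMD
       ! Distributed-memory ordering: each processor owns every "n" for its
       ! subset of Fourier wavenumbers "m", so the sum over total wavenumber
       ! "n" is entirely on-processor.  (Here the m-loop covers the full
       ! range; in the real decomposition it would cover only the local m's.)
       do m = 0, nmax
          do n = m, nmax
             psum(m) = psum(m) + coef(n, m)
          end do
       end do
    #else
       ! Shared-memory ordering: traverse diagonals d = n - m of "m-n"
       ! wavenumber space.  The inner loop over "m" is long, which is what
       ! gives the high vector performance on a PVP machine.
       do d = 0, nmax
          do m = 0, nmax - d
             psum(m) = psum(m) + coef(m + d, m)
          end do
       end do
    #endif

       print *, 'sums over n for each m:', psum
    end program loop_orderings

Both branches accumulate the same per-m sums; only the traversal order of "m-n" space differs, which is why the two implementations can coexist behind #ifdef's.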
Due to hemispheric symmetry considerations in the spectral dynamics, the number of processors used by the distributed-memory code ($NPES) must be an even number. For good load balance, $NPES should also divide evenly into the total number of Gaussian latitudes, though this is not mandatory.
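A minimal sketch of the kind of check these constraints imply is shown below; the variable names (plat, npes) and the toy values are assumptions for illustration, not taken from the model source.

    ! check_npes.F90 -- hypothetical illustration of the $NPES constraints.
    program check_npes
       implicit none
       integer, parameter :: plat = 64   ! total Gaussian latitudes (toy value)
       integer :: npes

       npes = 8                          ! candidate processor count ($NPES)

       ! Hemispheric symmetry in the spectral dynamics requires an even count.
       if (mod(npes, 2) /= 0) then
          print *, 'error: NPES must be an even number'
          stop
       end if

       ! For load balance, NPES should divide the latitude count evenly.
       if (mod(plat, npes) /= 0) then
          print *, 'warning:', npes, 'does not divide', plat, &
                   'latitudes evenly; expect load imbalance'
       end if

       print *, 'latitudes per processor (at most):', (plat + npes - 1) / npes
    end program check_npes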