3.4.1 Shared-Memory Multitasking
CCM3.6 contains very little single-threaded work. One can therefore
exploit all available parallel degrees of freedom (e.g., the number of latitude
bands) and see a significant improvement in turn-around time compared to
a single-threaded run. Because the parallelism is over latitude bands, the
number of processors to multitask over is limited to the number of latitudes.
Observed scaling may be close to linear or significantly worse, depending
upon how well the memory system scales on the machine in question. For
example, very good scaling has been observed on an SGI O2000, but the same
code does not scale nearly as well on the Cray PVP architectures. Scaling
may also depend on the total amount of memory used; for example, a very high
resolution run may not scale nearly as well as a lower resolution run.
The shared-memory code is multitasked using directives which take the form
of Fortran comments. Such directives exist for Cray, SGI, and HP
architectures. On the Cray, they take the form:

CMIC$ DO ALL SHARED (...) PRIVATE (...)
The DO ALL portion of the directive tells the compiler that each
iteration of the next loop can be done independently. Names within the
parentheses after SHARED indicate those variables that are global
to the loop and will be shared across multiple processes. Conversely, variables
declared as PRIVATE will have
separate storage allocated for each process. Examples of variables
that may be shared include those which are read-only within the loop, or
which have separate storage already allocated for each iteration of the
loop. Variables taking different values during separate iterations of the
loop must be private. The loop index itself is an example of such a variable.
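As an illustration (the variable and array names here are hypothetical, not
taken from the CCM source), such a directive might look like:

CMIC$ DO ALL SHARED (plat, plon, t, tmax) PRIVATE (lat, i, tbig)
      do lat=1,plat
c        t is read-only within the loop and tmax already has separate
c        storage for each iteration, so both may be shared; lat, i, and
c        tbig take different values in each iteration and must be private
         tbig = t(1,lat)
         do i=2,plon
            if (t(i,lat).gt.tbig) tbig = t(i,lat)
         end do
         tmax(lat) = tbig
      end do

Here t and tmax can be shared because each iteration touches only its own
latitude slice, while the scalars written by every iteration must be private.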
Autotasking compiler directives exist only in routines ccmoce, dyndrv,
lsmdrv, scan1ac, scan1bc, scan2, scanslt, sltini, and somoce.
These routines drive all the physics and dynamics.
In any multitasked code, work done by one processor must be independent
of the work being done (potentially) simultaneously on all other processors.
Iterations of the Gaussian latitude loop in scan1ac are independent
of latitude. Therefore, this routine contains a multitasked loop of the
form:
CMIC$ DO ALL SHARED(...) PRIVATE(...)
      do lat=1,plat
         ...
      end do
Computations within the loop for each value of lat are done independently.
Work is parceled out to available processors until all iterations are complete,
with an implied synchronization point at the end of the loop.
In the SLT routines which drive multitasked loops (scanslt and
sltini), each latitudinal iteration may be done in parallel. In
the spectral dynamics driven by dyndrv, the Gaussian quadrature
and semi-implicit timestep computations are parallelized over diagonals
of "m-n" wavenumber space on PVP machines, and over total wavenumber "n"
otherwise. Subroutine dyndrv also drives the horizontal diffusion
calculations, which are parallelized over the vertical level index.
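A minimal sketch of the latter style of parallelization (the routine name
hdiffk below is a hypothetical placeholder, not the actual CCM interface) is
a directive over the vertical index:

CMIC$ DO ALL SHARED(...) PRIVATE(k)
      do k=1,plev
c        each vertical level can be diffused independently
         call hdiffk (k)
      end do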
It is important to realize that there is no guarantee of the order in which
multitasked iterations of a loop will be either initiated or completed.
For this reason, coding constructs of the form,
      subroutine xxx(lat)
      if (lat.eq.1) then
         ... code to initialize static variables ...
      end if
will not work inside multitasked regions of code. If the iteration with
lat=2 happens to be the first to reach routine xxx, the variables
set within the above "if" construct will not be
properly initialized. To guarantee that routine xxx will work
properly when multitasked, static variables which may vary across multiple
calls to the routine (i.e. in this case across latitude bands) must be
set in a single-threaded part of the code before the routine is called,
as in the sketch below.
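A minimal sketch of the safe pattern (the initialization routine xxxini is
hypothetical) performs the one-time setup before the multitasked loop:

c     single-threaded: initialize static variables used by xxx
      call xxxini
c
CMIC$ DO ALL SHARED(...) PRIVATE(lat)
      do lat=1,plat
         call xxx (lat)
      end do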
Another result of the unpredictability of calculation order under multitasking
is that direct access I/O to the work file units becomes necessary. If
sequential access were used, there would be no guarantee that a given processor's
request for a latitude band of data would fall in the correct order. Standard
Fortran direct-access read and write statements are employed,
so there is no loss of code portability.
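For example, a work file opened for direct access can be read or written by
record number keyed to the latitude index, so the order in which processors
issue their requests is irrelevant (the unit, record length, and buffer names
below are illustrative only):

c     work file with one fixed-length record per latitude band
      open (unit=nwrk, file='workfile', access='direct',
     $      recl=reclen, form='unformatted')
      ...
c     any processor may access its own latitude band directly
      read (nwrk, rec=lat) buf
      write (nwrk, rec=lat) buf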
Sequential I/O is used for the history
files, even though the output order of latitude bands is totally unpredictable
in a multitasked run. Sequential I/O is necessary because the history files
contain variable-length records, and standard Fortran direct access I/O
requires fixed-length records. To enable identification of the latitude
band of each record by post-processing programs, the latitude index of
each band is included as the first value in each data record of the history
file.
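The corresponding history write might look as follows (the unit and buffer
names are illustrative); because the latitude index leads the record, a
post-processor can identify and reorder the bands regardless of the order in
which they were written:

c     latitude index is the first word of each history record
      write (nhist) lat, (hbuf(i), i=1,nflds)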
Sub Sections
3.4.1.1 Shared-Memory Management