3.4.1 Shared-Memory Multitasking
CCM3.6 contains very little single-threaded work. One can therefore
exploit all available parallel degrees of freedom (e.g., the number of latitude
bands) and see a significant improvement in turn-around time compared to
a single-threaded run. Because the parallelism is over latitude bands, the
number of processors to multitask over is limited to the number of latitudes.
Observed scaling may be close to linear or significantly worse, depending
upon how well the memory system scales on the machine in question. For
example, very good scaling has been observed on an SGI O2000, but the same
code does not scale nearly as well on the Cray PVP architectures. Scaling
may also depend on the total amount of memory used; for example, a very high
resolution run may not scale nearly as well as a lower resolution run.
The shared-memory code is multitasked using directives which take the form
of Fortran comments. Such directives exist for Cray, SGI, and HP
architectures. On the Cray, they take the form:

CMIC$ DO ALL SHARED (...) PRIVATE (...)
The DO ALL portion of the directive tells the compiler that each
iteration of the next loop can be done independently. Names within the
parentheses after SHARED indicate those variables that are global
to the loop and will be shared across multiple processes. Conversely, variables
declared as PRIVATE will have
separate storage allocated for each process. Examples of variables
that may be shared include those which are read-only within the loop, or
which have separate storage already allocated for each iteration of the
loop. Variables taking different values during separate iterations of the
loop must be private. The loop index itself is an example of such a variable.
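As an illustration (the variable and array names here are hypothetical, not
taken from the CCM source), such a directive might look like:

CMIC$ DO ALL SHARED (plat, plon, t, tmax) PRIVATE (lat, i, tbig)
      do lat=1,plat
c        t is read-only within the loop and tmax already has separate
c        storage for each iteration, so both may be shared; lat, i, and
c        tbig take different values in each iteration and must be private
         tbig = t(1,lat)
         do i=2,plon
            if (t(i,lat).gt.tbig) tbig = t(i,lat)
         end do
         tmax(lat) = tbig
      end do

Here t and tmax can be shared because each iteration touches only its own
latitude slice, while the scalars written by every iteration must be private.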
Autotasking compiler directives exist only in routines ccmoce, dyndrv,
lsmdrv, scan1ac, scan1bc, scan2, scanslt, sltini, and somoce.
These routines drive all the physics and dynamics.
In any multitasked code, work done by one processor must be independent
of the work being done (potentially) simultaneously on all other processors.
Iterations of the Gaussian latitude loop in scan1ac are independent
of latitude. Therefore, this routine contains a multitasked loop of the
form:
CMIC$ DO ALL SHARED(...) PRIVATE(...)
      do lat=1,plat
         ...
      end do
Computations within the loop for each value of lat are done independently.
Work is parceled out to available processors until all iterations are complete,
with an implied synchronization point at the end of the loop.
In the SLT routines which drive multitasked loops (scanslt and
sltini), each latitudinal iteration may be done in parallel. In
the spectral dynamics driven by dyndrv, the Gaussian quadrature
and semi-implicit timestep computations are parallelized over diagonals
of "m-n" wavenumber space on PVP machines, and over total wavenumber "n"
otherwise. Subroutine dyndrv also drives the horizontal diffusion
calculations, which are parallelized over the vertical level index.
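A minimal sketch of the latter style of parallelization (the routine name
hdiffk below is a hypothetical placeholder, not the actual CCM interface) is
a directive over the vertical index:

CMIC$ DO ALL SHARED(...) PRIVATE(k)
      do k=1,plev
c        each vertical level can be diffused independently
         call hdiffk (k)
      end do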
It is important to realize that there is no guarantee of the order in which
multitasked iterations of a loop will be either initiated or completed.
For this reason, coding constructs of the form,
      subroutine xxx(lat)
      if (lat.eq.1) then
         ... code to initialize static variables ...
      end if
will not work inside multitasked regions of code. If the iteration with
lat=2 happens to be the first to reach routine xxx, the variables
set within the above "if" construct will not be
properly initialized. To guarantee that routine xxx will work
properly when multitasked, static variables which may vary across multiple
calls to the routine (i.e. in this case across latitude bands) must be
set in a single-threaded part of the code before the routine is called,
as in the sketch below.
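A minimal sketch of the safe pattern (the initialization routine xxxini is
hypothetical) performs the one-time setup before the multitasked loop:

c     single-threaded: initialize static variables used by xxx
      call xxxini
c
CMIC$ DO ALL SHARED(...) PRIVATE(lat)
      do lat=1,plat
         call xxx (lat)
      end do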
Another result of the unpredictability of calculation order under multitasking
is that direct access I/O to the work file units becomes necessary. If
sequential access were used, there would be no guarantee that a given processor's
request for a latitude band of data would fall in the correct order. Standard
Fortran direct-access read and write statements are employed,
so there is no loss of code portability.
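For example, a work file opened for direct access can be read or written by
record number keyed to the latitude index, so the order in which processors
issue their requests is irrelevant (the unit, record length, and buffer names
below are illustrative only):

c     work file with one fixed-length record per latitude band
      open (unit=nwrk, file='workfile', access='direct',
     $      recl=reclen, form='unformatted')
      ...
c     any processor may access its own latitude band directly
      read (nwrk, rec=lat) buf
      write (nwrk, rec=lat) buf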
Sequential I/O is used for the history
files, even though the output order of latitude bands is totally unpredictable
in a multitasked run. Sequential I/O is necessary because the history files
contain variable-length records, and standard Fortran direct access I/O
requires fixed-length records. To enable identification of the latitude
band of each record by post-processing programs, the latitude index of
each band is included as the first value in each data record of the history
file.
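The corresponding history write might look as follows (the unit and buffer
names are illustrative); because the latitude index leads the record, a
post-processor can identify and reorder the bands regardless of the order in
which they were written:

c     latitude index is the first word of each history record
      write (nhist) lat, (hbuf(i), i=1,nflds)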
Sub Sections
3.4.1.1 Shared-Memory Management