RE: CCM3 and MPI


Subject: RE: CCM3 and MPI
From: Gibbas, Mark J. (mjgibbas@tasc.com)
Date: Thu Mar 25 1999 - 07:41:59 MST


MY INPUTS BELOW

-----Original Message-----
From: Keith Eric Grant [mailto:keg@strathspey.llnl.gov]
Sent: Monday, March 15, 1999 5:07 PM
To: Mark Gibbas
Subject: CCM3 and MPI

Mark,

Well, yours is the first reply I received on that posting. I believe just
last week I found a workaround to the problem.

What happens, when multiple tapes are requested, is that messages are sent
to processor zero in the order file varying within latitude while they are
received latitude within file.

This sets up a situation in which buffering of messages is required for
completion. Gropp et al in "Using MPI" note on pages 75-76 that such
buffering is allowed but not required by the standard. The MPI_SEND is
considered to be a blocking send, hence the problem when a processor has
send one message to processor 0 and processor 0 is waiting for a message
that will only be sent if the current message is picked up.

I have made some minor changes to WHIST.F that appear to get around this
problem, although I haven't had a chance to examine the output files
produced thoroughly. A coworker running CCM3 on the DEC Alphas believes that
this change may have also removed what he believes is data corruption in an
auxiliary data file.

I'd appreciate hearing back if this solves your problem and on what
environment. We intend to post this info to the CCM3 mailing group, once
we've tested this a bit more.

Thanks.

WHIST.F follows:

----------------------------------------------------------------------------
-

#include <misc.h>
#include <params.h>
      subroutine wshist(ktape ,lat ,mflds ,hbuf)
C-----------------------------------------------------------------------
C
C Transfer latitude data record to history file unit for ktape
C
C The cpp SPMD definition provides for the funnelling of all program i/o
C through the master processor. Processor 0 either reads restart/history
C data from the disk and distributes it to all processors, or collects
C data from all processors and writes it to disk.
C
C---------------------------Code history--------------------------------
C
C Original version: CCM1
C Standardized: L. Bath, June 1992
C T. Acker, March 1996
C
C-----------------------------------------------------------------------
c
c $Id: wshist.F,v 1.2 1998/07/22 18:19:46 rosinski Exp $
c $Author: rosinski $
c
C-----------------------------------------------------------------------
#include <implicit.h>
C------------------------------Parameters-------------------------------
#include <pmgrid.h>
C-----------------------------------------------------------------------
#include <pagrid.h>
C------------------------------Commons----------------------------------
#include <comhst.h>
C-----------------------------------------------------------------------
#if ( defined SPMD )
#include <comspmd.h>
#include <mpif.h>
      integer msgtype,ier,stat(MPI_STATUS_SIZE),iproc
      parameter (msgtype=17000)

      logical recvd(plat,ptapes) ! latitude has been recvd (proc 0)
      integer count(ptapes) ! count of recvd latitudes (proc 0)
      save count,recvd

      logical flag
      integer j
#endif
C------------------------------Arguments--------------------------------
C
C Input arguments
C
      integer ktape ! History file number
      integer lat ! Record number to write
      integer mflds(3,*) ! Integer header information for ktape
      real hbuf(*) ! Address of ktape portion of history buffer
C
      pointer (phfield,hfield)
      real hfield(plon,*) ! Part of hist. file relevant to current field
C
C---------------------------Local Parameters-----------------------------
C
      real tootiny
#if ( defined CRAY )
      parameter (tootiny=1.e-2000)
#else
      parameter (tootiny=0.)
#endif
C
C---------------------------Local workspace-----------------------------
C
      integer ifield ! Field index
      integer ip ! Incrementing pointer thru unpacked h-file buffer
      integer ibufp ! Incrementing pointer through packed h-file buffer
      integer itape ! History file number index
      integer jtape ! First active history file for current time
      integer ndensh ! Packing density
      integer kfld ! Field index for nacs,fnorm
      integer i ! Longitude index
      integer k ! Vertical index
      integer numlev ! Number of levels in model field
C
      real packsp(plond) ! Packed data array
      real fnorm(pflds) ! Divisor for accumulated fields
      real dummy ! Placeholder for call to packaf
C
      integer iplen ! Length of packed data
C
C------------------------------Externals--------------------------------
C
      external packaf ! Fortran packer, library routine
      external resetr ! Set real array to specified value
      external wrtharr ! Array write of history buffer
C
C-----------------------------------------------------------------------
C
C Initialize history buffer pointers for packed and unpacked data, and
C field index.
C
      ibufp = 1
      ip = 1
      kfld = 1
      if (ktape.gt.1) then
         do itape = 1,ktape-1
            kfld = kfld + nflds(itape)
         end do
      end if
C
C Beginning of loop over no. of fields in history file
C
      do 100 ifield = 1,nflds(ktape)
C
C Set "numlev" appropriately for single or multi-level field
C
         if (mod(mflds(1,ifield),10).ge.1) then
            numlev = plev
         else
            numlev = 1
         end if
C
C Set pointer to appropriate location in history buffer for this field.
C After setting this pointer, array "hfield" may be accessed. Note that
C hfield is always a part of hbuf.
C
         phfield = loc(hbuf(ip))
C
C normalize by number of accumulations.
C
         if (mflds(1,ifield)/10.gt.0) then
            fnorm(kfld) = 1./amax0(nacs(kfld,lat),1)
            nacs(kfld,lat) = 0
            do k=1,numlev
               do i=1,plon
                  hfield(i,k) = hfield(i,k)*fnorm(kfld)
               end do
            end do
         end if
C
C Pack field if requested. Note that the field is packed on top
C of itself. This is done to save memory because the history buffer
C is so huge.
C
         ndensh = mflds(3,ifield)
cjr
c protect against bug in packer and cprtps which screws up on very
c small numbers
cjr
         do k=1,numlev
           do i=1,plon
             if (abs(hfield(i,k)).lt.tootiny) then
               hfield(i,k) = 0.
             endif
           end do
         end do
C
         if (ndensh.gt.1) then
            iplen = (plon+ndensh-1)/ndensh + 2
            do k=1,numlev
               call packaf(hfield(1,k), plon, packsp, ndensh, dummy,
     $ dummy)
               do i=1,iplen
                  hbuf(ibufp+i-1) = packsp(i)
               end do
               ibufp = ibufp + iplen
            end do
         else
            ibufp = ibufp + numlev*plon
         end if
         ip = ip + numlev*plon ! Set unpacked pointer for next field
         kfld = kfld + 1
  100 continue
C
C End of loop over no. of fields in history file. Now write data out to
disk
C
#if ( defined SPMD )
      if (.not.masterproc) then
c
c Pack and ship my data to processor 0
c
c write(0,*)'WSHIST: Sending my latitude,ktape = ',lat,ktape
        call MPI_SEND(hbuf,nplen(ktape),
     $ REALTYPE,0,
     $ msgtype+lat,MPI_COMM_WORLD,ier)
      else
c write(0,*)'WSHIST: writing my latitude,ktape = ',lat,ktape
        call wrtharr(hunit(ktape),hbuf,nplen(ktape),lat,plon)
cp
c The following code assumes lat=1 is on processor 0. Get all latitudes for
c a given processor now since number of lats per processor may vary.
c
c xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
c
c.... Modifications by Keith Eric Grant, LLNL, 09 March 1999
c
c.... Pick up output from other processors only on first lat
c.... and for call for first history tape.
c
c.... Inner loop over active history files, so that output is
c.... picked up from processors in the order that it is sent.
c.... This prevents an MPI lockup on a blocking MPI_SEND. The
c.... original code could fail when there were auxillary
c... history files.
c
c xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
c
c
c.... Determine the first active history file at current time
        jtape = 0
        do itape=1,mtapes
           if ( jtape .eq. 0 .and. hstwr(itape) ) jtape = itape
        end do

c.... Get output from other processors for all history files
c.... only on the first call to processor 0
c
        if (lat.eq.1 .and. ktape .eq. jtape ) then
          do iproc=1,npes-1
            do j=cut(1,iproc),cut(2,iproc)
              do itape=1,mtapes
              if (hstwr(itape)) then
c
c receive latitude "j"
c
c write(0,*)'WSHIST: Receiving latitude,ktape = ',j,itape
              call MPI_RECV(hbuf,nplen(itape),
     $ REALTYPE,proc(j),
     $ msgtype+j,MPI_COMM_WORLD,stat,ier)
c write(0,*)'WSHIST: writing latitude,ktape = ',j,itape
              call wrtharr(hunit(itape),hbuf,nplen(itape),j,plon)
              endif
              end do
            end do
          end do
        end if
      end if
#else
      call wrtharr(hunit(ktape) ,hbuf ,nplen(ktape) ,
     $ lat ,plon)
#endif
C
C Clear history file buffer (unpacked length)
C
      if (nrlen(ktape).gt.0) call resetr(hbuf ,nrlen(ktape),0.)
C
      return
      end

-- 
 
 +-----------------------------+-------------------------------------------+
 I Keith Eric Grant            I Common sense and a sense of humor are the I
 I                             I same thing, moving at different speeds.   I
 I Atmospheric Science Div     I A sense of humor is just common sense,    I
 I P.O. Box 808, L-103         I dancing.   ... Clive James                I
 I Lawrence Livrmr Natn'l Lab  I                                           I
 I EMail: keg@llnl.gov         I (or perhaps dancing is just common sense) I
 I FAX:   (925) 422-5844       I                                           I
 +-----------------------------+-------------------------------------------+

Keith,

Sorry it has taken so long to get back to you. I will be trying your fix shortly, but in the mean time I have worked around the problem by setting INITHIST = 'NONE'. In doing so the CCM skips the process of writing out end of the month files. I don't see my work around as a long term solution, so I am very interested in your method. I'll let you know how it goes when I try it. Thanks for the input!

Regards, Mark

-------------------------- Mark Gibbas Litton-TASC 55 Walkers Brook Drive Reading MA 01867 E-Mail: mjgibbas@tasc.com Tel: 781-942-2000 x3188 Fax: 781-942-2571 --------------------------



This archive was generated by hypermail 2b27 : Thu Jun 01 2000 - 09:27:17 MDT