Subject: RE: CCM3 and MPI
From: Gibbas, Mark J. (mjgibbas@tasc.com)
Date: Thu Mar 25 1999 - 07:41:59 MST
MY INPUTS BELOW
-----Original Message-----
From: Keith Eric Grant [mailto:keg@strathspey.llnl.gov]
Sent: Monday, March 15, 1999 5:07 PM
To: Mark Gibbas
Subject: CCM3 and MPI
Mark,
Well, yours is the first reply I received on that posting. I believe just
last week I found a workaround to the problem.
What happens, when multiple tapes are requested, is that messages are sent
to processor zero in the order file varying within latitude while they are
received latitude within file.
This sets up a situation in which buffering of messages is required for
completion. Gropp et al in "Using MPI" note on pages 75-76 that such
buffering is allowed but not required by the standard. The MPI_SEND is
considered to be a blocking send, hence the problem when a processor has
send one message to processor 0 and processor 0 is waiting for a message
that will only be sent if the current message is picked up.
I have made some minor changes to WHIST.F that appear to get around this
problem, although I haven't had a chance to examine the output files
produced thoroughly. A coworker running CCM3 on the DEC Alphas believes that
this change may have also removed what he believes is data corruption in an
auxiliary data file.
I'd appreciate hearing back if this solves your problem and on what
environment. We intend to post this info to the CCM3 mailing group, once
we've tested this a bit more.
Thanks.
WHIST.F follows:
----------------------------------------------------------------------------
-
#include <misc.h>
#include <params.h>
subroutine wshist(ktape ,lat ,mflds ,hbuf)
C-----------------------------------------------------------------------
C
C Transfer latitude data record to history file unit for ktape
C
C The cpp SPMD definition provides for the funnelling of all program i/o
C through the master processor. Processor 0 either reads restart/history
C data from the disk and distributes it to all processors, or collects
C data from all processors and writes it to disk.
C
C---------------------------Code history--------------------------------
C
C Original version: CCM1
C Standardized: L. Bath, June 1992
C T. Acker, March 1996
C
C-----------------------------------------------------------------------
c
c $Id: wshist.F,v 1.2 1998/07/22 18:19:46 rosinski Exp $
c $Author: rosinski $
c
C-----------------------------------------------------------------------
#include <implicit.h>
C------------------------------Parameters-------------------------------
#include <pmgrid.h>
C-----------------------------------------------------------------------
#include <pagrid.h>
C------------------------------Commons----------------------------------
#include <comhst.h>
C-----------------------------------------------------------------------
#if ( defined SPMD )
#include <comspmd.h>
#include <mpif.h>
integer msgtype,ier,stat(MPI_STATUS_SIZE),iproc
parameter (msgtype=17000)
logical recvd(plat,ptapes) ! latitude has been recvd (proc 0)
integer count(ptapes) ! count of recvd latitudes (proc 0)
save count,recvd
logical flag
integer j
#endif
C------------------------------Arguments--------------------------------
C
C Input arguments
C
integer ktape ! History file number
integer lat ! Record number to write
integer mflds(3,*) ! Integer header information for ktape
real hbuf(*) ! Address of ktape portion of history buffer
C
pointer (phfield,hfield)
real hfield(plon,*) ! Part of hist. file relevant to current field
C
C---------------------------Local Parameters-----------------------------
C
real tootiny
#if ( defined CRAY )
parameter (tootiny=1.e-2000)
#else
parameter (tootiny=0.)
#endif
C
C---------------------------Local workspace-----------------------------
C
integer ifield ! Field index
integer ip ! Incrementing pointer thru unpacked h-file buffer
integer ibufp ! Incrementing pointer through packed h-file buffer
integer itape ! History file number index
integer jtape ! First active history file for current time
integer ndensh ! Packing density
integer kfld ! Field index for nacs,fnorm
integer i ! Longitude index
integer k ! Vertical index
integer numlev ! Number of levels in model field
C
real packsp(plond) ! Packed data array
real fnorm(pflds) ! Divisor for accumulated fields
real dummy ! Placeholder for call to packaf
C
integer iplen ! Length of packed data
C
C------------------------------Externals--------------------------------
C
external packaf ! Fortran packer, library routine
external resetr ! Set real array to specified value
external wrtharr ! Array write of history buffer
C
C-----------------------------------------------------------------------
C
C Initialize history buffer pointers for packed and unpacked data, and
C field index.
C
ibufp = 1
ip = 1
kfld = 1
if (ktape.gt.1) then
do itape = 1,ktape-1
kfld = kfld + nflds(itape)
end do
end if
C
C Beginning of loop over no. of fields in history file
C
do 100 ifield = 1,nflds(ktape)
C
C Set "numlev" appropriately for single or multi-level field
C
if (mod(mflds(1,ifield),10).ge.1) then
numlev = plev
else
numlev = 1
end if
C
C Set pointer to appropriate location in history buffer for this field.
C After setting this pointer, array "hfield" may be accessed. Note that
C hfield is always a part of hbuf.
C
phfield = loc(hbuf(ip))
C
C normalize by number of accumulations.
C
if (mflds(1,ifield)/10.gt.0) then
fnorm(kfld) = 1./amax0(nacs(kfld,lat),1)
nacs(kfld,lat) = 0
do k=1,numlev
do i=1,plon
hfield(i,k) = hfield(i,k)*fnorm(kfld)
end do
end do
end if
C
C Pack field if requested. Note that the field is packed on top
C of itself. This is done to save memory because the history buffer
C is so huge.
C
ndensh = mflds(3,ifield)
cjr
c protect against bug in packer and cprtps which screws up on very
c small numbers
cjr
do k=1,numlev
do i=1,plon
if (abs(hfield(i,k)).lt.tootiny) then
hfield(i,k) = 0.
endif
end do
end do
C
if (ndensh.gt.1) then
iplen = (plon+ndensh-1)/ndensh + 2
do k=1,numlev
call packaf(hfield(1,k), plon, packsp, ndensh, dummy,
$ dummy)
do i=1,iplen
hbuf(ibufp+i-1) = packsp(i)
end do
ibufp = ibufp + iplen
end do
else
ibufp = ibufp + numlev*plon
end if
ip = ip + numlev*plon ! Set unpacked pointer for next field
kfld = kfld + 1
100 continue
C
C End of loop over no. of fields in history file. Now write data out to
disk
C
#if ( defined SPMD )
if (.not.masterproc) then
c
c Pack and ship my data to processor 0
c
c write(0,*)'WSHIST: Sending my latitude,ktape = ',lat,ktape
call MPI_SEND(hbuf,nplen(ktape),
$ REALTYPE,0,
$ msgtype+lat,MPI_COMM_WORLD,ier)
else
c write(0,*)'WSHIST: writing my latitude,ktape = ',lat,ktape
call wrtharr(hunit(ktape),hbuf,nplen(ktape),lat,plon)
cp
c The following code assumes lat=1 is on processor 0. Get all latitudes for
c a given processor now since number of lats per processor may vary.
c
c xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
c
c.... Modifications by Keith Eric Grant, LLNL, 09 March 1999
c
c.... Pick up output from other processors only on first lat
c.... and for call for first history tape.
c
c.... Inner loop over active history files, so that output is
c.... picked up from processors in the order that it is sent.
c.... This prevents an MPI lockup on a blocking MPI_SEND. The
c.... original code could fail when there were auxillary
c... history files.
c
c xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
c
c
c.... Determine the first active history file at current time
jtape = 0
do itape=1,mtapes
if ( jtape .eq. 0 .and. hstwr(itape) ) jtape = itape
end do
c.... Get output from other processors for all history files
c.... only on the first call to processor 0
c
if (lat.eq.1 .and. ktape .eq. jtape ) then
do iproc=1,npes-1
do j=cut(1,iproc),cut(2,iproc)
do itape=1,mtapes
if (hstwr(itape)) then
c
c receive latitude "j"
c
c write(0,*)'WSHIST: Receiving latitude,ktape = ',j,itape
call MPI_RECV(hbuf,nplen(itape),
$ REALTYPE,proc(j),
$ msgtype+j,MPI_COMM_WORLD,stat,ier)
c write(0,*)'WSHIST: writing latitude,ktape = ',j,itape
call wrtharr(hunit(itape),hbuf,nplen(itape),j,plon)
endif
end do
end do
end do
end if
end if
#else
call wrtharr(hunit(ktape) ,hbuf ,nplen(ktape) ,
$ lat ,plon)
#endif
C
C Clear history file buffer (unpacked length)
C
if (nrlen(ktape).gt.0) call resetr(hbuf ,nrlen(ktape),0.)
C
return
end
-- +-----------------------------+-------------------------------------------+ I Keith Eric Grant I Common sense and a sense of humor are the I I I same thing, moving at different speeds. I I Atmospheric Science Div I A sense of humor is just common sense, I I P.O. Box 808, L-103 I dancing. ... Clive James I I Lawrence Livrmr Natn'l Lab I I I EMail: keg@llnl.gov I (or perhaps dancing is just common sense) I I FAX: (925) 422-5844 I I +-----------------------------+-------------------------------------------+Keith,
Sorry it has taken so long to get back to you. I will be trying your fix shortly, but in the mean time I have worked around the problem by setting INITHIST = 'NONE'. In doing so the CCM skips the process of writing out end of the month files. I don't see my work around as a long term solution, so I am very interested in your method. I'll let you know how it goes when I try it. Thanks for the input!
Regards, Mark
-------------------------- Mark Gibbas Litton-TASC 55 Walkers Brook Drive Reading MA 01867 E-Mail: mjgibbas@tasc.com Tel: 781-942-2000 x3188 Fax: 781-942-2571 --------------------------
This archive was generated by hypermail 2b27 : Thu Jun 01 2000 - 09:27:17 MDT