Subject: CCM3 on Beowulf cluster - run time errors
From: Chul Eddy Chung (cchung@fiji.ucsd.edu)
Date: Tue Oct 17 2000 - 11:48:00 MDT
Dear CCM users,
I would like to report a problem, in the hope that somebody can help me with it.
I have a Beowulf cluster consisting of 1 master + 8 slaves (each a single-CPU
PC). I installed the PGI CDK and the netCDF 3.5 beta version. The PGI CDK
includes an MPI library, which apparently links with pgf90; my netCDF library
also links with pgf90, since I compiled the netCDF package using pgf90.
I tested the MPI and netCDF libraries with a sample pgf90 program, and both worked
fine.
Then I downloaded the cluster version of CCM3. I removed "Msecond_underscore" from
the Makefile and changed one line under LINUX to "#define FORTRANUNDERSCORE" in
/src/ccmlsm_share/cfort.h, because neither my libmpich.a nor my libnetcdf.a has two
underscores.
CCM3 then compiles without any error messages. However, I get the following
run-time error messages:
(I quote)
NODE# NAME
0 gcm1.ucsd.edu
1 n01.ucsd.edu
2 n02.ucsd.edu
3 n03.ucsd.edu
.......................
47 QPERT
max rss=0 shared mem=0 unshared data=0 unshared$max rss=0 shared mem=0 unshared
data=0 unshared stack=0
max rss=0 shared mem=0 unshared data=0 unshared stack=0
max rss=0 shared mem=0 unshared data=0 unshared stack=0
max rss=0 shared mem=0 unshared data=0 unshared stack=0
max rss=0 shared mem=0 unshared data=0 unshared stack=0
max rss=0 shared mem=0 unshared data=0 unshared stack=0
rm_l_1_6060: p4_error: net_recv read: probable EOF on socket: 1
rm_l_2_5983: p4_error: net_recv read: probable EOF on socket: 1
rm_l_3_6046: p4_error: net_recv read: probable EOF on socket: 1
1 A
48 QRL 18 A
......................
p0_5739: (16.420643) Trying to receive a message when there are no connections;$
--------------------------------------------------------
Then the model stops. It stops just before the first time-integration step.
I believe that it stops when the slaves start to participate.
I have very little idea of what is wrong, and I would be glad of any clue. One
thing I now suspect is the file /etc/fstab on each slave, which contains this line:
"192.168.0.100:/CCM /CCM nfs noac,rsize=8192,wsize=8192". /CCM is the
CCM3 working directory and is exported from the master.
Bye.
--
Chul Eddy Chung: http://www-c4.ucsd.edu/personnel/cchung
Postdoc with Ramanathan, Center for Atmospheric Sciences
Scripps Institution of Oceanography, UCSD
Tel) 858-822-1356 Fax) 858-534-7452