Re: Precision problems with CCM3


Subject: Re: Precision problems with CCM3
From: Wesley Jones (wesley@spartan.asd.sgi.com)
Date: Fri Mar 21 1997 - 14:28:31 MST


CCM users,

        This is in regard to the note by Carlos Fernandez about correctness and
the performance note by Andrea Hahmann.

        I am interested in making sure that the CCM3 results are correct to the
appropriate precision at high optimization on many SGI microprocessor systems:
R4000, R4400, R5000, R8000 and R10000. This is primarily because we are able
to show 1.15 GF of sustained performance with CCM3 T42L18 on a 16 CPU
Origin2000 and 2.00 GF on a 32 CPU Origin2000. I am hoping to have the T42
case scale reasonably to 64 processors and have it scale to 128 with a larger
problem size (T106).

        When I originally ran CCM2 benchmarks on R8000 Power Challenge
computers we were able to run with -O3 optimization, and there was a test for
the precision of the benchmark, which we satisfied.

        With the introduction of the R10000 processor last summer a new
compiler was introduced which does standard optimization for cache based
machines as well as automatically inlining vector intrinsic functions into CCM3
for improved performance. This compiler is more robust for floating point
codes than previous compilers for the R4000 chips (-32) and for the R8000
processors.

        I would recommend that those people running CCM3 upgrade to the 6.2, 6.3
or 6.4 O/S, depending on their machine type, and upgrade to the 7.1 compiler.
The 7.1 compiler will create code for all of the above mentioned processors.
When using an R4000 or R4400 processor use the -mips3 instruction set,
otherwise use the -mips4 instruction set.

        The way to find out the current CPU, O/S and compiler is as follows:

farewell 143% hinv -t cpu
CPU: MIPS R10000 Processor Chip Revision: 2.6
farewell 144% uname -a
IRIX64 farewell 6.4-jlr-root 03140927 IP27
farewell 145% versions -b |grep ftn
I ftn77_dev 11/21/96 Fortran 77, 7.1
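
The flag choices described in this note can be sketched as a small shell
helper. This is only an illustration: the function names cpu_model, isa_flag
and abi_flag are hypothetical and not part of any SGI tool, and the input is
assumed to look like the `hinv -t cpu` line shown above. The -mips3/-mips4
rule comes from the paragraph above; the -n32/-64 rule comes from the notes at
the end of the Makefile.

```shell
#!/bin/sh
# Sketch: pick ISA and ABI flags from a `hinv -t cpu` style line.
# Function names are illustrative, not part of any SGI tool.

# Extract the processor model (e.g. R10000) from a line such as
#   CPU: MIPS R10000 Processor Chip Revision: 2.6
cpu_model() {
    printf '%s\n' "$1" | sed -n 's/.*MIPS \(R[0-9][0-9]*\).*/\1/p'
}

# R4000 and R4400 use -mips3; everything else uses -mips4.
isa_flag() {
    case "$1" in
        R4000|R4400) echo "-mips3" ;;
        *)           echo "-mips4" ;;
    esac
}

# R4000, R4400 and R5000 use -n32; everything else uses -64.
abi_flag() {
    case "$1" in
        R4000|R4400|R5000) echo "-n32" ;;
        *)                 echo "-64" ;;
    esac
}

line='CPU: MIPS R10000 Processor Chip Revision: 2.6'
model=$(cpu_model "$line")
echo "$model: $(isa_flag "$model") $(abi_flag "$model")"
```

On a real system the sample line would be replaced by the output of
`hinv -t cpu`.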

Note: smallest.F is compiled at a lower optimization level (-O2, via
SPEC_FFLAGS in the Makefile below) than the other code because it attempts to
calculate the machine accuracy. Optimization of this routine causes it to
return a machine accuracy of 0.
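
smallest.F itself is not reproduced in this note. As a rough illustration of
what a machine accuracy calculation usually involves, here is a minimal
sketch, assuming the common approach of halving a value until adding it to 1
no longer changes the result. Whether smallest.F uses exactly this loop is an
assumption. The sketch runs the loop in awk from the shell, since awk
arithmetic is double precision (the same size as the 8-byte REAL selected by
-r8); machine_eps is a hypothetical name.

```shell
#!/bin/sh
# Sketch of a halving loop for machine epsilon (assumed approach,
# not the actual contents of smallest.F).
machine_eps() {
    awk 'BEGIN {
        eps = 1.0
        # Halve eps until adding the next half no longer changes 1.0.
        while (1.0 + eps / 2.0 > 1.0)
            eps = eps / 2.0
        printf "%.17g\n", eps
    }'
}

machine_eps
```

A loop of this kind relies on each intermediate sum being rounded to the
storage precision, so it can be sensitive to how aggressively the compiler
rearranges floating point arithmetic.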

Here is my Makefile for CCM3:

# This Makefile produces the NCAR Community Climate Model executable on
# SGI architectures.
#
# Compiler flags:
# The -r8 flag sets the default REAL size to 8 bytes
# The -i4 flag sets the default INTEGER size to 4 bytes
# The -g flag produces symbol table information for debugging.
# The -O3 flag optimizes execution speed.
#
EXEDIR = ./run
EXENAME = ccm3bin
CFLAGS = -mips4 -64 -r8 -I. -DSGI
SPEC_FFLAGS = -mips4 -64 -I. -r8 -i4 -O2 -c
FFLAGS = -mips4 -64 -I. -r8 -i4 -O3 -OPT:IEEE_arithmetic=3:roundoff=3 -c -mp \
         -mpio -IPA
LDFLAGS = -mips4 -64 -I. -r8 -i4 -O3 -OPT:IEEE_arithmetic=3:roundoff=3 -c -mp \
          -mpio -lfastm -IPA:max_job=8
RM = rm

include Objects

.SUFFIXES:
.SUFFIXES: .F .c .o
$(EXEDIR)/$(EXENAME): $(OBJS)
        $(FC) -o $@ $(OBJS) $(LDFLAGS)

smallest.o: smallest.F params.h implicit.h
        $(F77) $(SPEC_FFLAGS) smallest.F
lsmdrv.o: lsmdrv.F params.h implicit.h
        $(F77) -pfa $(FFLAGS) lsmdrv.F
.F.o:
        $(F77) $(FFLAGS) $<
clean:
        $(RM) -f $(OBJS)

include Depends

#Note: for those running on an R4000 or R4400 processor based system
# use -mips3 instead of -mips4
#Note: for those running on an R4000, R4400 or R5000 processor based system
# use -n32 instead of -64
# add -lfpe to LDFLAGS
# and >setenv TRAP_FPE "UNDERFL=FLUSH_ZERO" when running.

-- 
Wesley B. Jones, PhD                        wesley@asd.sgi.com          
Supercomputer Applications                  Phone: (415)-933-2992               
Silicon Graphics Computer Systems           FAX: (415)-933-3562


