CCM1 Code Generation Problem

CCM Group Directories (ccm@sage.cgd.ucar.edu)
Thu, 2 Jun 94 12:24:54 MDT


Message-Id: <199406021824.MAA01919@grub.cgd.ucar.edu>
To: ccm-users@ncar.ucar.edu

*** Notice of Code Generation Problem in CCM1 ***

Several months ago, the CCM Core Group was contacted by a user at the
University of Oklahoma regarding a possible bug in the CCM1
calculation of saturation specific humidity. We have been
investigating this problem ever since, and have verified that there is
indeed an error in the saturation specific humidity produced by
subroutine ESTABV. An analysis of the source code suggested that this
error was the result of an interaction between the ESTABV source code
and recent compilers. We were able to determine that the problem
first materialized with the 5.0.2.15 compiler release, which became
the default compiler on Shavano on 6/1/92 and on Castle on 3/29/93.
(The earlier 4.0 release, which generated correct code, is no longer
available.)

This problem was isolated and forwarded to Cray Research as a critical
problem. Cray's analysis determined that it was not a compiler problem;
rather, the source code violated the ANSI Fortran 77 standard, which
prohibits array references from generating addresses outside of their
declared bounds. This code, written more than ten years ago, was
representative of coding practices required at that time to generate
vector code, since the older compilers would otherwise complain about
nonexistent vector dependencies. Unfortunately, the technique also hid
a real dependency, one that surfaces only when the compiler attempts
to use two memory read channels and one memory write channel at the
same time. In this case, the relevant source code in ESTABV is:

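C     IVDEP tells the compiler to ignore assumed vector dependencies.
C     E(1+KES+JL) occupies the same storage as BUF(KES+JL), so the
C     second statement depends on the store made by the first.
CDIR$ IVDEP
      DO 15 JL = 1,N
C       Saturation vapor pressure
        BUF(KES+JL) = (B(4+KTEMP2+JL)+1.-A(5+KQSLEV+JL))*C(3+KES+JL) +
     1                (C(3+KQSLEV+JL)-D(2+KTEMP2+JL)) * B(4+KTEMP1+JL)
C       Saturation specific humidity (reads the value just stored)
        B(4+KQSLEV+JL) = C1ESS * CRITES*E(1+KES+JL)/(BUF(KPS+JL)*SIGHK
     A                   - (1.-EPSILO)*CRITES*E(1+KES+JL))
   15 CONTINUE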

where the first replacement statement calculates saturation vapor
pressure, which is then used by the second replacement statement to
calculate saturation specific humidity. Unfortunately (for this code,
that is), the compiler does not recognize that BUF(KES+JL) and
E(1+KES+JL) refer to the same storage location, and it issues the read
of E(1+KES+JL) before the value computed by the first replacement
statement has been written to memory. The solution is to split the
loop into two separate loops, so that B(4+KQSLEV+JL) is evaluated in
the second loop, after all of the BUF(KES+JL) values have been stored.

The ramifications of the problem with ESTABV are difficult to quantify
since we are not able to evaluate how many users have been affected.
Neither are we able to easily determine how this error may have
systematically affected simulations with modified versions of the
CCM1. Historically, when compiler and operating system changes have
resulted in "deterministic" changes to the CCM1 solution, we have run
short "climate" integrations to verify that there were no unexplained
changes in the gross features of the model climate. In the past, these
tests have successfully exposed code generation errors. Unfortunately,
these tests did not expose the saturation specific humidity code
generation problem; i.e., for most standard
climate metrics a single realization of a CCM1 climate integration
(which includes this error) is **virtually indistinguishable** from
the control integration. Differences in zonally averaged quantities
(e.g., zonal wind, temperature, etc.) appear to be well within the
natural variability of the control simulation. Many global metrics,
such as measures of the hydrological cycle, are also within natural
variability. This is a remarkable result considering the seriousness
of this code generation error. We have, however, been able to
identify a robust signature associated with simulations that contain
improperly generated ESTABV code. In the CCM1 control, total
precipitation is roughly equally partitioned between convective
processes and stable condensation processes (i.e., the global integral
of total precipitation, 3.19 mm/day, is the sum of 1.65 mm/day from
convective precipitation and 1.54 mm/day from stable precipitation).
For a CCM1 integration in which the code generation error is present
(compiler release 5.0.2.15 or later), the total precipitation (which
is within the natural variability of the control) is the sum of
slightly more than 70% from stable precipitation and slightly less
than 30% from convective precipitation. This suggests that the
similarity of the simulated climates is the result of a fortuitous
compensation by processes that maintain the respective climate
states. The outgoing longwave radiation offers some support for this:
its global mean changes by more than 2 W/m**2 (a change that may be on
the high side of natural variability in the CCM1). A
detailed analysis of the diabatic forcing would be required to
evaluate the nature of this compensation, something best done by
investigators using CCM1 in the context of their particular
experimental framework. We encourage investigators to share results
from such analyses with other colleagues via the ccm-users@ncar mail
group.
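The partition figures above can be checked with a little arithmetic. The
control values come straight from the text; the numbers for the affected
run are only illustrative, since the message gives percentages rather
than rates, so the sketch assumes (hypothetically) that the total stays
at the control value of 3.19 mm/day and uses an exact 70/30 split.

```python
CONTROL_CONV = 1.65    # mm/day, convective precipitation (control)
CONTROL_STABLE = 1.54  # mm/day, stable precipitation (control)


def partition(conv, stable):
    """Return (total, convective fraction, stable fraction)."""
    total = conv + stable
    return total, conv / total, stable / total


def split_total(total, stable_frac):
    """Split a total rate into (convective, stable) given the stable fraction."""
    stable = total * stable_frac
    return total - stable, stable


total, f_conv, f_stable = partition(CONTROL_CONV, CONTROL_STABLE)
print(f"control: {total:.2f} mm/day, {f_conv:.1%} convective, {f_stable:.1%} stable")

# Hypothetical: assume the affected run keeps the control total and a 70/30 split.
conv, stable = split_total(total, 0.70)
print(f"affected (illustrative): {conv:.2f} convective, {stable:.2f} stable mm/day")
```

The control split works out to about 51.7% convective and 48.3% stable,
which is why the control is described as roughly equally partitioned;
under the stated assumption, the affected run would carry roughly
0.96 mm/day of convective and 2.23 mm/day of stable precipitation.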

We note that this code generation problem does not affect any of the
CCM1 controls, all of which were run many years before the compiler
change took place. CCM1 model integrations conducted after 6/1/92 on
Shavano, or 3/29/93 on Castle, are likely to have been affected by
changes in ESTABV code generation.
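For sorting through existing runs, the rule of thumb above can be written
as a small check. `possibly_affected` is a hypothetical helper, not
something distributed with the CCM1: it keys only on machine name and run
date, treats a run on the switch date itself as affected (the new
compiler became the default that day), and cannot tell whether a given
job actually used the default compiler.

```python
from datetime import date

# Dates on which compiler release 5.0.2.15 became the default, per the notice.
DEFAULT_SWITCH = {
    "shavano": date(1992, 6, 1),
    "castle": date(1993, 3, 29),
}


def possibly_affected(machine: str, run_date: date) -> bool:
    """True if a CCM1 run on `machine` at `run_date` likely used the
    miscompiled ESTABV (assumes the site default compiler was used)."""
    switch = DEFAULT_SWITCH.get(machine.lower())
    if switch is None:
        raise ValueError(f"unknown machine: {machine!r}")
    return run_date >= switch


print(possibly_affected("Shavano", date(1992, 5, 15)))  # False: before the switch
print(possibly_affected("Castle", date(1993, 6, 1)))    # True: after the switch
```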

Finally, there is the issue of what we intend to do about this
problem. For the moment, we will leave the CCM1 source code as it is
and rely on users to make the necessary changes to ESTABV. Since this
bug has affected CCM1 results for so long, we do not want to change
code underneath users who may be in the midst of extended experimental
work with the CCM1. A Cray UPDATE modification will be made available
on the Shavano disk, in file /ccm/ccm1/r15/estabv.mods. This is
for the convenience of users who wish to incorporate the ESTABV
change, most likely users who are beginning new numerical experiments
with the CCM1. The decision to incorporate the changes will probably
depend on how a user's new experimental work may relate to earlier
integrations of the CCM1.

We would like to take this opportunity to note that unusual coding
constructs, similar to those in ESTABV, exist throughout the CCM1
code. With increasing levels of compiler optimization, it is
conceivable that more of these types of code generation problems will
surface with future versions of the compiler. Consequently, we
strongly urge the user community to move to the CCM2 for their general
circulation modeling work. Long-term maintenance support for the CCM1
cannot be guaranteed should future compiler or library changes make the
CCM1 unusable in its current form.

** NOTE: Shavano and Castle, referred to in the above message, are,
respectively, the NCAR Cray Y-MP/864 and the NCAR Cray Y-MP/216
supercomputer systems.