(i) Conventional meteorological observations may be recorded hourly, at fixed internationally agreed-upon times, or once per day, depending on the type of station. Stations which record only once per day normally record minimum and maximum daily temperatures (Tmin and Tmax, respectively) and precipitation, while the other station types commonly observe a wider range of variables which may include: temperature, precipitation, surface pressure, humidity, wind speed and direction, cloud cover, snow depth, visibility, solar radiation and current weather. Measurements at various levels of the atmosphere are recorded one, two or four times per day at a relatively small number of stations at internationally agreed-upon times. These ``upper air'' observations are made by radiosondes or rawinsondes. The former measure temperature and humidity while the latter measure temperature, humidity and winds as functions of pressure. These upper air observations are generically referred to as ``raob'' data regardless of which instrument was actually used. Ocean data are provided by ships and by moored and drifting buoys which measure many of the same atmospheric variables as land stations. In addition, a suite of ocean measurements may be made. These can include water temperature, salinity, dissolved oxygen, various nutrients and tracers at the surface or in vertical profiles much like atmospheric raob data. Unlike land stations, ships and drifting buoys are generally moving, so each report can come from a different geographic location.
(ii) Irregularly spaced observational data are frequently interpolated to regularly or near-regularly spaced arrays (grids) using computer-based analysis algorithms. These analyses produce a global field which is depicted by a finite number of discrete points. Some algorithms are simple while others are quite complex. Many of these gridded datasets are derived by various national meteorological forecast centers because the numerical models used to make operational weather forecasts require the initial data to have some regular form. These gridded initial conditions are a best estimate of the state of the atmosphere at a particular time. Later, these gridded datasets, sometimes called `analyzed' grids, are used by researchers to derive `diagnostic' or value-added datasets. Diagnostic datasets contain derived physical quantities (e.g., divergence, streamfunction, heat and momentum transport, Eliassen-Palm fluxes) which may be used to further describe the atmosphere's physical and dynamical processes.
(iii) Satellites commonly provide information on a broad range of geophysical quantities such as the vertical distribution of atmospheric temperature and moisture, clouds, winds, atmospheric gases and sea surface temperatures. The instruments on satellites detect electromagnetic energy within a specific range of wavelengths which has been reflected or emitted by the atmosphere and/or earth-ocean surface. The actual atmospheric or oceanographic quantities are derived using sophisticated retrieval algorithms. These data, which by their nature represent spatial averages, may be available for certain regions or for the entire globe depending upon a satellite's orbit. However, these data are ``asynoptic''; that is, they are measured at varying times rather than at fixed observing times.
(iv) Climate models, which may run for many hours on supercomputers, produce gridded arrays of the basic variables. Atmospheric models calculate temperature, geopotential height, humidity, winds and vertical motions at a number of different levels. Oceanographic models include ocean temperature, salinity, and horizontal and vertical motions at different depths. The data archives from the model runs are later processed to provide derived quantities. One method of using these models is to run simulations with different physical algorithms or initial conditions and compare the results to a control run. Examples include using different convection algorithms or boundary layer formulations or differing amounts of carbon dioxide.
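The interpolation of irregularly spaced observations to a regular grid, described in item (ii), can be illustrated with one of the simpler algorithms. The following is a minimal sketch only, assuming inverse-distance weighting on a flat latitude/longitude plane; it is not the scheme used by any particular forecast center, and the function name `idw_grid' and the station values are hypothetical.

```python
import math


def idw_grid(stations, lats, lons, power=2.0):
    """Interpolate scattered (lat, lon, value) reports onto a regular grid.

    Assumption: inverse-distance weighting with distances measured in
    degrees on a flat plane, which ignores the earth's curvature.
    """
    if not stations:
        raise ValueError("at least one station report is required")
    grid = []
    for glat in lats:
        row = []
        for glon in lons:
            weighted_sum = 0.0
            weight_total = 0.0
            exact = None
            for slat, slon, value in stations:
                dist = math.hypot(glat - slat, glon - slon)
                if dist == 0.0:
                    # Grid point coincides with a station: use its value.
                    exact = value
                    break
                weight = 1.0 / dist ** power
                weighted_sum += weight * value
                weight_total += weight
            row.append(exact if exact is not None else weighted_sum / weight_total)
        grid.append(row)
    return grid


# Hypothetical station temperatures: (latitude, longitude, degrees C).
stations = [(40.0, -105.0, 12.0), (41.0, -104.0, 10.0), (39.0, -106.0, 14.0)]
grid = idw_grid(stations, lats=[39.0, 40.0, 41.0], lons=[-106.0, -105.0, -104.0])
```

Each grid value is a weighted average of the station values, so the analyzed field never exceeds the range of the input observations. Operational analysis schemes are considerably more elaborate.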
Common Data Problems and Characteristics
No observational dataset is perfect. Conventional observational
atmospheric and oceanographic datasets (Chapters 3 and 4), satellite
data (Chapter 5), and analyzed datasets (Chapter 6) all have problems
which are briefly discussed in subsequent chapters. Users of the data
should be aware of the deficiencies within the datasets.
Unfortunately, metadata (i.e., information about the data) is often
either unavailable or difficult to obtain. Metadata can be critical
for correctly interpreting the observations or derived results.
Atmospheric and oceanographic datasets share many characteristics. They can be very large; many span limited time periods and have limited spatial extent; missing data and outliers are common; the spatial distribution of various observational networks is uneven; and, often, time series of data are not homogeneous. The datasets contain variables which are (generally) not independent in time or space; thus, most variables should be viewed within a multivariate context. Finally, the climate system in which the variables are sampled is not in equilibrium. This is because the system includes many physical and chemical processes which act over a variety of temporal and spatial scales. For some research, the fact that the climate system is not in equilibrium is not important (e.g., studying cumulus convection). However, this fact should not be ignored when using data records which span long periods of time. As a specific example, it complicates the interpretation of the role of greenhouse gas warming.
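Because missing data and outliers are common, a first screening pass over a time series is often worthwhile. The sketch below is illustrative only: the missing-value code of -999.0, the function name `screen_series' and the threshold are assumptions, not conventions of any particular archive, and the median-based test is just one of many possible outlier checks.

```python
import statistics

MISSING = -999.0  # hypothetical missing-value code; check the dataset's metadata


def screen_series(values, missing=MISSING, threshold=5.0):
    """Label each value as 'missing', 'suspect' or 'ok'.

    A value is 'suspect' when it lies more than `threshold` median
    absolute deviations (MAD) from the median of the non-missing values.
    """
    valid = [v for v in values if v != missing]
    if not valid:
        return ["missing"] * len(values)
    median = statistics.median(valid)
    mad = statistics.median([abs(v - median) for v in valid])
    flags = []
    for v in values:
        if v == missing:
            flags.append("missing")
        elif mad > 0 and abs(v - median) > threshold * mad:
            flags.append("suspect")
        else:
            flags.append("ok")
    return flags


# Hypothetical daily temperatures (degrees C) with one gap and one spike.
series = [12.1, 11.8, -999.0, 12.4, 45.0, 11.9, 12.0]
flags = screen_series(series)
```

A flagged value is only a candidate for closer inspection. Whether it is an error or a real extreme usually has to be decided from the metadata and neighboring observations.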
Organizational Sources of Data
There are many organizations which archive and make available
atmospheric and oceanographic datasets. In the U.S., NOAA (see
Appendix D for a list of acronyms) operates national data centers
whose function is to collect, archive, quality assess, and disseminate
data needed for national and international environmental research
programs. These centers include NCDC, NGDC and NODC. Each has a
specific purpose: NCDC has a large base of conventional surface and
upper air data from U.S. supervised stations and a growing archive of
international station data; NODC has archives containing oceanographic
data from around the world; and NGDC has a large data base which
contains diverse geophysical datasets including solar variability and
paleoclimate data. NESDIS maintains vast archives of satellite
datasets. The CDIAC archives datasets of greenhouse gases
(particularly carbon dioxide and methane) and atmospheric trace gases
such as chlorofluorocarbons and nitrous oxide. The NSIDC archives
snow, ice, cryosphere and selected climate data. NCAR has
comprehensive data archives which contain data from a number of
different sources and include many different data types (see
Chapter 11 and Appendix F).
It is sometimes difficult to determine what datasets are available and where they are located. NCAR's Data Support Section (DSS) can assist people trying to locate data. Also, it may be possible to use software tools (e.g., `ftp', `gopher' and the World Wide Web) to browse inventories at data centers which may contain the desired information (see Chapter 10 which discusses the Internet). A list of selected data centers which archive and disseminate atmospheric and oceanographic datasets is included in Appendix A.
In late 1994, several U.S. government organizations which are the source of various meteorological and oceanographic datasets changed names. The National Meteorological Center (NMC) became the National Centers for Environmental Prediction (NCEP) and the Climate Analysis Center (CAC) was renamed the Climate Prediction Center (CPC). In this text, the new and old identifiers are used interchangeably because both are ubiquitous in the datasets discussed herein.
This Instructional Aid (IA) emphasizes archived datasets. However, environmental data including analyzed grids from NMC/NCEP (see Chapter 6) are available on a real-time basis via Unidata systems. Many universities and investigators use these systems for both instructional and research purposes. Appendix A provides information on how to find out more about this program.
For illustrative purposes, it is useful to refer to some specific
examples to indicate typical time spans or data distributions. The
examples have been taken from the NCAR archives and are presented in
the form of tables at the end of selected chapters. Each table
contains headers which describe different characteristics of each
dataset. Generically, each table may have the following entries:
NCAR ID indicates NCAR's internal dataset identifier of the form
`dsnnn.n' (e.g., ds234.0); AREA or REGION indicates the primary
geographic location of the data; NO. STA. indicates the approximate
number of stations; PERIOD specifies the time spanned by the dataset;
FREQ indicates whether the observations are archived on an hourly (H),
daily (D) and/or monthly (M) basis; ORDER indicates time series (T)
and/or ``synoptic'' (S; all observations for one time are grouped
together); VAR lists the variables contained in the dataset
(T-temperature, p-pressure, z-geopotential height, h-relative and/or
q-specific humidity, Td-dew point temperature, u-east/west wind speed,
v-north/south wind speed, w-vertical velocity, prc-precipitation,
slp/stp-sea level or station pressure; an * means many variables are
recorded); and VOL indicates whether the entire dataset is small (S;
less than 250 megabytes [MB]), medium (M; 250-1000 MB) or large (L;
greater than 1000 MB).
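The NCAR ID form and the VOL size classes described above can be expressed compactly in code. This is a minimal sketch, not an NCAR tool: the function names are hypothetical, and treating exactly 250 MB and exactly 1000 MB as medium is an assumption, since the text does not say which class the boundary values belong to.

```python
import re

# Identifier of the form dsnnn.n, e.g. ds234.0
DSID_PATTERN = re.compile(r"ds\d{3}\.\d")


def is_ncar_id(text):
    """Return True if `text` has the form dsnnn.n (n is a digit)."""
    return DSID_PATTERN.fullmatch(text) is not None


def volume_class(size_mb):
    """Map a dataset size in megabytes to the VOL code S, M or L."""
    if size_mb < 250:
        return "S"   # small: less than 250 MB
    if size_mb <= 1000:
        return "M"   # medium: 250-1000 MB (boundaries assumed inclusive)
    return "L"       # large: greater than 1000 MB


print(is_ncar_id("ds234.0"), volume_class(640))
```

A check of this kind is convenient when table entries are transcribed by hand into a local inventory.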
There are many acronyms and abbreviations commonly used in the
atmospheric and oceanographic sciences (i.e., the proverbial `alphabet
soup'). These are often perplexing to both new and veteran
researchers. To facilitate communication with colleagues, a
list of some commonly used acronyms and abbreviations is provided in
Appendix D.
Layout of Text; Sample Datasets from NCAR; Acronyms
The focus of this NCAR Instructional Aid is upon introducing
atmospheric and oceanographic data to people interested in pursuing
research in these fields. To that end, each of the following chapters
describes some general aspects of different types of datasets. The
data chapters include the following information: a brief overview of
the source of the data; an overview of the spatial and temporal
coverage; and some deficiencies and strengths. A bibliography
containing selected references appropriate to each chapter appears at
the end of the text.
An Introduction to Atmospheric and Oceanographic Datasets