An Introduction to Atmospheric and Oceanographic Datasets

1. INTRODUCTION

Data analysis is one of the foundations for research in the atmospheric and oceanographic sciences. The data are in different forms and are obtained from a variety of sources (Fig. 1.1). These datasets contain information on different spatial and temporal scales. Some common examples include:

(i) Conventional meteorological observations may be recorded on an hourly basis, at fixed internationally agreed-upon times, or once per day at several types of stations. Stations which record only once per day normally record minimum and maximum daily temperatures (T_min and T_max, respectively) and precipitation, while the other station types commonly observe a wider range of variables which may include: temperature, precipitation, surface pressure, humidity, wind speed and direction, cloud cover, snow depth, visibility, solar radiation and current weather. Measurements at various levels of the atmosphere are recorded once, twice or four times per day at a relatively small number of stations at internationally agreed-upon times. These ``upper air'' observations are made by radiosondes or rawinsondes. The former measure temperature and humidity while the latter measure temperature, humidity and winds as functions of pressure. These upper air observations are generically referred to as ``raob'' data regardless of which instrument was actually used. Ocean data are provided by ships and by moored and drifting buoys which measure many of the same atmospheric variables as land stations. In addition, a suite of ocean measurements may be made. These can include water temperature, salinity, dissolved oxygen, and various nutrients and tracers, at the surface or in vertical profiles much like atmospheric raob data. Unlike land stations, ships and drifting buoys are (generally) moving, so each report can come from a different geographic location.

(ii) Irregularly spaced observational data are frequently interpolated to regularly or near-regularly spaced arrays (grids) using computer-based analysis algorithms. These analyses produce a global field which is depicted by a finite number of discrete points. Some algorithms are simple while others are quite complex. Many of these gridded datasets are derived by various national meteorological forecast centers because the numerical models used to make operational weather forecasts require the initial data to have some regular form. These gridded initial conditions are a best estimate of the state of the atmosphere at a particular time. Later, these gridded datasets, sometimes called `analyzed' grids, are used by researchers to derive `diagnostic' or value-added datasets. Diagnostic datasets contain derived physical quantities (e.g., divergence, streamfunction, heat and momentum transport, and Eliassen-Palm fluxes) which may be used to further describe the atmosphere's physical and dynamical processes.

(iii) Satellites commonly provide information on a broad range of geophysical quantities such as the vertical distribution of atmospheric temperature and moisture, clouds, winds, atmospheric gases and sea surface temperatures. The instruments on satellites detect electromagnetic energy within a specific range of wavelengths which has been reflected or emitted by the atmosphere and/or earth-ocean surface. The actual atmospheric or oceanographic quantities are derived using sophisticated retrieval algorithms. These data, which by their nature represent spatial averages, may be available for certain regions or for the entire globe depending upon a satellite's orbit. However, these data are ``asynoptic''; that is, they are not measured at fixed observing times, but may be measured at varying times.

(iv) Climate models, which may run for many hours on supercomputers, produce gridded arrays of the basic variables. Atmospheric models calculate temperature, geopotential height, humidity, winds and vertical motions at a number of different levels. Oceanographic models include ocean temperature, salinity, and horizontal and vertical motions at different depths. The data archives from the model runs are later processed to provide derived quantities. One method of using these models is to run simulations with different physical algorithms or initial conditions and compare the results to a control run. Examples include using different convection algorithms or boundary layer formulations or differing amounts of carbon dioxide.
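Two of the operations mentioned in examples (ii) and (iv) can be illustrated with a short sketch. The first function grids irregularly spaced observations by inverse-distance weighting, which is one of the simplest possible schemes and is not claimed to be the method used by any forecast center. The second compares an experiment with a control run through a cosine-of-latitude weighted global mean. The function names, the station values and the model fields are all hypothetical and are made up purely for illustration.

```python
import numpy as np

def idw_to_grid(obs_lon, obs_lat, obs_val, grid_lon, grid_lat, power=2.0):
    """Interpolate scattered observations onto a regular grid by
    inverse-distance weighting (distances in degrees, for simplicity)."""
    obs_lon = np.asarray(obs_lon, dtype=float)
    obs_lat = np.asarray(obs_lat, dtype=float)
    obs_val = np.asarray(obs_val, dtype=float)
    glon, glat = np.meshgrid(grid_lon, grid_lat)  # each of shape (nlat, nlon)
    # Distance from every grid point to every station: (nlat, nlon, nobs).
    dist = np.hypot(glon[..., None] - obs_lon, glat[..., None] - obs_lat)
    dist = np.maximum(dist, 1e-12)  # avoid dividing by zero at a station
    weights = dist ** -power
    return (weights * obs_val).sum(axis=-1) / weights.sum(axis=-1)

def area_weighted_mean(field, lat_deg):
    """Mean of a (nlat, nlon) field on a regular latitude-longitude grid,
    weighting each latitude row by cos(latitude)."""
    field = np.asarray(field, dtype=float)
    weights = np.cos(np.deg2rad(np.asarray(lat_deg, dtype=float)))
    zonal_mean = field.mean(axis=1)
    return float((zonal_mean * weights).sum() / weights.sum())

# Example (ii): three made-up station temperatures (K) on a coarse grid.
grid_lon = np.arange(0.0, 11.0, 5.0)
grid_lat = np.arange(0.0, 11.0, 5.0)
analysis = idw_to_grid([0.0, 10.0, 5.0], [0.0, 0.0, 10.0],
                       [280.0, 290.0, 285.0], grid_lon, grid_lat)

# Example (iv): made-up control and experiment fields on a 5-degree grid.
lat = np.linspace(-87.5, 87.5, 36)
lon = np.arange(0.0, 360.0, 5.0)
control = np.full((lat.size, lon.size), 288.0)
experiment = control + 2.0 * np.cos(np.deg2rad(lat))[:, None]
mean_change = area_weighted_mean(experiment - control, lat)

print(analysis.round(1))
print(round(mean_change, 2))
```

The cosine weighting is included because grid boxes on a regular latitude-longitude grid shrink toward the poles; an unweighted average would overemphasize high latitudes. Operational analysis schemes are considerably more elaborate than the weighting shown here.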

Common Data Problems and Characteristics

No observational dataset is perfect. Conventional observational atmospheric and oceanographic datasets (Chapters 3 and 4), satellite data (Chapter 5), and analyzed datasets (Chapter 6) all have problems which are briefly discussed in subsequent chapters. Users of the data should be aware of the deficiencies within the datasets. Unfortunately, metadata (i.e., information about the data) is often either unavailable or difficult to obtain. Metadata can be critical for correctly interpreting the observations or derived results.

Atmospheric and oceanographic datasets share many characteristics. They can be very large; many span limited time periods and have limited spatial extent; missing data and outliers are common; the spatial distribution of various observational networks is uneven; and, often, time series of data are not homogeneous. The datasets contain variables which are (generally) not independent in time or space; thus, most variables should be viewed within a multivariate context. Finally, the climate system in which the variables are sampled is not in equilibrium. This is because the system includes many physical and chemical processes which act over a variety of temporal and spatial scales. For some research, the fact that the climate system is not in equilibrium is not important (e.g., studying cumulus convection). However, this fact should not be ignored when using data records which span long periods of time. As a specific example, it complicates the interpretation of the role of greenhouse gas warming.
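As a minimal sketch of how missing data and outliers might be handled in practice, the following replaces a missing-value code with NaN and flags values that lie far from the median. The missing-value code of -999.0, the threshold of three robust standard deviations, the function name and the sample temperatures are all assumptions chosen for illustration; each real dataset documents its own conventions, and a flagged value may be a genuine extreme rather than an error.

```python
import numpy as np

MISSING = -999.0  # hypothetical missing-value code (an assumption)

def clean_series(values, missing=MISSING, z_max=3.0):
    """Return (data, flags): missing codes become NaN, and values more than
    z_max robust standard deviations from the median are flagged."""
    data = np.asarray(values, dtype=float).copy()
    data[data == missing] = np.nan
    median = np.nanmedian(data)
    # Median absolute deviation, scaled to match a standard deviation
    # for normally distributed data.
    robust_sigma = 1.4826 * np.nanmedian(np.abs(data - median))
    if robust_sigma == 0:
        flags = np.zeros(data.shape, dtype=bool)
    else:
        with np.errstate(invalid="ignore"):
            flags = np.abs(data - median) > z_max * robust_sigma
    return data, flags

# Made-up daily temperatures (deg C) with one missing value and one suspect value.
temps = [12.1, 11.8, -999.0, 12.5, 45.0, 12.0, 11.9]
data, flags = clean_series(temps)
print(data)
print(flags)
```

Values are flagged rather than deleted so that the decision to discard them can be made with the help of whatever metadata are available.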

Organizational Sources of Data

There are many organizations which archive and make available atmospheric and oceanographic datasets. In the U.S., NOAA (see Appendix D for a list of acronyms) operates national data centers whose function is to collect, archive, quality assess, and disseminate data needed for national and international environmental research programs. These centers include NCDC, NGDC and NODC. Each has a specific purpose: NCDC has a large base of conventional surface and upper air data from U.S. supervised stations and a growing archive of international station data; NODC has archives containing oceanographic data from around the world; and NGDC has a large data base which contains diverse geophysical datasets including solar variability and paleoclimate data. NESDIS contains vast archives of satellite datasets. The CDIAC archives datasets of greenhouse gases (particularly carbon dioxide and methane) and atmospheric trace gases such as chlorofluorocarbons and nitrous oxide. The NSIDC archives snow, ice, cryosphere and selected climate data. NCAR has comprehensive data archives which contain data from a number of different sources and include many different data types (see Chapter 11 and Appendix F).

It is sometimes difficult to determine what datasets are available and where they are located. NCAR's Data Support Section (DSS) can assist people trying to locate data. Also, it may be possible to use software tools (e.g., `ftp', `gopher' and the World Wide Web) to browse inventories at data centers which may contain the desired information (see Chapter 10 which discusses the Internet). A list of selected data centers which archive and disseminate atmospheric and oceanographic datasets is included in Appendix A.

In late 1994, several U.S. government organizations which are the source of various meteorological and oceanographic datasets changed names. The National Meteorological Center (NMC) became the National Centers for Environmental Prediction (NCEP) and the Climate Analysis Center (CAC) was renamed the Climate Prediction Center (CPC). In this text, the new and old identifiers will be used interchangeably because both appear throughout the datasets discussed herein.

This Instructional Aid (IA) emphasizes archived datasets. However, environmental data including analyzed grids from NMC/NCEP (see Chapter 6) are available on a real-time basis via Unidata systems. Many universities and investigators use these systems for both instructional and research purposes. Appendix A provides information on how to find out more about this program.

Layout of Text; Sample Datasets from NCAR; Acronyms

The focus of this NCAR Instructional Aid is upon introducing atmospheric and oceanographic data to people interested in pursuing research in these fields. To that end, each of the following chapters describes some general aspects of different types of datasets. The data chapters include the following information: a brief overview of the source of the data; an overview of the spatial and temporal coverage; and, some deficiencies and strengths. A bibliography containing selected references appropriate to each chapter appears at the end of the text.

For illustrative purposes, it is useful to refer to some specific examples to indicate typical time spans or data distributions. The examples have been taken from the NCAR archives and are presented in the form of tables at the end of selected chapters. Each table contains headers which describe different characteristics of each dataset. Generically, each table may have the following entries: NCAR ID indicates NCAR's internal dataset identifier of the form `dsnnn.n' (e.g., ds234.0); AREA or REGION indicates the primary geographic location of the data; NO. STA. indicates the approximate number of stations; PERIOD specifies the time spanned by the dataset; FREQ indicates whether the observations are archived on an hourly (H), daily (D) and/or monthly (M) basis; ORDER indicates time series (T) and/or ``synoptic'' (S; all observations for one time are grouped together) ordering; VAR lists the variables contained in the dataset (T-temperature, p-pressure, z-geopotential height, h-relative and/or q-specific humidity, T_d-dew point temperature, u-east/west wind speed, v-north/south wind speed, w-vertical velocity, prc-precipitation, slp/stp-sea level or station pressure; an * means many variables are recorded); and VOL indicates whether the entire dataset is small (S; less than 250 megabytes [MB]), medium (M; 250-1000 MB) or large (L; greater than 1000 MB).
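The table conventions above are simple enough to express in a few lines of code. The sketch below checks that a string has the `dsnnn.n' form and assigns the VOL class from a size in megabytes, using the thresholds stated in the text. The function and variable names are hypothetical, and the sample sizes are made up; nothing here is an NCAR-provided tool.

```python
import re

# Pattern for identifiers of the form dsnnn.n, e.g. ds234.0.
DATASET_ID = re.compile(r"^ds(\d{3})\.(\d)$")

# Codes used in the FREQ and ORDER columns, as described in the text.
FREQ_CODES = {"H": "hourly", "D": "daily", "M": "monthly"}
ORDER_CODES = {"T": "time series", "S": "synoptic"}

def is_valid_dataset_id(text):
    """True if text looks like an NCAR dataset identifier (dsnnn.n)."""
    return DATASET_ID.match(text) is not None

def volume_class(size_mb):
    """Return S (< 250 MB), M (250-1000 MB) or L (> 1000 MB)."""
    if size_mb < 250:
        return "S"
    if size_mb <= 1000:
        return "M"
    return "L"

print(is_valid_dataset_id("ds234.0"))
print([volume_class(mb) for mb in (40, 250, 1000, 5200)])
```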

There are many acronyms and abbreviations commonly used in the atmospheric and oceanographic sciences (i.e., the proverbial `alphabet soup'). These are often perplexing to both new and veteran researchers. To facilitate communication with colleagues, a list of some commonly used acronyms and abbreviations is provided in Appendix D.
