
An Introduction to Atmospheric and Oceanographic Datasets


10. INTERNET


Notation used in this chapter: anything in boldface is either used for emphasis or is something entered on a keypad; hopefully the two cases will not be confusing.

Modern communications and computers allow access to a wide variety of information (e.g., data, software, documentation). The methods of communication are changing rapidly. Currently, several methods are commonly used to determine electronically where data are located and to access and transfer them. Most universities and research facilities have access to network resources, so this is a good way of discovering public information about scientific data.

The Internet connects many universities, government labs, commercial companies, and military installations. The Internet is not a monolithic organization. Rather, it is a cooperative of local networks that have agreed upon a common method of communication called a protocol. Thus, the Internet is a network of networks that functions as one network. The protocol used for the Internet is the ``Transmission Control Protocol/Internet Protocol'', or TCP/IP. This protocol (actually a set of protocols) was developed by the U.S. Department of Defense in the early-to-mid 1970s. TCP, which usually resides within an operating system, provides full-duplex, sequenced, and non-duplicated delivery of information between individual computers. IP is used to transmit information across the Internet.
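As a rough illustration of the sequenced, non-duplicated delivery described above, the following Python sketch opens a TCP connection over the local loopback interface and echoes a few messages back to the sender. The function names, the loopback address, and the sample messages are assumptions made for this example; they do not come from the original text.

```python
import socket
import threading


def echo_once(server_sock):
    """Accept one connection and send back every byte received."""
    conn, _ = server_sock.accept()
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:          # the peer has finished sending
                break
            conn.sendall(data)


def tcp_round_trip(messages):
    """Send byte strings over a loopback TCP connection and return the echo."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))     # port 0: let the system pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    worker = threading.Thread(target=echo_once, args=(server,))
    worker.start()

    received = b""
    with socket.create_connection(("127.0.0.1", port)) as client:
        for message in messages:
            client.sendall(message)
        client.shutdown(socket.SHUT_WR)   # signal that nothing more will be sent
        while True:
            chunk = client.recv(1024)
            if not chunk:
                break
            received += chunk

    worker.join()
    server.close()
    return received


# The bytes come back in the order they were sent, with nothing repeated.
echoed = tcp_round_trip([b"one ", b"two ", b"three"])
```

Because the connection is full duplex, the same socket carries data in both directions: the client writes its messages and then reads the echo on the one connection.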

Each local network consists of a router (most often implemented as software on a workstation) which is connected to a communication line which in turn is connected to another router and so on. The routers are sometimes called gateway machines because they allow access to both a local network and other networks which use different protocols.

In the initial phases of the Internet, NCAR was an important link within it. It was part of the National Science Foundation's NSFNET. This network was originally developed to allow NSF's supercomputer centers to communicate with each other. In 1995 operation of the network was transferred from NSF to private operators.

The TCP/IP set of protocols supports several features, including remote logins, file transfers, and electronic mail. Some TCP/IP applications include:

Electronic Mail (e-mail): Electronic mail allows communications between individuals on different computers. It can also be used to transfer small amounts of data or software. This method of communication facilitates dialog between individuals at remote sites (and even with people in the next office!).

Telnet: Telnet is a program, and a protocol, which allows the user to log on to a remote system. In general, remote hosts require the user to have an account and a password in order to log in. Some services, though, allow connection through a general user id (or even no user id); in this case, a password is usually not required, and the user is put into a restricted shell from which only certain commands are available (usually via a menu system). To connect to another machine, simply type telnet followed by the name of the host. For example, typing telnet glis.cr.usgs.gov connects to the USGS telnet service.

File Transfer Protocol: The File Transfer Protocol (FTP or ftp) is the most widely supported method of inter-network data transfer; almost all systems support it. FTP is the primary utility for fast, reliable file transfers between computers on the Internet. Most systems which have public information available support ``anonymous FTP'', which means that no secret password is required to access the information. To access a computer using anonymous FTP, type ftp at your system prompt, followed by the Internet name or address of the desired system. For example, to connect to ncardata.ucar.edu, which contains information on datasets available at NCAR, you would type ftp ncardata.ucar.edu and then use anonymous as your login and your email address as the password (if requested). Type help to get a list of commands which you can use to search files and obtain information. On some systems you must enter help followed by a command name to get specific information.
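The anonymous FTP session described above can also be scripted. The following is a minimal sketch using the ftplib module from the Python standard library. The function name anonymous_listing, the placeholder e-mail address, and the ftp_factory parameter (which lets a stand-in object replace a real connection for testing) are assumptions introduced for this example.

```python
import ftplib


def anonymous_listing(host, directory, email="user@example.org",
                      ftp_factory=ftplib.FTP):
    """Log in anonymously, change directory, and return the names found there."""
    ftp = ftp_factory(host)              # like typing: ftp <host>
    try:
        ftp.login("anonymous", email)    # login anonymous, e-mail as password
        ftp.cwd(directory)               # like typing: cd <directory>
        return ftp.nlst()                # like typing: ls
    finally:
        ftp.quit()                       # like typing: bye


# Hypothetical usage; it needs a reachable server that accepts anonymous FTP:
# names = anonymous_listing("ncardata.ucar.edu", "datasets")
```

Each call in the function corresponds to one command a user would type in an interactive session, so the sketch can be read side by side with the text.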

Not all FTP systems accept the same commands, but a list of some useful commands includes:

ls        list files in the current directory
cd        change directory, e.g., cd wx changes to the wx directory
binary    sets transfer mode to be appropriate for binary files
ascii     sets transfer mode to be appropriate for text/ascii files (the default)
get       retrieves a file, e.g., get readme gets a file called readme
bye       exits FTP

Some sites are configured to prohibit outgoing FTP connections for security reasons. Users at these sites can make use of the FTP-by-mail servers which are available:

ftpmail@decwrl.dec.com          North America
ftpmail@src.doc.ic.ac.uk        United Kingdom
ftpmail@cs.uow.edu.au           Australia
ftpmail@grasp.insa-lyon.fr      France
ftpmail@ftp.uni-stuttgart.de    Germany

Send an e-mail message to the closest address, with the lines:

reply your_address@some.where       with your email address
connect ncardata.ucar.edu           for example
cd datasets/ds111.2/software        again - for example
get access_sun.f                    the name of the file you want
quit

For complete instructions, send a one-line message reading help to the server.
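The request lines listed above can be assembled by a short program. The following Python sketch builds the message body only; the function name ftpmail_request is an assumption for this example, and actually sending the message is left to whatever mail program is at hand.

```python
def ftpmail_request(reply_to, host, directory, filename):
    """Return the body of an FTP-by-mail request as one string."""
    lines = [
        f"reply {reply_to}",     # where the server should mail the file
        f"connect {host}",       # the FTP host to contact
        f"cd {directory}",       # the directory holding the file
        f"get {filename}",       # the file to retrieve
        "quit",
    ]
    return "\n".join(lines) + "\n"


# Values taken from the example in the text above.
body = ftpmail_request("your_address@some.where", "ncardata.ucar.edu",
                       "datasets/ds111.2/software", "access_sun.f")
print(body)
```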

Archie: The ``archie'' service is invoked by entering archie or, better yet, xarchie if you are on an X-Window system. It can be used for searching all FTP sites for filenames and directories matching a specified search string. Users should be aware that archie can put large demands on the network when many searches are requested.

Gopher: Using FTP requires users to know where they can access the data; they must connect to a particular Internet address, and then navigate through subdirectories. Also, if many remote users are examining or browsing a database, a computer server can be overwhelmed. Gopher is a distributed information system developed at the University of Minnesota that allows efficient access to databases that are stored on computers all over the world. Gopher does not tie up a remote machine because it is connected only long enough to access desired information, and it is easy to use because it is strictly menu driven.

Gopher is a program executed by entering gopher or xgopher (if on a system running X-windows). It can be used to retrieve files from Gopher servers anywhere on the Internet. Gopher servers store files containing text or binary data, as well as links to other Gopher servers and gateways to other information systems and network services such as FTP.

The Gopher client presents information to the user as a series of nested menus. Some menu items may lead to files, and some may lead to other menus. The ``Veronica'' service allows users to search all Gopher servers for file descriptions which contain a specified keyword.
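To make the idea of nested menus concrete, the following Python sketch parses the text a Gopher server returns for a menu. It assumes the line layout of the published Gopher protocol (RFC 1436), which the text above does not describe: one type character, then a title, selector, host, and port separated by tabs, with a lone period ending the listing. The sample menu, the host gopher.example.org, and all names in the code are hypothetical.

```python
from typing import NamedTuple


class MenuItem(NamedTuple):
    kind: str        # "file", "menu", "search", or "other"
    title: str       # text shown to the user
    selector: str    # string sent back to the server to fetch the item
    host: str
    port: int


# A few of the item types defined by the protocol (assumed from RFC 1436).
KINDS = {"0": "file", "1": "menu", "7": "search"}


def parse_menu(text):
    """Turn raw Gopher menu text into a list of MenuItem records."""
    items = []
    for line in text.splitlines():
        if line == ".":              # a lone period marks the end of the menu
            break
        if not line:
            continue
        fields = line[1:].split("\t")
        if len(fields) < 4:          # skip lines that do not fit the layout
            continue
        title, selector, host, port = fields[:4]
        kind = KINDS.get(line[0], "other")
        items.append(MenuItem(kind, title, selector, host, int(port)))
    return items


# Hypothetical menu: one entry leads to another menu, one to a file.
sample = ("1Weather data\t/wx\tgopher.example.org\t70\r\n"
          "0About this server\t/about.txt\tgopher.example.org\t70\r\n"
          ".\r\n")
menu = parse_menu(sample)
```

An item of kind "menu" would be fetched and parsed the same way, which is how a client presents menus within menus.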

Public domain Gopher clients are available for many computers and operating systems. If one is not already installed on your system, you can obtain a client by anonymous FTP to boombox.micro.umn.edu in the directory /pub/gopher. You can also use a remote Gopher client via a telnet session to a remote host such as consultant.micro.umn.edu or gopher.uiuc.edu (login as gopher unless otherwise specified).

World-Wide Web (WWW): The ``Web'' is a multimedia hypertext-based information system. ``Hypertext-based'' means that any word or image in a document can be specified as a pointer (link) to another document where more information can be found. The reader can open the second document by selecting the link.

WWW servers make use of the Hypertext Transfer Protocol (HTTP or http) and are sometimes referred to as ``http servers''. As with Gopher, the documents can be distributed across many remote systems with different protocols. The reader need not know where the referenced documents are located, nor what protocol is necessary to access them. WWW's advantage over Gopher is that documents may be multimedia, containing formatted text as well as images, sounds, and movies.

To access the WWW, the user must run a client program, often called a browser. WWW browsers can access many different types of sites, including FTP, Gopher, and Telnet as well as HTTP; because of this, the WWW is rapidly becoming the major means of access to Internet resources. The most popular WWW clients are NCSA Mosaic and Netscape. Both run under a variety of systems, and on Microsoft Windows and Macintosh each provides a point-and-click interface. You can retrieve copies of NCSA Mosaic in both source and executable binary form from NCSA's anonymous FTP server, ftp.ncsa.uiuc.edu. Netscape is available by FTP from ftp.netscape.com.

Other graphical Web browsers are available for most (if not all) platforms. There are line-mode and VT100 browsers available for terminals without graphic capability. If one is not already installed on your system, you can obtain information about clients by anonymous FTP to info.cern.ch in the /pub/www/src directory. You can also use a remote WWW client via a telnet session (login as www) to the nearest of several hosts:

fatty.law.cornell.edu    Eastern North America
www.njit.edu             Eastern North America
www.cc.ukans.edu         Central North America
www.huji.ac.il           Israel
info.funet.f             Finland
info.cern.ch             Switzerland

Initially, the user is connected to a ``home page''. When the user clicks on any highlighted word, phrase, or image, Mosaic accesses the selected document. For example, someone using Mosaic to look at NCAR's DSS catalog might click on the name of a dataset to get the desired information. Mosaic allows full exploitation of many of the features of the WWW, such as interactive forms and maps, so it is very popular; in fact, WWW servers are sometimes (incorrectly) referred to as ``Mosaic servers''. Each document being viewed has an address called a Uniform Resource Locator (URL), which is the WWW equivalent of a filename that also includes information about the server. Generally, a URL can be built from a knowledge of the site/server and the filename of the document, as shown below:

ftp://host.name.domain/directory/[filename]     ftp site
http://host.name.domain/directory/[filename]    www server
telnet://host.name.domain                       telnet site
gopher://host.name.domain                       gopher server
wais://host.name.domain                         wais server
news:newsgroup.name                             newsgroup

For example, if a document is available at ftp://nic.fb4.noaa.gov/pub/, then (assuming you are on a UNIX system) you could type ftp nic.fb4.noaa.gov, log in as anonymous, answer the password prompt with your email address, and type cd pub and then ls, all just to see what is in the directory, before you can look at any document. If you are using a WWW browser, you could find the Open URL menu and enter ftp://nic.fb4.noaa.gov/pub/; the contents of that directory will be listed, usually with icons indicating the type of file (text, directory, etc.). The files can also be ``browsed'' in place merely by clicking on the link. This is a significant advantage over FTP. This strategy also works with numeric addresses (e.g., ftp://192.67.134.72/; this is the anonymous FTP area for the NCDC). More information on building URLs is generally located under the Help menu in your browser.
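The relationship between a URL and its parts can be sketched in a few lines of Python using the standard urllib.parse module. The function names build_url and ftp_steps and the placeholder e-mail address are assumptions for this example. The sketch covers only the host-based forms (ftp, http, telnet, gopher, wais), not the news: form.

```python
from urllib.parse import urlsplit, urlunsplit


def build_url(scheme, host, path=""):
    """Combine an access method, a host name, and a path into a URL."""
    return urlunsplit((scheme, host, path, "", ""))


def ftp_steps(url, email="user@example.org"):
    """List what a user would type by hand to reach an ftp:// directory."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError("not an ftp URL: " + url)
    steps = ["ftp " + parts.hostname, "anonymous", email]
    directory = parts.path.strip("/")
    if directory:
        steps.append("cd " + directory)
    steps.append("ls")
    return steps


url = build_url("ftp", "nic.fb4.noaa.gov", "/pub/")
print(url)              # ftp://nic.fb4.noaa.gov/pub/
print(ftp_steps(url))   # the manual session described in the text
```

The list returned by ftp_steps is the sequence of entries the paragraph above walks through, which a browser performs on the user's behalf when given the URL.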

Some FTP sites have hypertext documents and effectively ``simulate'' http servers; opening these documents with a browser allows you to use the full benefits of the multimedia nature of the WWW.

The bottom line is that the more you poke around on the WWW, the more you find. Please do not abuse the WWW by accessing non-work-related images; this slows down the system for all of us.

Finding Documents and Data on the WWW:

If you know the URL of a resource, you can enter it in your browser (usually through the "Open" menu or command); you can also save that location in your hotlist or bookmark list and return to it easily.

If you do not know a specific URL, there are many resource lists and search engines. A good general Internet directory is Yahoo at http://www.yahoo.com. If you know the geographic region or location of the resource you are looking for, you can try the CERN directory of servers at http://www.w3.org/hypertext/DataSources/WWW/Servers.html.

The Yahoo directory contains a search option, which can find keywords among all its listings. There are also several ``web robots'' or ``web walkers'' which periodically index all documents they can find across the entire WWW, including documents accessed by the ftp and gopher protocols as well as by http. These indices are used by ``search engines'' such as Webcrawler, at http://webcrawler.com, and Lycos, at http://lycos.cs.cmu.edu/.

There are also many subject-specific resource lists, including lists specific to meteorological data. The best place to start is probably the Frequently Asked Questions (FAQ) lists for the Usenet newsgroups sci.geo.meteorology and sci.data.formats.

Meteorology Frequently Asked Questions

A very large list of meteorological and oceanographic data sources is available on the net as part of the Meteorology Frequently Asked Questions (FAQ). There are seven parts to the FAQ; research data are covered in part 3 ("Meteorology FAQ Part 3/7: Sources of research data"), but all sections of the FAQ may be useful. The FAQ is posted to the Usenet newsgroups sci.geo.meteorology, news.answers, and sci.answers every two weeks.

The FAQ can be obtained by anonymous ftp to ncardata.ucar.edu in the directory other or by using a WWW browser to open the URL http://www.scd.ucar.edu/dss/faq/index.html for a hypertext version.

Scientific Data Format Frequently Asked Questions:

More information about scientific data formats may be found in the current copy of the Scientific Data Formats FAQ. This can be obtained by anonymous FTP to rtfm.mit.edu and is called /pub/usenet/news.answers/sci-data-formats. A hypertext version can be found in the following FAQ archives:

http://www.lib.ox.ac.uk/internet/news/faq/by
http://www.cs.ruu.nl/wais/html/na-dir/.html
http://www.smartpages.com/faqs/top.html
http://www.cis.ohio-state.edu/hypertext/faq/usenet/top.html

This FAQ is also updated every two weeks, with copies posted to sci.data.formats, sci.answers, and news.answers.

More information about the Internet, FTP, Telnet, etc. may be found in either of the above-mentioned FAQs.
