How Does the World Wide Web Work?

The World Wide Web (WWW) is one of four important Internet application software tools. The others are e-mail, FTP, and Telnet. It is also one of the fastest-growing Internet applications. The Web was first conceived in 1989 by Tim Berners-Lee, a researcher at CERN. The original idea was to develop a database of information on physics research, but he found it difficult to fit the information into a traditional database. Instead, he decided to use a hypertext network of information. With hypertext, any document can contain a link to any other document.

The first Web browser was written in 1990, but it was 1991 before it was available on the Internet for other organizations to use. By the end of 1992, several browsers had been written for UNIX computers, and there were about 30 Web servers in the entire world. A Web server is a computer on which the web pages are stored.

Mosaic was the first widely used graphical Web browser. By the end of 1993, the Mosaic browser was available for UNIX, Windows, and Macintosh computers, and there were about 200 Web servers in the world. In 1994, Netscape and half a dozen other startup companies introduced commercial Web browsers. Within a year, it had become clear that the Web had changed the face of computing. The development of Mosaic stopped in 1996 as Netscape and Microsoft began to invest millions to improve their browsers. So if you've only heard about the Internet or the Web in the last few years, there is a reason.

In order for the Web to work, each client computer (that's the computer you are on) needs a software package called a Web browser. Netscape Navigator and Microsoft Internet Explorer are the most popular browsers. Each server on the network (that's the computer that has the information you want to look at -- it's left on day and night) needs a software package called a Web server. There are thousands of Web servers.

Each computer on the Internet is identified by a number rather than by a name, because numbers are easier for computers to deal with. This number is called its Internet Protocol (IP) address, and it is used to differentiate one computer from another. An IP address is written in a four-part format: four numbers, each from 0 to 255, separated by periods. Read from left to right, the address becomes increasingly machine-specific. The leading parts identify the network to which the computer belongs, which is assigned to an organization or provider, and the remaining parts narrow this down to a group of computers on that network and finally to the actual machine itself.
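As an illustration of the four-part format, the short Python sketch below converts a dotted address into the single 32-bit number that computers actually work with. The function name `dotted_to_int` is made up for this example, and the sketch assumes the classic four-part (IPv4) form of address used in this article.

```python
import ipaddress


def dotted_to_int(address: str) -> int:
    """Convert a four-part dotted address into one 32-bit integer."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four parts, got {len(parts)}")
    value = 0
    for part in parts:
        octet = int(part)
        if not 0 <= octet <= 255:
            raise ValueError(f"part out of range: {part}")
        # Each part occupies 8 bits, so shift the running value left by
        # one byte (multiply by 256) before adding the next part.
        value = value * 256 + octet
    return value


example = "216.180.108.134"  # the address used as an example in this article
as_int = dotted_to_int(example)

# Cross-check against the standard library's own parser.
assert as_int == int(ipaddress.IPv4Address(example))
```

The standard library's `ipaddress` module already does this conversion; the hand-written loop is only there to show that the four parts are simply the four bytes of one number.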

Unfortunately, humans deal better with names, which are easier for us to remember. For example, you would rather type in www.pksoln.com than 216.180.108.134, although the numbers would work just as well. The names of the computers are called domain names. Whenever you specify a domain name in a Web session, the session doesn't actually begin until the domain name is translated into its IP address. This translation is the task of a Domain Name System (DNS) server or, as is more often the case, a series of DNS servers, in which the first queries the next until the correct IP address is acquired.

DNS servers around the world have to be made aware of changes as quickly as possible. Before DNS servers came along, domain name translation depended entirely on the host table. The host table listed, line by line, Internet host names and their associated IP numbers. The master host table was compiled and stored on the machines at the Network Information Center (NIC). Because domain names were being added constantly, it became impractical for every host on the Internet to keep acquiring this file for its users.
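To make the host table idea concrete, here is a minimal Python sketch that reads such a table into a lookup dictionary. The file layout is an assumption made for illustration (one address followed by one or more names per line, in the style of a Unix hosts file), not the historical NIC format, and every entry other than www.pksoln.com is made up.

```python
# Hypothetical host table contents. Only the first entry comes from this
# article; the others use reserved documentation names and addresses.
HOSTS_TEXT = """\
# address          names
216.180.108.134    www.pksoln.com
192.0.2.10         ftp.example.org  files.example.org
"""


def parse_host_table(text: str) -> dict:
    """Build a name -> address mapping from a line-by-line host table."""
    table = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace; skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table[name.lower()] = address
    return table


host_table = parse_host_table(HOSTS_TEXT)
print(host_table["www.pksoln.com"])
```

Every lookup is a simple dictionary access, but only after the whole file has been fetched and parsed, which is the part that stopped scaling as the number of hosts grew.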

The solution was the DNS server system. Unlike the host table, DNS servers don't rely on one large mapping file. Instead, each DNS server holds only a limited amount of information and asks the next server up the chain about domain names it doesn't know. If the contacted server doesn't have information for that domain name either, it asks the next server higher up the chain, forming a series of queries that continues until the information is found. In practice, this means that a request can be handled by any number of servers, and that this sort of back-and-forth activity happens all day, every day on the constantly changing Internet. The server that originally made the request caches the answer so that it can satisfy future requests without going to another server. The DNS server administrator sets this cached information to expire after a specified period, to avoid fulfilling name requests with old data.
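The chain of queries and the expiring cache can be sketched as a small simulation. This is a simplified model of the description above, not the real DNS protocol: the `NameServer` class, its parameters, and the three-server chain are all hypothetical, and no network traffic is involved.

```python
import time


class NameServer:
    """A toy name server that knows a few names and asks a parent for the rest."""

    def __init__(self, records, parent=None, ttl_seconds=300.0, clock=time.monotonic):
        self.records = dict(records)    # names this server knows directly
        self.parent = parent            # next server up the chain, if any
        self.ttl_seconds = ttl_seconds  # how long cached answers stay valid
        self.clock = clock              # injectable so expiry can be tested
        self.cache = {}                 # name -> (address, expiry time)

    def resolve(self, name):
        """Return the address for name, or None if no server in the chain knows it."""
        if name in self.records:
            return self.records[name]
        cached = self.cache.get(name)
        if cached is not None:
            address, expires_at = cached
            if self.clock() < expires_at:
                return address
            del self.cache[name]  # the entry has timed out; ask again
        if self.parent is None:
            return None
        address = self.parent.resolve(name)
        if address is not None:
            self.cache[name] = (address, self.clock() + self.ttl_seconds)
        return address


# A hypothetical three-level chain: only the top server knows the name.
top = NameServer({"www.pksoln.com": "216.180.108.134"})
middle = NameServer({}, parent=top)
local = NameServer({}, parent=middle, ttl_seconds=60.0)

address = local.resolve("www.pksoln.com")
print(address)
```

After the first lookup, both `middle` and `local` hold the answer in their caches, so a repeated request is answered locally until the entry expires.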

Thus, in order to get a page from the Web, the user must first type the URL (Uniform Resource Locator) for the page into the Web browser, or click on a link that provides the URL. The URL specifies the Internet address of the Web server, the directory, and the name of the specific page wanted. If no directory or page is specified, the Web server will provide whatever page has been defined as its home page. If a link gives no server name, the Web browser will presume the address is on the same server and in the same directory as the page it is currently displaying.
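The parts of a URL can be pulled apart with Python's standard `urllib.parse` module, as in the sketch below. The helper `describe_url` and the path `/docs/web/intro.html` are made up for illustration; only the server name comes from this article.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the server, directory, and page it names."""
    parts = urlsplit(url)
    # An empty path means no directory or page was given, so the server
    # decides what to send back (its home page).
    path = parts.path or "/"
    directory, page = posixpath.split(path)
    return {"server": parts.hostname, "directory": directory, "page": page}


full = describe_url("http://www.pksoln.com/docs/web/intro.html")
bare = describe_url("http://www.pksoln.com")

# A link with no server name is resolved against the current page's URL.
relative = urljoin("http://www.pksoln.com/docs/web/intro.html", "history.html")

print(full)
print(bare)
print(relative)
```

Here `bare` ends up with an empty page name, which corresponds to asking the server for its home page, and `relative` stays on the same server and in the same directory as the page that contained the link.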
