CSA402
Lecture 12 - Hypermedia and the World-Wide Web: Part 1
References:
Steinmetz, R., and Nahrstedt, K. (1995). Multimedia:
Computing, Communications & Applications. Prentice Hall. Chapter 15.
Introduction
Hypermedia is probably the most popular, and certainly the most widely
used, distributed multimedia system. The philosophy behind hypermedia is
simple: in a computer-based environment, enable users to immediately and
seamlessly access documents which are presented through a common interface, regardless of the documents' location and
type.
Hypertext has a long and varied history, largely in the domain of research.
It was only when Mosaic was designed as the first graphical user interface to
a distributed hypertext system developed at CERN that the popularity of
hypertext exploded. The system developed at CERN has now become the World-Wide Web
(WWW), and Web browsers, like Microsoft Internet Explorer and Netscape
Navigator/Communicator, are present on virtually all computer systems.
This lecture assumes familiarity with the Web, and a basic understanding of
what hypermedia is. We will see just how the Web is constructed, by first
explaining how the Internet provides services, and then taking a closer look
at the HyperText Transfer Protocol (HTTP), and the HyperText Markup
Language (HTML). Finally, we will compare the characteristics of the
TCP/IP protocol, the Internet's underlying protocol, with the requirements
of distributed multimedia systems.
Hypertext
In hypertext, individual documents can be referred to within another
document, and support is provided for users to perform a simple action to
immediately access the referred to document.
This simple statement hides a considerable number of implementation
factors:
- in order to support distributed hypertext, computers need to be
interconnected
- documents need to be uniquely identified in a distributed environment, and
the method of naming files needs to be supported in a distributed fashion (see
the sketch below)
- hypermedia implies that documents of any type (continuous or discrete) can be
referred to: however, the user interface should provide a standard method
of interaction
- when a document contains a reference to another document, users should be
able to perform a simple action to access the document
- references to other documents can be made from any document type
- the distributed hypertext system needs to be an open system, so
that any computer running any operating system can access or serve documents
- usually, but not always, hypermedia can link to live data,
although it is rare to link from live data
As we shall see, these requirements are provided by a number of different
layers in the communication, operating, and application systems.
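As a concrete illustration of the unique-identification requirement, the short
Python sketch below shows how a single document name can encode the access
method, the computer which serves the document, and the document's location on
that computer. It assumes the Web's URL naming scheme; the host name is
borrowed from the DNS example later in these notes, and the path is purely
hypothetical.

    from urllib.parse import urlparse

    # A hypothetical document name, using the Web's URL scheme as the
    # distributed naming method (the host name is borrowed from the DNS
    # example later in these notes; the path is invented for illustration).
    name = "http://saturn.cs.um.edu.mt/csa402/lecture12.html"

    parts = urlparse(name)
    print(parts.scheme)    # how the document is accessed, e.g. 'http'
    print(parts.hostname)  # the computer which serves it
    print(parts.path)      # where it lives on that computer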
The Internet
The basis for communication between computers is provided by the Internet.
The Internet progressively joins local area networks, allowing a
variety of otherwise incompatible computers to communicate. The basic
protocol which makes this possible is the Internet Protocol (IP). IP is
predominantly an addressing and data routing scheme. Some computers on a
network will act as routers. Routers typically know the identity of the
computers on their local network, and know the address of another router
to which data addressed to an unknown computer should be sent. At the
level of IP, data is simply routed from a source
computer to a destination computer through zero or more routers (zero if the
destination computer is on the same LAN as the source computer).
IP processes relatively small chunks of data. A single file may be
decomposed into smaller parts (datagrams) before it is transmitted over the Internet.
Individual routers decide, at the time they receive a datagram, to
which router they will forward it. It is possible, therefore,
that the individual datagrams comprising the original file will take
different routes before they arrive at the destination. For this reason,
it is possible for datagrams to be received out of sequence. Additionally, data
may get corrupted or even lost. This is where TCP (Transmission Control Protocol)
comes in.
IP is solely responsible for taking datagrams and delivering them to their
destination over any available route. TCP is responsible for preparing the
datagrams for transmission, ensuring that the communication is
reliable, and re-assembling the datagrams received at the destination.
When, for example, a file is being prepared for transfer, TCP will divide
the file into datagrams (the size of which is negotiated with the
destination computer). TCP will add the source and destination computer
addresses (IP addresses), a datagram sequence number, and error detection data (e.g., a
checksum), amongst other things, to each datagram. TCP then hands each
datagram to IP which attempts to deliver them. When datagrams arrive at the
destination computer, TCP attempts to re-assemble the original data,
using the sequence numbers. At this point it may notice that a datagram has been
corrupted during transfer. TCP at the destination will simply discard any
bad datagrams. However, whenever TCP at the destination encounters a valid
datagram it will send an acknowledgement back to the source computer. If
the source computer does not receive an acknowledgement within some time-out
period, it re-transmits the datagram.
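The following Python sketch is a deliberately simplified illustration of this
mechanism, not of any real TCP implementation: a block of data is split into
chunks, each chunk is labelled with a sequence number and a checksum, the
chunks are then delivered out of order with one deliberately corrupted, and
the receiver discards the bad chunk and re-assembles the remainder in sequence
order. The chunk size and checksum function are chosen purely for
illustration, and the re-transmission step is only described in a comment.

    import hashlib
    import random

    MESSAGE = b"An example file being transferred across the Internet. " * 20
    CHUNK_SIZE = 64  # illustrative only; real segment sizes are negotiated

    def checksum(data):
        # Stand-in for TCP's error-detection data.
        return hashlib.md5(data).hexdigest()

    # "TCP" at the source: divide the data and label each datagram.
    datagrams = []
    for seq, start in enumerate(range(0, len(MESSAGE), CHUNK_SIZE)):
        payload = MESSAGE[start:start + CHUNK_SIZE]
        datagrams.append({"seq": seq, "payload": payload, "check": checksum(payload)})

    # "IP" in transit: datagrams may arrive in any order, and one gets corrupted.
    random.shuffle(datagrams)
    first = datagrams[0]["payload"]
    datagrams[0]["payload"] = bytes([first[0] ^ 0xFF]) + first[1:]

    # "TCP" at the destination: discard the corrupted datagram (in reality it
    # would be re-transmitted once its acknowledgement times out), then
    # re-assemble the survivors in sequence-number order.
    valid = [d for d in datagrams if checksum(d["payload"]) == d["check"]]
    received = b"".join(d["payload"] for d in sorted(valid, key=lambda d: d["seq"]))

    print(len(datagrams), "datagrams sent,", len(valid), "accepted")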
Armed with a method of reliably transferring data between any two computers
anywhere in the world, it becomes possible to offer a range of services.
Traditionally, Internet services were based around telnet (remote login),
ftp (file transfer), and electronic mail. These soon expanded to include
network file systems, remote printing, and remote execution of programs.
An important service built on top of TCP/IP, which abstracts away the
specific binding of computers to an IP address, is provided by the Domain
Name System (DNS). In this scheme, computers are known by a domain name (e.g.,
saturn.cs.um.edu.mt) and a domain name server can be queried to return the
actual IP address of the computer known by its domain name. Services can
now be mobile (in the case of a server being relocated, either onto another
computer within the same LAN, or onto another LAN entirely). Although the
computer offering the service has a different IP address, it will still be
known by its domain name. For example, in the Department of Computer
Science and AI, University of Malta, mail services are currently provided
by the machine with the IP address 193.188.34.1. The computer is also
known as zeus.cs.um.edu.mt. The mail service, however, is known as
mail.cs.um.edu.mt. As far as DNS is concerned, the mail service is
provided by mail.cs.um.edu.mt. It is possible to seamlessly relocate the
mail service to any other computer, or even attach zeus.cs.um.edu.mt to
any other LAN. In both cases, this will result in the IP address of the
mail server changing. However, all that is required is to update the IP
address of the mail server in the DNS file, so that all data directed to
mail.cs.um.edu.mt will be sent to the correct computer providing the mail
service. DNS entries must be unique, just as IP addresses must be unique.
Otherwise, if there are two computers both known as mail.cs.um.edu.mt (or
193.188.34.1), then routers will be unable to determine to which computer
datagrams should be directed. Each country in the world has a Network
Information Centre (NIC) associated with it. The NICs are responsible for
coordinating IP address usage, ensuring that IP addresses are used
efficiently (as there are a finite number available), and they are also the
bodies through which domain names are registered.
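From a program's point of view, a DNS lookup is a single query that maps a
name to whatever IP address is currently bound to it. A minimal Python sketch,
using the host names from the example above (which may, of course, no longer
resolve to the addresses quoted here):

    import socket

    # Ask DNS for the IP address currently bound to each name. If the mail
    # service were moved to another machine, only the DNS record would
    # change; this code, and every mail client using the name, would be
    # unaffected.
    for name in ["zeus.cs.um.edu.mt", "mail.cs.um.edu.mt"]:
        try:
            print(name, "->", socket.gethostbyname(name))
        except socket.gaierror as error:
            print(name, "-> lookup failed:", error)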
Internet Services and Ports
Internet services are provided by running appropriate server software on a
computer connected to the Internet. For example, to run an FTP service, an
FTP server is run on a computer connected to the Internet. Any other
computer can then download files from this server by connecting to it
using an FTP client.
It is possible to run several services from the same computer. Server
applications then have the task of determining to which service the
various datagrams are addressed, given that many simultaneous conversations
with different servers may be in progress. This is achieved by the application layer.
The application layer sits on top of TCP/IP and "listens" to a
communication channel called a port. Each server listens on a
different port. When client software sends data to a particular
server, it needs to be addressed to the appropriate port. The server will
know that data is for its attention when data arrives at the port to which
it is listening. Port numbers are "well known" or "assigned": connecting
(using telnet, for example) to port 25 on any machine will, if there is a
mail server listening on that port, result in a connection being
established for a mail session. Each conversation is identified by four numbers: two
identifying the client and server computers by their IP addresses, one for
the port on the server on which the conversation is taking place, and
finally the port on the client computer which initiated the conversation.
It is possible for the same client to be running two FTP (or any other
service) sessions with the same server concurrently. In this case, the
client will use a different port number for each session so that the
separate conversations do not become garbled. Each service has its own
application-layer protocol for controlling the conversation. In the case of
file transfer, it is FTP. Mail services use the Simple Mail Transfer Protocol
(SMTP), etc.
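As a sketch of these ideas in Python: the code below connects to the
well-known SMTP port (25) on a mail server, prints the four numbers that
identify this particular conversation, and reads the server's greeting. The
host name is the departmental mail server from the DNS example and is purely
illustrative; any reachable SMTP server would behave similarly.

    import socket

    HOST, PORT = "mail.cs.um.edu.mt", 25  # host name is illustrative only

    # Connect to the well-known SMTP port; if a mail server is listening
    # there, it will greet us according to the application-layer protocol.
    with socket.create_connection((HOST, PORT), timeout=10) as conn:
        client_ip, client_port = conn.getsockname()
        server_ip, server_port = conn.getpeername()

        # The four numbers which distinguish this conversation from any other:
        print("client:", client_ip, client_port)
        print("server:", server_ip, server_port)

        # The server's SMTP greeting, e.g. "220 ... ESMTP ..."
        print(conn.recv(1024).decode(errors="replace"))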