CSA402
Lecture 13 - Hypermedia and the World-Wide Web: Part 2
References:
Steinmetz, R., and Nahrstedt, K. (1995). Multimedia:
Computing, Communications & Applications. Prentice Hall. Chapter 15.
WWW
The World-Wide Web is an Internet service. Consequently, it is implemented
on top of TCP/IP and uses the HyperText Transfer Protocol (HTTP). HTTP
servers usually listen on port 80. HTTP, like other Internet services built
on TCP, assumes that TCP/IP will transfer data correctly between computers.
In HTTP, documents are uniquely identified using a Uniform Resource
Identifier (URI), usually in the form of a Uniform Resource Locator
(URL).
The HTTP protocol is based on a request/response paradigm. A client establishes a connection
with a server and sends a request to the server
in the form of a request method, URI, and protocol version, followed by
a message containing request modifiers, client information,
and possible body content. The server responds with a status line,
including the message's protocol version and a success or error code,
followed by a message containing server information, entity
metainformation (e.g., document accounting information, like "date last
modified"), and possibly body content.
The URL is the generic document naming strategy for HTTP. It usually takes
the form of
protocol://host [:port] [absolute_pathname] [#fragment]
where protocol is the protocol to use, e.g., http,
host is the domain name or IP address of the HTTP server hosting
the document, e.g., www.cs.um.edu.mt, port is the port on
which the HTTP server is listening - the default is 80,
absolute_pathname is the fully qualified path of the document
from the HTTP server's document root, and fragment is a symbolic offset
within the document. If the absolute_pathname is omitted, the
HTTP server returns a default document.
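As a minimal sketch of how a client might split a URL into the parts
described above, the following uses Python's standard urllib.parse module.
The helper name split_url and the fragment name "contact" are assumptions
made for illustration; they do not come from the lecture.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80  # the default HTTP port mentioned above


def split_url(url):
    """Return (protocol, host, port, absolute_pathname, fragment)."""
    parts = urlsplit(url)
    port = parts.port  # None when the URL does not name a port
    if port is None and parts.scheme == "http":
        port = DEFAULT_HTTP_PORT
    # An empty path means the server is left to choose its default document.
    path = parts.path or "/"
    return parts.scheme, parts.hostname, port, path, parts.fragment


print(split_url("http://www.cs.um.edu.mt/~cstaff/index.html#contact"))
```

The port is only defaulted for the http protocol, since other protocols
named in a URL have their own default ports.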
You will have noticed that the http protocol is, in fact, not the only
protocol that can be referred to in the URL. An HTTP client can be designed
to accept other Internet protocols (e.g., ftp, gopher), in which case it will
interact with the appropriate application-layer service on the server side.
In this way, HTTP clients (Web browsers such as Microsoft Internet Explorer
and Netscape Communicator) can offer a consistent user interface to many
Internet services.
HTTP requires the client to establish a TCP connection to the HTTP
server (on the appropriate port) before it sends the request for a document.
(The telnet program is used in the example below only as a convenient way
of opening such a connection by hand.) The request to the server can then
be made. Normally, once the server has
satisfied the request, the server will terminate the connection. However,
either the client or the server can abnormally terminate the connection
(if, for instance, the user interrupts the download of a document, or if
the server crashes). The client must be resilient enough to recognise an
abnormal termination of the connection.
An HTTP request generally takes the form of
<request_type> <URL> <HTTP_version_number> CRLF
For example, GET http://www.cs.um.edu.mt/~cstaff/index.html
HTTP/1.0. The server will respond with a message that includes the
status (whether the document was found/not found/relocated/etc.), document
metainformation, and the document body (if the document was found). For
example,
[24] zeus telnet www.cs.um.edu.mt 80 -- connect to HTTP server on port 80
Trying 193.188.34.81...
Connected to babe.cs.um.edu.mt.
Escape character is '^]'.
GET http://www.cs.um.edu.mt/~cstaff/index.html HTTP/1.0 -- HTTP request
HTTP/1.1 200 OK -- status line, 200 indicates document exists and can be downloaded
Date: Tue, 16 Mar 1999 10:01:31 GMT -- date document requested
Server: Apache/1.2.1 -- identity of HTTP server
Last-Modified: Wed, 27 Jan 1999 15:25:47 GMT -- date document last modified
Content-Length: 2319 -- in bytes
Accept-Ranges: bytes
Connection: close
Content-Type: text/html -- document type
The "Never to be Completed" Site
Chris Staff's not at home, Page
Well, you've stumbled across my site. I'm a firm believer that 99% of all
Web sites should be under construction, so I'm not even going to bother
putting
graphics of construction workers up on this page, because they'll never be
removed!
One day this page will be neatly laid out, but it isn't going to happen
for the foreseeable future.
A little bit about me
I lecture in Computer Science in
the Dept. of Computer Science and
A.I. at the University of Malta.
I'm also reading a PhD in Adaptive Hypertext at
The School of Cognitive and Computing
Sciences, University of Sussex. I'm
about half way through at the moment (March 1997).
The courses I teach
You can follow the links to course notes, where they're on-line
CSM202:
Operating Systems.
CSM210:
Systems Programming in C (Part I).
CSA402
: Graphics
and Multimedia Systems: Multimedia Systems.
These lectures
form part of the University's BSc IT (Hons.) degree. The Web
pages for the degree also give information like course descriptions, when
lectures are scheduled for delivery, how many credits they're worth,
etc.
I also service some other lecture courses.
Practical I.T. for Human Resource
Development, for the MA in Human Resource Development.
Marketing on the WWW, for the MA in Marketing.
Contact Details
E-mail: cstaff@cs.um.edu.mt
Postal Address:
University of Malta, Dept. of Computer Science
and A.I., Tal-Qroqq, Msida MSD 06, Malta, Europe
Location on Campus: Room 402, New Computer Building (off Car Park 5)
Telephone: (356)-32902506
Fax: (356)-320539
Connection closed by foreign host. -- server closes connection
[25] zeus
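The exchange in the session above can be sketched in Python without
touching the network. This is an illustration only: the function names
build_request and parse_response_head are hypothetical, the request is
assumed to carry no header lines of its own, and the sample response is an
abbreviated copy of the one shown above with its body truncated.

```python
def build_request(method, url, version="HTTP/1.0"):
    """Format a request line, followed by the empty line that ends the request."""
    return f"{method} {url} {version}\r\n\r\n".encode("ascii")


def parse_response_head(raw):
    """Split a raw response into (version, status, reason, headers, body)."""
    # An empty line separates the status line and headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


# Abbreviated copy of the response in the session above (body truncated).
sample = (b"HTTP/1.1 200 OK\r\n"
          b"Last-Modified: Wed, 27 Jan 1999 15:25:47 GMT\r\n"
          b"Content-Type: text/html\r\n"
          b"\r\n"
          b"<html>...</html>")

print(build_request("GET", "http://www.cs.um.edu.mt/~cstaff/index.html"))
print(parse_response_head(sample))
```

Header names are lower-cased on parsing because HTTP header names are not
case-sensitive.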
The Content-Type field is an important part of the Web. The Web
supports a variety of multimedia types. The type is used to indicate to
the Web client how the data being downloaded should be handled. For example,
GIF images have the content type image/gif, so that the Web
client can handle the stream as a possibly compressed image. Other types
might require the loading of an application in which to display the data
(e.g., video, or a Microsoft Word document).
The request_type can also be HEAD, in which case the server returns only
the document metainformation, without the body. This is particularly useful
to see if the document has been changed since the last time it was
downloaded. In order
to make efficient use of the Internet, Web clients use a local cache, in
which recently downloaded documents are stored. If the document is still in
the cache and hasn't changed since it was cached, then the document is
loaded from the cache instead. Consequently, unless directed otherwise by
the user, a GET request for a cached document is typically preceded by a
HEAD request.
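The cache check described above comes down to comparing two dates. The
sketch below assumes that the client remembers when it cached the document
and that both dates use the format seen in the Date and Last-Modified
headers; the function name cached_copy_is_current is hypothetical.

```python
from email.utils import parsedate_to_datetime


def cached_copy_is_current(last_modified, cached_at):
    """True if the document has not been modified since it was cached.

    Both arguments are HTTP date strings such as
    "Wed, 27 Jan 1999 15:25:47 GMT".
    """
    return parsedate_to_datetime(last_modified) <= parsedate_to_datetime(cached_at)


# Last-Modified value taken from the example response above.
last_modified = "Wed, 27 Jan 1999 15:25:47 GMT"

# A copy cached after that date can be reused ...
print(cached_copy_is_current(last_modified, "Tue, 16 Mar 1999 10:01:31 GMT"))
# ... but a copy cached before it must be fetched again with GET.
print(cached_copy_is_current(last_modified, "Fri, 01 Jan 1999 00:00:00 GMT"))
```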
The request typically originates within a Web client application. A user will be
browsing through a document, using a Web browser, and will click on text which is marked up as
being a "link" to another multimedia document. The majority of documents on
the Web are so-called HTML documents, documents which contain instructions
which the Web browser can use to interpret the manner in which the
document is to be displayed to the user.
HTML, the HyperText Markup
Language, is a markup language which is used to indicate the
context of text. Text is modified in accordance with recognised
tags (which are composed of a < and > surrounding a
modifier). For example, in <b>This text is emboldened</b>, the text is
displayed in bold because of the <b> and </b> tags
surrounding it. The </b> indicates that emboldening ends at that
point. Typically, tags come in pairs: <tag>text</tag>. The majority of
tags are used to enable a browser to modify how
the content of the document is displayed to the user, according to user
preferences.
The glue that links documents to each other is the HTML
Hypertext reference tag, <A HREF="URL">anchor_text</A>, where
the anchor_text is the text in the document which, when clicked
on, results in the generation of an HTTP request to display the document
identified by the URL.
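To illustrate how a client might find the links in a document, the sketch
below collects each URL and its anchor text using Python's standard
html.parser module. The class and function names are hypothetical, and the
sketch ignores nested or unclosed anchors.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (URL, anchor_text) pairs from <A HREF="..."> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # URL of the anchor being read, if any
        self._text = []     # pieces of anchor text seen so far

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


page = '<b>See</b> <A HREF="http://www.cs.um.edu.mt/~cstaff/index.html">my <b>home</b> page</A>.'
print(extract_links(page))
```

Tags nested inside the anchor, such as the <b> pair in the example, are
skipped, so only the visible anchor text is collected.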
When a user clicks on a link, the application will generate a request for
the associated document to be displayed. Typically, this involves
downloading the document to the client's computer and processing it
locally. The client will set up a communications channel by opening a TCP
connection to the server and port identified in the URL. If the connection
is successful,
and if the document is already in the client's cache, the application may generate
a HEAD request to check the modification date of the document. If the
document has not been modified since it was last cached, then the client
will reuse the version in the cache. Otherwise, or if the document is not
already in the cache, the application will generate a GET request
(possibly first having to reconnect to the server). The
client waits until it receives a response from the server, or until the
request times out. It then processes the response, or reports a time-out
error.
Deficiencies of TCP/IP
The deficiencies of TCP/IP should be fairly obvious, given what you
already know about multimedia and the brief description of TCP/IP given above.
However, for the sake of completeness, they are given below.
TCP/IP is perfectly suited to the requirements of discrete data, but the
inherent properties of TCP/IP mean that it is not suitable for continuous
media types.
The overriding factor in IP is that each datagram is routed to the sink
independently, over whatever route appears fastest at the time. This means
that datagrams can be
delivered out of sequence. It also means that QoS parameters are severely
restricted, as no minimum or maximum delivery times can be specified. Also,
round-trip delays are subject to network loads, and service can
deteriorate rapidly, requiring the sink and source to renegotiate QoS
parameters frequently during a dialogue.
TCP is primarily concerned with ensuring that datagrams have been received
correctly. If the source has not had an acknowledgement that a datagram
has been received, then it re-sends it. For video and audio broadcasts, this is
wasteful of resources because, more often than not, the client application
will have missed the deadline for processing the missing data by the time
the re-sent copy arrives.
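The point about missed deadlines can be made concrete with a little
arithmetic. The model below is a deliberate simplification: it assumes
that the source waits for a fixed time-out before re-sending, and that the
re-sent datagram then takes one one-way delay to reach the sink. All the
figures and function names are hypothetical.

```python
def retransmission_arrival_ms(timeout_ms, one_way_delay_ms):
    """Time after the first send at which the re-sent datagram reaches the sink."""
    # The source waits for the time-out, then the copy crosses the network.
    return timeout_ms + one_way_delay_ms


def retransmission_is_useful(timeout_ms, one_way_delay_ms, deadline_ms):
    """True if the re-sent datagram arrives no later than its playback deadline."""
    return retransmission_arrival_ms(timeout_ms, one_way_delay_ms) <= deadline_ms


# Hypothetical figures: 500 ms time-out, 80 ms one-way delay.
# With a 200 ms deadline the re-sent data arrives too late to be played.
print(retransmission_is_useful(500, 80, 200))
# With a 1000 ms deadline (a much larger buffer) it is still in time.
print(retransmission_is_useful(500, 80, 1000))
```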
If many sinks are receiving the same broadcast, TCP/IP at the source is
responsible for generating as many individually addressed datagrams as are
necessary. This implies that the sinks may not be receiving the data in
synchrony (it may take a variable amount of time for each datagram to be generated and routed to
the individual sinks), and that the broadcast is contributing to network
congestion.
The TCP/IP network is unable to give any guarantees over and
above the guarantee that, as long as there is a route between the computers
involved in the multimedia session, data will eventually be routed
between the source(s) and the sink(s). It is up to the application layers
to provide additional guarantees, in so far as they are supported by TCP/IP.
For example, in the case of real-time video and audio playback or telephony over the
Internet, servers and clients can negotiate the frame rate
and buffer size, taking into account the current round-trip delay. If the
client experiences a degradation or an improvement of service, it can
renegotiate the frame rate.
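Returning to the earlier point that the source must generate one
individually addressed datagram per sink, the sketch below counts the
datagrams that leave the source for a stream of a given length. It also
shows the count when a single copy can serve every sink, which is the case
multicasting aims for. The function name and the figures are hypothetical.

```python
def datagrams_sent_by_source(num_sinks, datagrams_in_stream, one_copy_for_all=False):
    """Number of datagrams the source must put on the network."""
    if num_sinks < 0 or datagrams_in_stream < 0:
        raise ValueError("counts must be non-negative")
    # With individually addressed datagrams, every sink needs its own copy.
    copies_per_datagram = min(1, num_sinks) if one_copy_for_all else num_sinks
    return copies_per_datagram * datagrams_in_stream


# Hypothetical stream: 25 datagrams per second for 60 seconds, 50 sinks.
stream_length = 25 * 60
print(datagrams_sent_by_source(50, stream_length))
print(datagrams_sent_by_source(50, stream_length, one_copy_for_all=True))
```

With individually addressed datagrams the load on the source grows
linearly with the number of sinks; with one copy for all sinks it does not
grow at all.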
The Multicast Backbone - MBONE
The Multicast Backbone was designed specifically to overcome some of the
shortcomings of the Internet, especially in the area of network efficiency.
Consider how packets on an (Ethernet) LAN are sent from the source to the
destination. On the LAN, every connected computer can listen to the network
as well as send on it. This is essential because, in order to send data
onto the
network, the network must be silent. It is still possible for two computers
to start sending data at the same time. In this case, there will be a
collision, which the computers will recognise, and they will re-send their
data at a later time. While the data is on the network, any computer
(typically the one to which it is addressed) can read the data off the
network. If it were possible to address a packet to all computers
on the LAN, then all computers would receive the same data, without
duplicating packets.
However, the more computers that are attached to the LAN,
the less likely it is that a computer can send data at any given point in time,
because the probability that the network is already active is increased.
This results in the illusion that the network is slow. Consequently, IP
broadcasts are typically not allowed to cross boundaries created by
routers. A router creates a physical divide between computers on one LAN
(attached to one of the router's Ethernet cards) and computers on another
LAN (attached to the router's other Ethernet card). Nevertheless, as it is
possible to reach many computers without linearly increasing the amount of
required bandwidth, broadcasting remains attractive.
One of the down sides
of broadcasting is that all computers will receive the data, even those for
which it is not intended. Injudicious use of broadcasting would result
in a severely saturated network. Multicasting provides the benefits of
broadcasting, but it limits delivery to those computers which should
be exposed to the data (except, of course, where exposing other computers
is unavoidable).
Implementing this is quite a departure from IP, so specialist hardware, the
Real-time Transport Protocol (RTP), and software need to be employed to
benefit from multicasting on the Internet.
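On the receiving side, an application typically asks its operating system
to join a multicast group through the sockets interface. The sketch below
shows one way this might look in Python on a system with multicast
support; it is not taken from the lecture. The function names, the group
address 224.2.0.1, and the port 5004 are assumptions for illustration.
Multicast group addresses are taken from the range 224.0.0.0 to
239.255.255.255.

```python
import socket
import struct


def build_membership_request(group, interface="0.0.0.0"):
    """Pack the group address followed by the local interface address.

    "0.0.0.0" leaves the choice of interface to the operating system.
    """
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(interface))


def open_multicast_receiver(group, port):
    """Return a UDP socket that has asked the kernel to join a multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_membership_request(group))
    return sock


# Hypothetical group address; prints the 8 packed bytes of the request.
print(build_membership_request("224.2.0.1"))
```

A receiver would then call open_multicast_receiver("224.2.0.1", 5004) and
read datagrams with recvfrom(). Whether any data arrives depends on the
routers between source and sink forwarding multicast traffic, which is the
subject of the next section.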
Multicast hardware and software support
Multicast datagrams cannot be relayed on the
Internet in their raw form. Instead, they are packaged as ordinary unicast
datagrams which routers can forward. Operating systems need to be modified or patched in
order to address IP datagrams to multiple destinations, and then to disguise
each multicast datagram as a unicast datagram. Currently, most operating
systems which
can be modified to support multicast routing are UNIX-based. This solves
the problem of addressing data to multiple computers that are engaged in a
multi-way conferencing session or receiving a live broadcast, without
creating undue traffic on the Internet, and without the host duplicating
effort to create multiple datagrams that are identical apart from the
address. However, we are still unable to provide isochronous transfer
modes, which permit applications to impose and adhere to real-time delivery
time scales, because this requires the support of lower layers that actually have
control over resources in switches and routers.
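The packaging of a multicast datagram inside a unicast one can be
pictured with a small model. The sketch below is a toy illustration and
not the actual MBONE wire format: the Datagram type, the function names,
and all the addresses are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address


@dataclass(frozen=True)
class Datagram:
    source: str
    destination: str
    payload: object  # bytes, or another Datagram when tunnelled


def encapsulate(inner, tunnel_source, tunnel_destination):
    """Wrap a multicast datagram in a unicast datagram that routers can forward."""
    if not ip_address(inner.destination).is_multicast:
        raise ValueError("only multicast datagrams need to be wrapped")
    if ip_address(tunnel_destination).is_multicast:
        raise ValueError("the far end of the tunnel must be a unicast address")
    return Datagram(tunnel_source, tunnel_destination, inner)


def decapsulate(outer):
    """Recover the original multicast datagram at the far end of the tunnel."""
    if not isinstance(outer.payload, Datagram):
        raise ValueError("datagram does not carry a wrapped datagram")
    return outer.payload


# Hypothetical addresses: a sender, a multicast group, and two tunnel ends.
original = Datagram("192.0.2.10", "224.2.0.1", b"video frame")
wrapped = encapsulate(original, "192.0.2.1", "198.51.100.1")
print(wrapped.destination)
print(decapsulate(wrapped) == original)
```

Routers between the two tunnel ends see only the unicast addresses on the
outer datagram, so they can forward it without any multicast support.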
Date last amended: Monday, 22 March, 1999