CSA402

Lecture 13 - Hypermedia and the World-Wide Web: Part 2

References:
Steinmetz, R., and Nahrstedt, K. (1995). Multimedia: Computing, Communications & Applications. Prentice Hall. Chapter 15.

WWW

The World-Wide Web is an Internet service. Consequently, it is implemented on top of TCP/IP and uses the HyperText Transfer Protocol (HTTP). Usually, HTTP servers listen on port 80. HTTP, like all other internet services, assumes that TCP/IP will correctly transfer data between computers.
In HTTP, documents are uniquely identified using a Uniform Resource Identifier (URI), usually in the form of a Uniform Resource Locator (URL).
The HTTP protocol is based on a request/response paradigm. A client establishes a connection with a server and sends a request to the server in the form of a request method, URI, and protocol version, followed by a message containing request modifiers, client information, and possible body content. The server responds with a status line, including the message's protocol version and a success or error code, followed by a message containing server information, entity metainformation (e.g., document accounting information, like "date last modified"), and possibly body content.
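The message layout described above can be sketched in a few lines of Python. This is only an illustration: the function names build_request and parse_status_line are made up for these notes, and the sketch assumes that header lines are separated by CRLF and that a blank line ends the header section.

```python
def build_request(method, url, version="HTTP/1.0", headers=None, body=b""):
    """Assemble a request: request line, header lines, blank line, optional body."""
    lines = [f"{method} {url} {version}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    head = "\r\n".join(lines) + "\r\n\r\n"   # the blank line ends the headers
    return head.encode("ascii") + body


def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 200 OK' into version, code, reason."""
    parts = line.strip().split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason
```

For instance, parse_status_line("HTTP/1.1 200 OK") separates the protocol version, the numeric success or error code, and the human-readable reason.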
The URL is the generic document naming strategy for HTTP. It usually takes the form of

protocol://host [:port] [absolute_pathname] [#fragment]

where protocol is the protocol to use, e.g., http, host is the domain name or IP address of the HTTP server hosting the document, e.g., www.cs.um.edu.mt, port is the port on which the HTTP server is listening - the default is 80, absolute_pathname is the fully qualified path of the document from the HTTP server's document root, and fragment is a symbolic offset within the document. If the absolute_pathname is omitted, then the HTTP server refers to a default document.
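As a rough illustration, the parts of a URL named above can be picked apart with Python's standard urllib.parse module. The helper describe_url and the DEFAULT_PORTS table are names invented for this sketch, and the port 8080 and fragment "top" used in the usage note are hypothetical.

```python
from urllib.parse import urlsplit

# Only the default mentioned in the text; other protocols are left out on purpose.
DEFAULT_PORTS = {"http": 80}


def describe_url(url):
    """Return the URL components using the names from the notes."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        # Fall back to the protocol's default port when none is given.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        # None means "omitted": the server will pick its default document.
        "absolute_pathname": parts.path or None,
        "fragment": parts.fragment or None,
    }
```

Calling describe_url("http://www.cs.um.edu.mt:8080/~cstaff/index.html#top") yields the five components separately, while describe_url("http://www.cs.um.edu.mt") reports port 80 and no pathname.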
You will have noticed that http is, in fact, not the only protocol that can be referred to in the URL. An HTTP client can be designed to accept any Internet protocol (e.g., ftp, gopher), in which case it will interact with the appropriate application layer on the server side. In this way, HTTP clients (Web browsers such as Microsoft Internet Explorer and Netscape Communicator) can offer a consistent user interface to many Internet services.
HTTP requires that the client opens a TCP connection to the HTTP server (on the appropriate port) before the request for a document is sent. This is the same kind of connection that the telnet program opens, which is why telnet can be used to talk to an HTTP server by hand. The request can then be made of the server. Normally, once the server has satisfied the request, the server will terminate the connection. However, either the client or the server can abnormally terminate the connection (if, for instance, the user interrupts the download of a document, or if the server crashes). The client must be resilient enough to recognise an abnormal termination of the connection.
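The connection life cycle described above might look as follows with Python's standard socket module. This is a sketch under assumptions: the function names are invented, the Host header is added because many present-day servers expect it, and a reset or time-out is treated as the "abnormal termination" case.

```python
import socket


def read_until_closed(sock, chunk_size=4096):
    """Collect bytes until the peer closes the connection.

    Returns (data, closed_cleanly). recv() returning b'' means the other side
    closed normally; a reset or time-out is reported as an abnormal end.
    """
    chunks = []
    closed_cleanly = True
    while True:
        try:
            data = sock.recv(chunk_size)
        except (ConnectionError, socket.timeout):
            closed_cleanly = False
            break
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks), closed_cleanly


def fetch(host, path, port=80, timeout=10.0):
    """Connect, send one GET request, and read until the server hangs up."""
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        return read_until_closed(sock)
```

A clean close does not by itself prove that the whole document arrived; a careful client would also compare the number of body bytes received with the Content-Length field.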
An HTTP request generally takes the form of

<request_type> <URL> <HTTP_version_number> CRLF

For example, GET http://www.cs.um.edu.mt/~cstaff/index.html HTTP/1.0. The server will respond with a message that includes the status (whether the document was found/not found/relocated/etc.), document metainformation, and the document body (if the document was found). For example,

[24] zeus telnet www.cs.um.edu.mt 80    -- connect to HTTP server on port 80
Trying 193.188.34.81...
Connected to babe.cs.um.edu.mt.
Escape character is '^]'.
GET http://www.cs.um.edu.mt/~cstaff/index.html HTTP/1.0    -- HTTP request
HTTP/1.1 200 OK    -- status line, 200 indicates document exists and can be downloaded
Date: Tue, 16 Mar 1999 10:01:31 GMT    -- date document requested
Server: Apache/1.2.1    -- identity of HTTP server
Last-Modified: Wed, 27 Jan 1999 15:25:47 GMT    -- date document last modified
Content-Length: 2319    -- in bytes
Accept-Ranges: bytes
Connection: close
Content-Type: text/html    -- document type

<HTML>
<title>The "Never to be Completed" Site</title>
<h4>Chris Staff's not at home, Page</h4>
Well, you've stumbled across my site. I'm a firm believer that 99% of all Web sites should be under construction, so I'm not even going to bother putting graphics of construction workers up on this page, because they'll never be removed!<p>
One day this page will be neatly laid out, but it isn't going to happen for the foreseeable future.<p>
<h4>A little bit about me</h4>
<img align=bottom src="http://www.cs.um.edu.mt/%7Ecstaff/courses/lectures/csa3020/graphics/me.gif">I lecture in Computer Science in the <a href="http://www.cs.um.edu.mt">Dept. of Computer Science and A.I.</a> at the <a href="http://www.um.edu.mt">University of Malta</a>. I'm also reading a PhD in <a href="http://www.cs.um.edu.mt/%7Ecstaff/courses/lectures/csa3020/ahs.html">Adaptive Hypertext</a> at <a href="http://www.cogs.susx.ac.uk">The School of Cognitive and Computing Sciences</a>, University of Sussex. I'm about half way through at the moment (March 1997). <p>
<h4>The courses I teach</h4>
You can follow the links to course notes, where they're on-line<p>
<a href="http://www.cs.um.edu.mt/~cstaff/courses/lectures/csm202">CSM202</a>: <b>Operating Systems</b>. <br>
<a href="http://www.cs.um.edu.mt/~cstaff/courses/lectures/csm210">CSM210</a>: <b>Systems Programming in C</b> (Part I).<br>
<a href="http://www.cs.um.edu.mt/~cstaff/courses/lectures/csa402/index.html">CSA402 </a>: <b>Graphics and Multimedia Systems: Multimedia Systems.</b><br> <p>
These lectures form part of the University's <a href="http://www.cs.um.edu.mt/courses/">BSc IT (Hons.)</a> degree. The Web pages for the degree also give information like course descriptions, when lectures are scheduled for delivery, how many credits they're worth, etc.<p>
I also service some other lecture courses.<p>
<a href="http://www.cs.um.edu.mt/%7Ecstaff/courses/lectures/csa3020/courses/it4hrd/index.html">Practical I.T. for Human Resource Development</a>, for the MA in Human Resource Development.<br>
<b>Marketing on the WWW</b>, for the MA in Marketing.<p>
<h4>Contact Details</h4>
E-mail: <a href="mailto:cstaff@cs.um.edu.mt">cstaff@cs.um.edu.mt</a> <p>
Postal Address: <address>University of Malta, Dept. of Computer Science and A.I., Tal-Qroqq, Msida MSD 06, Malta, Europe</address> <p>
Location on Campus: Room 402, New Computer Building (off Car Park 5)<p>
Telephone: (356)-32902506<br>
Fax: (356)-320539<p>
</HTML>
Connection closed by foreign host.    -- server closes connection
[25] zeus

The Content-Type field is an important part of the Web. The Web supports a variety of multimedia types, and the type tells the Web client how the data being downloaded should be handled. For example, GIF images have the content type image/gif, so that the Web client can handle the stream as a possibly compressed image. Other types might require the loading of an application in which to display the data (e.g., video, or a Microsoft Word document).
The request_type can also be HEAD, in which case the server returns only the document metainformation. This is particularly useful for checking whether the document has changed since the last time it was downloaded. In order to make efficient use of the Internet, Web clients use a local cache, in which recently downloaded documents are stored. If the document is still in the cache and hasn't changed since it was cached, then the document is loaded from the cache instead. Consequently, unless directed otherwise by the user, a GET request for a document that is already in the cache is typically preceded by a HEAD request.
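To make the role of the content type concrete, here is a small sketch of how a client might choose what to do with a downloaded stream. The function names and the three handling strategies (returned as plain strings) are illustrative assumptions, not a description of how any particular browser works.

```python
def media_type(content_type):
    """Return (type, subtype) from a Content-Type value, ignoring parameters."""
    essence = content_type.split(";", 1)[0].strip().lower()
    main, _, sub = essence.partition("/")
    return main, sub


def choose_handler(content_type):
    """Pick a (hypothetical) handling strategy based on the content type."""
    main, sub = media_type(content_type)
    if (main, sub) == ("text", "html"):
        return "render as HTML"
    if main == "image":
        return "decode and display inline"
    # Anything the client cannot display itself goes to a helper application.
    return "hand over to an external application"
```

With these rules, text/html is rendered by the client itself, image/gif is decoded and shown inline, and a type such as application/msword is passed to another application.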
The request typically originates within a Web client application. A user will be browsing through a document, using a Web browser, and will click on text which is marked up as being a "link" to another multimedia document. The majority of documents on the Web are so-called HTML documents: documents which contain instructions that the Web browser uses to decide how the document is to be displayed to the user.
HTML, the HyperText Markup Language, is a markup language in which tags embedded in the text indicate how the text they enclose is to be treated. A tag is composed of a modifier enclosed between < and >. For example, in <b>This text is emboldened</b>, the text is displayed in bold because of the <b> and </b> tags surrounding it; the </b> indicates that emboldening ends at that point. Typically, a tag is a pair <tag>text</tag>. The majority of tags are used to enable a browser to modify how the content of the document is displayed to the user, according to user preferences. The glue that links documents to each other is the HTML hypertext reference tag, <A HREF="URL">anchor_text</A>, where the anchor_text is the text in the document which, when clicked on, results in the generation of an HTTP request to display the document identified by the URL.
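As an illustration of how a client can find the links in a document, the following sketch uses Python's standard html.parser module to collect each URL together with its anchor text. The class name LinkCollector and the function extract_links are invented for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (URL, anchor_text) pairs from <A HREF="...">...</A> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # URL of the anchor currently open, if any
        self._text = []     # pieces of anchor text seen so far

    def handle_starttag(self, tag, attrs):
        # The parser lower-cases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

Each pair returned by extract_links is what a browser needs in order to turn a click on the anchor text into a request for the associated URL.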
When a user clicks on a link, the application will generate a request for the associated document to be displayed. Typically, this involves downloading the document to the client's computer and processing it locally. The client will set up a communications channel by opening a TCP connection to the server, on the port identified in the URL. If the connection is successful, and if the document is already in the client's cache, the application may generate a HEAD request to check the modification date of the document. If the document has not been modified since it was last cached, then the client will reuse the version in the cache. Otherwise, or if the document is not already in the cache, the application will generate a GET request (possibly first having to reconnect to the server). The client waits until it receives a response from the server, or until the request times out. It then processes the response, or reports a time-out error.
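The cache check in this sequence can be sketched as two small decisions. The cache layout (a dictionary mapping each URL to the Last-Modified value stored with the cached copy) and the function names are assumptions made for the example; dates are parsed with the standard email.utils module.

```python
from email.utils import parsedate_to_datetime


def first_request(url, cache):
    """Pick the request type: HEAD to check a cached copy, GET otherwise."""
    return "HEAD" if url in cache else "GET"


def after_head(url, cache, head_last_modified):
    """Decide what to do once the HEAD response's Last-Modified value is known."""
    cached_date = parsedate_to_datetime(cache[url]["last_modified"])
    server_date = parsedate_to_datetime(head_last_modified)
    # Reuse the cached copy only if the server's copy is not newer.
    return "USE_CACHE" if server_date <= cached_date else "GET"
```

For a document cached with "Wed, 27 Jan 1999 15:25:47 GMT", a HEAD response carrying the same date leads to the cached copy being reused, while a later date triggers a fresh GET.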

Deficiencies of TCP/IP

The deficiencies of TCP/IP should be fairly obvious, given what you already know about multimedia and the brief description of TCP/IP given above. However, for the sake of completeness, they are given below.

TCP/IP is perfectly suited to the requirements of discrete data, but the inherent properties of TCP/IP mean that it is not suitable for continuous media types.

The overriding factor in IP is that datagrams are transported to the sink over the fastest available route. This means that datagrams can be delivered out of sequence. It also means that QoS parameters are severely restricted, as no minimum or maximum delivery times can be specified. Also, round-trip delays are subject to network loads, and service can deteriorate rapidly, requiring the sink and source to renegotiate QoS parameters frequently during a dialogue.
TCP is primarily concerned with ensuring that datagrams have been received correctly. If the source has not had an acknowledgement that a datagram has been received, then it re-sends it. For video and audio broadcasts, this is wasteful of resources because, more often than not, by the time the re-sent data arrives the client application will have missed the deadline for processing it.
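A back-of-the-envelope model shows why a re-sent datagram is often useless for continuous media. The model is a deliberate simplification and all its parameters are assumptions: the source waits rto_ms (a retransmission time-out) before re-sending, the one-way delay is taken to be half the round-trip time, and the sink must play the data playout_delay_ms after it was first sent.

```python
def retransmission_arrival_ms(rtt_ms, rto_ms):
    """Time after the first transmission at which the re-sent copy reaches the sink."""
    return rto_ms + rtt_ms / 2


def retransmission_is_useful(rtt_ms, rto_ms, playout_delay_ms):
    """True if the re-sent copy arrives before its playback deadline."""
    return retransmission_arrival_ms(rtt_ms, rto_ms) <= playout_delay_ms
```

With a 200 ms round trip and a 400 ms time-out, the re-sent copy arrives 500 ms after the original was sent, so it is only useful if the client buffers at least half a second of media.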
If many sinks are receiving the same broadcast, TCP/IP at the source is responsible for generating as many individually addressed datagrams as are necessary. This implies that the sinks may not be receiving the data in synchrony (it may take a variable amount of time for each datagram to be generated and routed to the individual sinks), and that the broadcast is contributing to network congestion. The TCP/IP network is unable to give any guarantees over and above the guarantee that as long as there is a route between the computers involved in the multimedia session, then data will eventually be routed between the source(s) and the sink(s). It is up to the application layers to provide additional guarantees, in so far as they are supported by TCP/IP. For example, in the case of real time video and audio playback or telephony over the Internet, servers and clients can negotiate on the frame rate and buffer size, taking into account the current round-trip delay. If the client experiences a degradation or an improvement of service, it can renegotiate the frame-rate.
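The cost of individually addressed datagrams is easy to quantify. The sketch below, with invented names and figures, compares the outgoing bandwidth needed at the source when it sends one copy per sink with the bandwidth needed if a single copy could be shared by all sinks.

```python
def source_bandwidth_kbps(stream_kbps, sinks, shared_copy=False):
    """Outgoing bandwidth at the source for a stream delivered to `sinks` receivers."""
    if sinks <= 0:
        return 0
    copies = 1 if shared_copy else sinks   # one copy per sink unless shared
    return stream_kbps * copies
```

A 128 kbit/s audio stream sent to 50 sinks needs 6400 kbit/s at the source when every sink gets its own copy, against 128 kbit/s for a single shared copy.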

The Multicast Backbone - MBONE

The Multicast Backbone was designed specifically to overcome some of the shortcomings of the Internet, especially in the area of network efficiency.
Consider how packets on an (Ethernet) LAN are sent from the source to the destination. On the LAN, every connected computer can listen to the network while it sends. This is essential because a computer may only start sending data when the network is silent. It is still possible for two computers to start sending at the same moment. In this case, however, there will be a collision, which the computers will recognise, and they will re-send their data at a later time. While the data is on the network, any computer (typically the one to which it is addressed) can read the data off the network.
If it were possible to address a packet to all computers on the LAN, then all computers would receive the same data, without duplicating packets. However, the more computers are attached to the LAN, the less likely it is that a computer can send data at any given point in time, because the probability that the network is already active is increased. This results in the illusion that the network is slow. Consequently, IP broadcasts are typically not allowed to cross the boundaries created by routers. A router creates a physical divide between computers on one LAN (attached to one of the router's Ethernet cards) and computers on another LAN (attached to the router's other Ethernet card). However, as it is possible to reach many computers without linearly increasing the amount of required bandwidth, broadcasting remains attractive. One of the down sides of broadcasting is that all computers will receive the data, even if it is not addressed to some of them. Injudicious use of broadcasting would result in a severely saturated network.
Multicasting provides the benefits of broadcasting, but it limits the broadcast to those computers which should be exposed to the data (except, of course, where this is unavoidable). Implementing this is quite a departure from IP, so specialist hardware, the Real-time Transport Protocol (RTP) and software need to be employed to benefit from multicasting on the Internet.
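For comparison with broadcasting, the sketch below shows how a receiving host can ask to be included in a multicast group, assuming an operating system whose sockets API already supports IP multicast. The group address 224.1.1.1 in the usage note is a hypothetical example, and the function names are invented.

```python
import socket


def membership_request(group, interface="0.0.0.0"):
    """Build the 8-byte membership structure: group address + local interface."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_multicast_receiver(group, port):
    """Open a UDP socket that receives datagrams addressed to `group`."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Joining the group tells the network that this host wants the data;
    # hosts that have not joined are not exposed to it.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group))
    return sock
```

A receiver would call open_multicast_receiver("224.1.1.1", 5004) and then read datagrams with recvfrom(); the port number 5004 is likewise only an example.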

Multicast hardware and software support

Multicast datagrams cannot be relayed across the Internet in their raw form. Instead, they are packaged as ordinary unicast datagrams which routers can forward. Operating systems need to be modified or patched in order to address IP datagrams to multiple destinations and then disguise each datagram as a unicast datagram. Currently, most operating systems which can be modified to support multicast routing are UNIX-based. This solves the problem of addressing data to multiple computers that are engaged in a multi-way conferencing session or receiving a live broadcast, without creating undue traffic on the Internet, and without the host duplicating effort to create multiple datagrams that are identical apart from the address. However, we are still unable to provide isochronous transfer modes, which permit applications to impose and adhere to real-time delivery time scales, because this requires the support of lower layers that actually have control over resources in switches and routers.
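The packaging of a multicast datagram inside a unicast one can be pictured with a toy model. Nothing here touches the network: the Datagram class, the function names and all addresses are hypothetical, and a real implementation works on IP headers rather than Python objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datagram:
    source: str
    destination: str
    payload: object


def encapsulate(multicast_datagram, tunnel_source, tunnel_destination):
    """Wrap a multicast datagram in a unicast one that ordinary routers can forward."""
    return Datagram(tunnel_source, tunnel_destination, multicast_datagram)


def decapsulate(unicast_datagram):
    """Recover the original multicast datagram at the far end of the tunnel."""
    inner = unicast_datagram.payload
    if not isinstance(inner, Datagram):
        raise ValueError("payload is not an encapsulated datagram")
    return inner
```

Routers between the two tunnel end points see only the outer unicast addresses; the multicast group address reappears when the receiving end unwraps the datagram.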


Date last amended: Monday, 22 March, 1999