CSA3020
Lecture 3 - Overview of Multimedia Systems
Introduction
So, what does it mean for a system to be a multimedia system?
The word multimedia in a computer environment implies that many media are under computer
control. In its loosest possible sense, a multimedia computer should support
more than one of the following media types: text, images, video, audio and
animation. However, that means that a computer which manipulates only text
and images would qualify as a multimedia computer.
Happily, there is a stronger definition of a multimedia computer: a
computer which controls at least one continuous and one discrete media
type.
Text and images are examples of discrete media (i.e., they are
time-independent), whereas video and audio are time-dependent and,
consequently, continuous.
The processing of time-independent media should happen as fast as
possible, but this processing is not time-critical because the validity
of the data does not depend on any time condition. In the case of
time-dependent media, however, the values change over time - and, in
fact, processing values in the wrong sequence can invalidate (part of)
the data.
Characteristics of a stand-alone multimedia system
A stand-alone multimedia system has I/O devices at least for the
playback of, and possibly for the capture of, both discrete and
continuous media types.
Audio and video usually demand significantly higher bit-rates than the
system throughput can sustain. Additionally, audio and video in their
raw form are extremely large. For example, PAL standard video has 25 frames per
second. A single full-size frame (at a resolution of 640 x 480, and a 16-bit
colour depth) would require 4,915,200 bits, or 614,400 bytes to
represent it. Full-motion video of that resolution and colour depth would
require 15,360,000 bytes, or 14.65 Mbytes, for a single second of video. Even
a 30 second video clip would require more than 439 Mbytes of storage. For
these reasons, and to facilitate transmission of continuous media over a network, video and
audio are usually compressed. Real-time decompression of the video and
audio streams is possible in software, although currently hardware support
is required for real-time compression (of video).
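The storage figures above can be reproduced with a few lines of arithmetic. The sketch below simply recomputes the sizes quoted in the text (640 x 480 pixels, 16-bit colour, 25 frames per second); the variable names are illustrative.

```python
# Storage arithmetic for uncompressed PAL-style video, using the figures
# from the text: 640 x 480 pixels, 16-bit colour, 25 frames/second.
WIDTH, HEIGHT = 640, 480
BITS_PER_PIXEL = 16
FPS = 25  # PAL frame rate

frame_bits = WIDTH * HEIGHT * BITS_PER_PIXEL  # 4,915,200 bits per frame
frame_bytes = frame_bits // 8                 # 614,400 bytes per frame
second_bytes = frame_bytes * FPS              # 15,360,000 bytes per second
clip_bytes = second_bytes * 30                # a 30-second clip

print(frame_bytes)              # 614400
print(second_bytes / 2**20)     # ~14.65 Mbytes per second
print(clip_bytes / 2**20)       # ~439.45 Mbytes for 30 seconds
```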
Storage of compressed video and audio still consumes vast amounts of
magnetic or optical storage media. CD-DA (Compact Disk - Digital Audio)
technology is also used to store video. Originally, the maximum data
transfer rate was 150 KBytes/second (compare this with the required
data transfer rate of 14.65 Mbytes/second for uncompressed full-screen
full-motion video!). Even compressed full-motion video optimised for playback in a
smaller (typically 160 x 120) window required significantly high data
transfer rates. Compromises made to keep video playback enjoyable
included dropping the frame rate (the human eye can be deceived into
perceiving motion at as few as 16 frames per second) and reducing the
colour depth of the stored video. If an audio stream was also present,
it was usually delivered in 8-bit mono to reduce the load on the system bus.
Modern CD devices operate at around 50x the original speed, capable of
data transfer rates in the region of 7.5 Mbytes/second (DVD manages
about 10 Mbytes/second; a modern AV hard disk is capable of
approximately 59.7 Mbytes per second). An advantage that
CDs have over HDs is that data is written to contiguous sectors, meaning
that there is little or no head movement (and hence, latency), giving
almost constant retrieval times. HDs are re-writable, and as blocks are
allocated and released they become fragmented: although there may be
enough free space overall to store high-volume data, there is no
guarantee that the data will be stored in contiguous blocks. To overcome
this, modern file systems allocate space in larger blocks (typically
64K/128K), and make more of an effort to store related data within the
same cylinders, reducing seek time, in an effort to maintain constant
transfer rates while still catering for the more bursty rates required
by discrete media.
A stand-alone multimedia system, then, has a large HD capacity, a
CD-ROM/DVD-ROM
player, and a sound-card. There are also specifications governing the
computer monitor, but these are outside the scope of this lecture.
Additionally, a multimedia computer will have speakers and optionally a microphone.
For video playback no additional hardware is required, but for video
capture, a video capture card (which has at least a resident Digital
Signal Processor, and optionally a video codec (a real-time
compressor/decompressor)) is necessary. External ports enable the
coupling of a scanner, digital camera, etc., to capture high volume
multimedia data.
Characteristics of a networked multimedia system
A networked multimedia system typically shares the characteristics of a
stand-alone multimedia system with the additional requirement that it is
connected to a (reasonably high-speed) network.
The main implications are that a networked multimedia system is likely
to i) be a shared resource, possibly serving multimedia data to other
networked computers; ii) run distributed multimedia applications (e.g.,
CSCW, video conferencing, etc.); and iii) at the very least, be used to
remotely access multimedia data.
The network connection is a bottleneck. Additionally, if the networked
computer is a server, or is running distributed multimedia applications,
then there is also the overhead of preparing data for distribution over
the network, possibly in conjunction with capturing the multimedia data in
the first place. For example, a Web server which has video files available
for download may be simultaneously accessed by several clients. A computer
used for video conferencing must capture audio and video, compress it, and
prepare it for transmission over the network. The slower the network
connection, or the more traffic there is on the network, the less likely it
is that data, although it may be captured in real-time, can be transmitted
at the corresponding rate.
There are a number of significant disadvantages with the more common
existing network standards, which are really disadvantages of the
network protocols. Take TCP/IP, the underlying protocol suite of the
Internet. A data file to be transmitted is divided into packets of
roughly equal size (typically around 1 KByte), and numbered
sequentially. The underlying motivation of the protocol
is to find the
fastest route from the source to the destination. Each packet may take a
different route to arrive at the destination. This implies that the
packets may (and usually do) arrive in a different sequence from the one in
which they were sent. With text and graphics files, or any file which is
to be downloaded in its entirety prior to being viewed, this is not a
problem. The destination computer simply waits for all packets to be
received prior to reassembling them. If packets do not arrive in time,
the destination computer re-requests the missing packets. So far, so good.
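The download-then-view case above amounts to simple bookkeeping over sequence numbers. The sketch below is a toy illustration of that logic, not a real protocol implementation: all names are invented for the example.

```python
# Toy reassembly logic: packets carry sequence numbers, may arrive in any
# order, and missing ones are re-requested before the file is rebuilt.
def reassemble(received, total):
    """received: dict mapping sequence number -> payload bytes.
    total: the expected number of packets.
    Returns (data, missing): the reassembled data (or None if incomplete)
    and the sequence numbers that must be re-requested."""
    missing = [i for i in range(total) if i not in received]
    if missing:
        return None, missing  # wait and re-request before viewing
    data = b"".join(received[i] for i in range(total))
    return data, []

# Packets 0..3 sent; packet 2 is lost, and 3 and 1 arrive out of order:
pkts = {0: b"he", 3: b"orld", 1: b"ll"}
data, missing = reassemble(pkts, 4)
print(missing)      # [2] -> re-request
pkts[2] = b"o w"    # the retransmission arrives
data, missing = reassemble(pkts, 4)
print(data)         # b'hello world'
```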
Video and audio files are typically large (often inconvenient to download
in their entirety first), and the Internet is also used
for live broadcasts (which, by implication, cannot be stored and played
back later!). Video and audio streaming allows continuous
multimedia data to be played as it is received. However, what happens to
packets that are received out of sequence? If the delay is very small,
they can be buffered; but since it does not make sense to play video and
audio data out of sequence, late packets are typically discarded,
resulting in jitter as the system either waits for missing packets or
skips over them. The situation can be mitigated somewhat if the client
(destination) informs the
server (source) that many packets are being "lost". Round-Trip Delay (RTD)
information can also be relayed to the server so that the server sends less
data to the client by, for example, dropping the video frame or audio
sampling rates. This can be an on-going negotiation during the session.
Although this is a feature of the applications on the Internet, with ATM
and B-ISDN, this negotiation is an essential part of the network.
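The client/server negotiation described above can be sketched as a simple feedback rule: the client reports its loss rate, and the server lowers the frame rate in response, down to the 16 fps floor mentioned earlier. This is an illustration only; the function name, the 5% threshold, and the 20% back-off are invented for the example.

```python
# Illustrative rate adaptation: the server drops the video frame rate
# when the client reports heavy packet loss.
def adapt_frame_rate(current_fps, loss_rate, floor_fps=16):
    """Reduce the frame rate when loss is high. The 16 fps floor reflects
    the minimum at which the eye still perceives smooth motion."""
    if loss_rate > 0.05:  # more than 5% loss: back off by 20%
        return max(floor_fps, int(current_fps * 0.8))
    return current_fps  # loss is tolerable: keep the current rate

print(adapt_frame_rate(25, 0.10))  # 20: heavy loss, back off
print(adapt_frame_rate(20, 0.10))  # 16: clamped at the floor
print(adapt_frame_rate(25, 0.01))  # 25: low loss, unchanged
```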
Traditional Data Stream Characteristics
Asynchronous Transmission Mode
There is no time restriction on the transmission of packets. Packets reach
the receiver (client) as fast as possible. Transmission of discrete data
typically requires only this transmission mode.
Synchronous Transmission Mode
A maximum end-to-end delay is specified for the transmission of packets,
and this maximum is never violated, although each packet may be received
at any arbitrarily earlier time. This is essential for multimedia
applications which require that no packets are dropped due to network or
server overheads: all packets sent will be received in a timely fashion.
Note, however, that the sequence in which packets are received is not
guaranteed. The client will still need to buffer packets which arrive out
of sequence. There may also be a slight jitter in the playback of the
stream as the playback system waits for data which has not yet arrived, but
which is still within the bounds.
Isochronous Transmission Mode
As well as a maximum end-to-end delay, a minimum end-to-end delay is also
specified and guaranteed. This reduces (but does not completely eliminate)
the need for temporary storage in the client to buffer out-of-sequence
packets, as well as jitter. However, the implication is that nodes on the
network are responsible for storing packets which have already been sent by
the server, but which are not yet ready to be received by the client.
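The difference between the synchronous and isochronous guarantees above can be made concrete with a toy validity check on a packet's end-to-end delay. The function and its bounds are invented for the example.

```python
# Toy delay check for the transmission modes described above.
def delivery_ok(delay_ms, max_delay_ms, min_delay_ms=None):
    """Synchronous mode: only a maximum end-to-end delay is guaranteed.
    Isochronous mode: a minimum delay is guaranteed as well, so the
    client needs less buffering for packets that arrive early."""
    if delay_ms > max_delay_ms:
        return False  # maximum bound violated in either mode
    if min_delay_ms is not None and delay_ms < min_delay_ms:
        return False  # arrived too early for the isochronous guarantee
    return True

print(delivery_ok(40, max_delay_ms=100))                   # True: synchronous
print(delivery_ok(40, max_delay_ms=100, min_delay_ms=60))  # False: too early
print(delivery_ok(80, max_delay_ms=100, min_delay_ms=60))  # True: in bounds
```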
In case of any difficulties or for further information e-mail
[email protected]
Date last amended: 2nd September 2002