CSA3020
Lecture 2 - Discussion of issues arising from HyperLand
The subjects below are organised under the following topics. Some subjects
fall under more than one topic...
User Interface Issues
    Multi-modal User Interface
    Input/Output devices
    Personalised Multimedia Services
System Issues
    Operating System
    Time sequencing and Synchronization
    Input/Output devices
    Distributed Multimedia
Programming Issues
    Multimedia documents
    Time sequencing and Synchronization
    Real-time programming for handling continuous media
Interactivity Issues
    Adaptivity and User Modeling
    Navigation and Hypermedia
    Content-based search
    Personalised Multimedia Services
A multi-modal user interface offers the ability to interact with a system using
natural language (written and spoken) and gesture, as well as through
traditional pointing devices, etc.
We are already some way towards this - user interfaces can be trained to
associate words (processed as sounds) with actions - e.g., saying "start
word processor" results in your favorite word processor being launched.
Examples of using gesture to interact with a system include tapping and
finger dragging via a touch screen or pad. Of course, we are still far
away from computer systems which can easily understand written, let alone
spoken natural language. Nor are we close to interacting with a visual
system which can interpret a range of gestures.
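As a minimal sketch of the word-to-action association just described - assuming
some speech recogniser has already turned the utterance into a plain text
string, and with made-up command phrases and program names - the interface can
simply look the recognised phrase up in a table of actions:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    // Sketch only: the recogniser that produces the command string is assumed.
    public class VoiceCommandDispatcher {
        private final Map<String, Runnable> actions = new HashMap<>();

        public VoiceCommandDispatcher() {
            // Hypothetical bindings; the program names are purely illustrative.
            actions.put("start word processor", () -> launch("soffice"));
            actions.put("start web browser", () -> launch("firefox"));
        }

        public void onUtterance(String recognisedText) {
            Runnable action = actions.get(recognisedText.toLowerCase().trim());
            if (action != null) {
                action.run();
            } else {
                System.out.println("No action bound to: " + recognisedText);
            }
        }

        private void launch(String program) {
            try {
                new ProcessBuilder(program).start();
            } catch (IOException e) {
                System.err.println("Could not launch " + program);
            }
        }
    }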
What metaphor should be used for this new class of user interface? The Apple
Macintosh gave us the desktop metaphor for manipulating files and data in what
is otherwise a traditional, character-based environment (even though the
metaphor itself is highly graphical). Will the new multi-modal interface use
the metaphor of an (anthropomorphised) user agent?
Related Link
Surf the Web by Voice
Back
Existing operating systems are designed largely for processing
character-based data. Multimedia systems intrinsically use 2D- and 3D-graphics,
animation, video, and audio, as well as text. Traditional operating systems
have some obvious disadvantages here. The file system is designed for fast
access to small files, whereas multimedia documents are inherently large, so
disk access is relatively slow. File systems are also "small" - calculate the
physical maximum imposed on the file system by traditional 16- and 32-bit
addressing and compare it with the new 64-bit addressing, both for hard disk
space and for RAM. OS interrupts and bus speed are fine if the data being
processed is textual or slow-changing - we read off the screen more slowly
than the OS can put the information there, so interrupts and speed aren't an
issue. However, if we are watching a video or listening to rapidly changing
CD-quality audio, then an interrupt is actually visible and audible, and
consequently spoils our enjoyment. Yet, in a multi-user system, we cannot
simply disable interrupts because one of the users is watching a video - the
other users wouldn't receive a service.
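Going back to the file-size question: a rough worked example of the 16-, 32-
and 64-bit ceilings is sketched below in Java. These are theoretical maxima
only - real file systems impose their own limits well below them - but note
that a single feature-length video can already exceed the 32-bit figure.

    // Theoretical ceilings only: with an n-bit address you can address 2^n bytes.
    public class AddressSpace {
        public static void main(String[] args) {
            System.out.println("16-bit: " + (1L << 16) + " bytes (64 KB)");
            System.out.println("32-bit: " + (1L << 32) + " bytes (4 GB)");
            // 1L << 64 would overflow a Java long, so state 2^64 directly:
            System.out.println("64-bit: 2^64 bytes (16 exabytes)");
        }
    }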
Back
We are all used to so-called multimedia documents that are predominantly
text-based, but which may include a few 2D graphics. We are also used to video
which contains an audio stream. But what about a document which contains
both text and video, where the text is to be read alongside the viewing of
the video? How do you synchronise the text display to the video?
Text-processing systems do not have a time attribute, yet time is an
intrinsic part of multimedia documents.
This leads to the general question
about how multimedia documents can be easily read. Currently, graphics and
video, and even audio bites are "added" to inherently textual content to
make the content more interesting. Give 'em a video clip to watch and
hopefully they'll read the text... but what will happen once the novelty
wears off?
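Returning to the synchronisation question above: one plausible (and purely
illustrative) way to give text a time attribute is to attach each passage to
an interval on the video's timeline, in the spirit of a subtitle cue. A small
Java sketch:

    import java.util.List;

    // Sketch: a passage of text that is valid over an interval of the video's
    // timeline, expressed in milliseconds from the start of the clip.
    public class TimedText {
        final long startMs, endMs;
        final String text;

        TimedText(long startMs, long endMs, String text) {
            this.startMs = startMs;
            this.endMs = endMs;
            this.text = text;
        }

        // Return whichever passage should be on screen at this point in the video.
        static String textAt(List<TimedText> passages, long videoPositionMs) {
            for (TimedText t : passages) {
                if (videoPositionMs >= t.startMs && videoPositionMs < t.endMs) {
                    return t.text;
                }
            }
            return "";
        }
    }

A player would then poll its current playback position and redraw whatever
textAt returns, so the prose keeps pace with the video.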
Back
Imagine a scenario where a multimedia document contains a user-rotatable 3D
object which has a supporting textual description. As the object is
rotated, the text changes to describe what is currently visible... Name one
programming language which supports this level of synchronization.
Typically, most synchronization issues are related to lip-synching an
audio track with a video track, or synchronizing sub-titles with a video
track. Generally, this is achieved through support from the application
(e.g., Adobe Premiere, Macromedia Director, QuickTime) or the compression
scheme (e.g., MPEG-2), and these approaches almost exclusively revolve around
video and animation sequences.
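No mainstream programming language supports the rotatable-object example above
out of the box; in practice it has to be hand-coded as an event handler,
roughly along the lines of this sketch (the angles and descriptions are
invented, and the 3D toolkit that would fire the rotation events is assumed):

    import java.util.NavigableMap;
    import java.util.TreeMap;

    // Sketch: map the object's current rotation (degrees about one axis) to a
    // supporting textual description, updated whenever a rotation event fires.
    public class RotationNarrator {
        private final NavigableMap<Integer, String> descriptions = new TreeMap<>();

        public RotationNarrator() {
            descriptions.put(0,   "Front elevation: the main entrance.");
            descriptions.put(90,  "Side elevation: the loading bay.");
            descriptions.put(180, "Rear elevation: the fire escape.");
            descriptions.put(270, "Side elevation: the car park.");
        }

        // Called by whatever 3D toolkit reports the new orientation.
        public String onRotation(int degrees) {
            int normalised = ((degrees % 360) + 360) % 360;
            return descriptions.get(descriptions.floorKey(normalised));
        }
    }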
Back
An example of where this is an issue has already been discussed in the
section on multimedia documents. This essentially revolves around
event-based programming. Java and other
object-oriented programming languages do handle events, but few, if
any, explicitly handle multimedia data types, although some applications
do (e.g., Macromedia Director and Authorware Professional). What we
want to be able to do is, for example, trigger events based on the
appearance or disappearance of certain characters in a film.
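A hedged sketch of how such an event might look to application code, assuming
a hypothetical recognition layer that can report when a character enters or
leaves the frame (no existing library provides this interface):

    // Sketch: an observer interface for character appearance events in a film.
    // The recogniser that would fire these events is assumed, not real.
    interface CharacterAppearanceListener {
        void characterAppeared(String characterName, long frameNumber);
        void characterDisappeared(String characterName, long frameNumber);
    }

    class TranscriptHighlighter implements CharacterAppearanceListener {
        @Override
        public void characterAppeared(String characterName, long frameNumber) {
            System.out.println(characterName + " enters at frame " + frameNumber
                    + ": highlight their lines in the transcript.");
        }

        @Override
        public void characterDisappeared(String characterName, long frameNumber) {
            System.out.println(characterName + " exits at frame " + frameNumber);
        }
    }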
Back
In HyperLand, the agent quickly picked up on Adams' declared and perceived
interests and preferences, and sought to present information which was relevant to these
interests. Any system which displays this kind of behaviour is known as an
adaptive system.
Adaptive systems are not restricted to multimedia or hypermedia systems;
the term describes any system which is capable of learning about,
distinguishing between, and adapting to different users.
As an example of what I mean by adaptivity, think about your favourite
Web browser for a moment. As it stands, it is adaptable
(which should not be confused with adaptive), because although many
different users can use the same implementation of the interface, it
"remembers" each individual's preferences in terms of bookmarks, which page
we want the Web browser to display when we first launch it, the colour used to
display visited and unvisited links, etc. Now, imagine the Web browser used our
bookmarks and other frequently visited sites to determine what sort of
information we are interested in, and it went away and searched for
related information. Imagine the Web browser was able to spot patterns in our
viewing habits... every Monday morning, as soon as I get into the office, I
read reports on Liverpool and Juventus's week-end matches. Wouldn't it be
nice if the Web browser knew this and downloaded this and other related
information so that it's there, waiting on my desktop, ready to read. And
so on... If the behaviour of the system goes beyond merely doing what the user
told it to do, in that it learns to do what the user would like it to do,
then the system is considered to be adaptive.
In order to achieve adaptivity, two fundamentals are necessary: i) the system
needs to learn about user preferences and interests as unobtrusively as
possible, and ii) the environment within which the adaptive agent works
must be adequately described in order for the agent to reason about what is
and isn't relevant.
User preferences and interests are represented and manipulated by a user
model; the information base needs to be described using a formalism which is
compatible with the user model; and, typically, an information retrieval
system compares the user model with the information representation to
determine the degree of relevance of documents in the information base to
the user. According to Peter Brusilovsky, an
adaptive hypermedia system is one "which reflects some features of the user
and/or characteristics of his system usage in a user model, and utilizes
this model to adapt various behavioral aspects of the system to the
user." (Brusilovsky, P., et. al. (1998). Adaptive Hypertext and
Hypermedia. Kluwer Academic Publishers.)
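As a very small illustration of the comparison just described, both the user
model and a document can be reduced to weighted terms and compared with a
cosine measure. A real adaptive hypermedia system is far richer than this, so
treat the Java sketch below as an assumption-laden toy:

    import java.util.HashMap;
    import java.util.Map;

    // Sketch: a user model as a bag of weighted interest terms, compared against
    // a document represented the same way.
    public class UserModel {
        private final Map<String, Double> interests = new HashMap<>();

        // Strengthen an interest each time the user shows it (e.g., revisits a page).
        public void reinforce(String term, double weight) {
            interests.merge(term, weight, Double::sum);
        }

        // Cosine similarity between the user's interests and a document's terms.
        public double relevance(Map<String, Double> documentTerms) {
            double dot = 0, normUser = 0, normDoc = 0;
            for (Map.Entry<String, Double> e : interests.entrySet()) {
                normUser += e.getValue() * e.getValue();
                dot += e.getValue() * documentTerms.getOrDefault(e.getKey(), 0.0);
            }
            for (double w : documentTerms.values()) {
                normDoc += w * w;
            }
            if (normUser == 0 || normDoc == 0) return 0;
            return dot / (Math.sqrt(normUser) * Math.sqrt(normDoc));
        }
    }

The relevance score would then drive what the system chooses to fetch,
recommend, or present first.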
Back
We're now all used to pointing, clicking and dragging with a pointing
device, such as a mouse. However, earlier, we
discussed using multi-modal user interfaces through which we could use a
mixture of spoken or written language and gesture to communicate with the
system. Does it have to stop here? Imagine playing virtual squash with an
opponent on the other side of the world. We could wear pads through which
our current position on the virtual squash court can be computed and
relayed back to us and to our opponent through light-weight goggles.
Through a glove and a stump the size and shape of a squash racket handle,
the computer system knows where the virtual racket is, and sends electronic
signals which are interpreted by our brain as having made impact with the
virtual squash ball, etc. Well, that, perhaps, is for the distant future,
but certainly, especially if we are interacting with a virtual world, we
need auditory and visual i/o devices, as well as haptic devices which can
relay input from and output to our hands and feet. Can you imagine playing
Doom or Quake in a virtual reality environment, running and turning on a
treadmill, holding an imaginary weapon in a gloved hand, and simulating
squeezing triggers, changing weapons, and so on? Would you still be able to
play it for hours at a time, or would you be exhausted within 30 minutes?
Back
Networked multimedia takes many shapes and forms, from downloading
high-volume multimedia traffic off the Internet, to video conferencing,
interactive games, and internet telephony, to video-on-demand, interactive TV and shared virtual reality.
Essentially, quite apart from the need for distributed multimedia operating
systems, the communications networks connecting users to each other and to
centralised and distributed multimedia file systems can affect the
listening/viewing/participation quality experienced by users. The largest WAN
in the world, the Internet, is not, in its present form, conducive to
interactive distributed multimedia environments. Inherent characteristics of
TCP/IP, bandwidth usage, and reliability all contribute to the disadvantages
of using the Internet as the primary vehicle for distributed multimedia.
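To make the TCP/IP point concrete: TCP will stall an entire stream to
retransmit a lost packet, which is exactly what live video or telephony cannot
tolerate, so continuous media is more usually pushed over UDP (as RTP is) and
late or lost packets are simply skipped. A minimal sketch of that "send and
forget" style, with a placeholder address and stand-in frame data:

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;

    // Sketch: pushing media frames over UDP. There is no retransmission and no
    // ordering guarantee, so the receiver must tolerate loss and jitter.
    public class MediaSender {
        public static void main(String[] args) throws Exception {
            DatagramSocket socket = new DatagramSocket();
            InetAddress receiver = InetAddress.getByName("192.0.2.1"); // placeholder address
            for (int frame = 0; frame < 25; frame++) {
                byte[] payload = ("frame " + frame).getBytes(); // stand-in for real media data
                socket.send(new DatagramPacket(payload, payload.length, receiver, 5004));
                Thread.sleep(40); // roughly 25 frames per second
            }
            socket.close();
        }
    }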
Back
Hypermedia is a not-so-new paradigm for navigating through multimedia
documents. (Hypertext is the same paradigm but applied to text-only
documents - however, hypertext and hypermedia are pretty much
interchangeable these days, and I will use them interchangeably). The
principles of hypermedia are not restricted to computer-based information,
either - networked computers just make the navigation seamless.
Anybody who reads a newspaper, an encyclopedia, or a dictionary knows that
they are not designed to be read linearly. Hypermedia is the same -
through networked computers (such as the WWW), documents can be linked to
other relevant material simply by embedding a link anchor in the source
document which refers to the destination document. Is it that simple,
though? How can link anchors be embedded into video or animation? WWW image
maps provide a way of linking regions within a 2-dimensional raster image to
other documents, but is this a solution or just a patch?
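One way to generalise the image-map idea to video - purely a sketch, not an
existing standard - is to make the anchor a screen region that is only "live"
during an interval of the clip:

    import java.awt.Rectangle;

    // Sketch: a link anchor inside a video, active only over a region of the
    // frame and an interval of the timeline. The destination URL is illustrative.
    public class VideoAnchor {
        final Rectangle region;
        final long startMs, endMs;
        final String destination;

        VideoAnchor(Rectangle region, long startMs, long endMs, String destination) {
            this.region = region;
            this.startMs = startMs;
            this.endMs = endMs;
            this.destination = destination;
        }

        // Did the user click inside the anchor while it was active?
        boolean hit(int x, int y, long videoPositionMs) {
            return videoPositionMs >= startMs && videoPositionMs < endMs
                    && region.contains(x, y);
        }
    }

Even so, the authoring problem remains: somebody (or something) still has to
decide which regions of which frames should be anchors.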
Back
What's content-based search? Back in the olden days, when a megabyte of
hard disk space cost more than your house, people began to realise that
a file name wasn't really enough of a description of what the file
contained. Take this web page - it's called mm2.html. I know what's in it
because I'm the author of it. However, will I still remember what's in it
in a month's time? Unlikely. And what about those poor souls out there to
whom the information in the document might be of interest, but who will
never know that the file with the URL
http://www.cs.um.edu.mt/~cstaff/courses/lectures/csa3020/mm2.html even
exists, let alone contains an important nugget of information?
The minute technology enabled the permanent storage of data, people wanted
to be able to locate files automatically by their content, rather
than by their name. And so the field of Information Indexing, Search and
Retrieval (IR for short) was born. Because of the nature of the documents
stored, IR has for a long time focused on text-based document retrieval. And
now we're doing horrible things like storing pictures, audio bites, video
clips and animation sequences... and, of course, we want to be able to locate
them by their content.
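For text, the classic IR building block is an inverted index: map each term to
the set of documents containing it, then match query terms against it. A toy
Java sketch (the document names and contents are made up):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Sketch: a toy inverted index mapping terms to the documents that contain them.
    public class TinyIndex {
        private final Map<String, Set<String>> index = new HashMap<>();

        public void add(String documentName, String content) {
            for (String term : content.toLowerCase().split("\\W+")) {
                index.computeIfAbsent(term, t -> new HashSet<>()).add(documentName);
            }
        }

        public Set<String> search(String term) {
            return index.getOrDefault(term.toLowerCase(), Collections.emptySet());
        }

        public static void main(String[] args) {
            TinyIndex idx = new TinyIndex();
            idx.add("mm2.html", "multimedia hypermedia content based search");
            idx.add("mm1.html", "an introduction to multimedia systems");
            System.out.println(idx.search("hypermedia")); // prints [mm2.html]
        }
    }

The hard part for non-text media is deciding what plays the role of a "term" -
colour distributions, audio fingerprints, detected objects, and so on.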
Back
Tom Baker, in HyperLand, called himself an agent. And now we have the field
of intelligent agents... autonomous chunks of software which are designed
to perform a specific task (or which may learn to perform tasks) on
behalf of human users. What does this entail? Well, pretty much all of the
above! A software agent needs to learn about the user who owns it, needs
to be able to communicate with other specialised agents across networks,
needs to locate and retrieve multimedia information for its owner or perform
the service required of it by its owner, perhaps communicate with its owner
through novel i/o devices (why not cheat at virtual squash by using
a performance-enhancing agent?), and so on...
Related Link
VirtualFriend
Back
Back to the index for this course.
In case of any difficulties or for further information e-mail
[email protected]
Date last amended: 2nd September 2002