CSA3020
Lecture 2 - Discussion of issues arising from HyperLand
The subjects below are organised under the following topics. Some subjects
fall under more than one topic...
User Interface Issues
    Multi-modal User Interface
    Input/Output devices
    Personalised Multimedia Services
System Issues
    Operating System
    Time sequencing and Synchronization
    Input/Output devices
    Distributed Multimedia
Programming Issues
    Multimedia documents
    Time sequencing and Synchronization
    Real-time programming for handling continuous media
Interactivity Issues
    Adaptivity and User Modeling
    Navigation and Hypermedia
    Content-based search
    Personalised Multimedia Services
A multi-modal user interface offers the ability to interact with a system using
natural language (written and spoken) and gesture, as well as through
traditional pointing devices, etc.
We are already some way towards this - user interfaces can be trained to
associate words (processed as sounds) with actions - e.g., saying "start
word processor" results in your favorite word processor being launched.
Examples of using gesture to interact with a system include tapping and
finger dragging via a touch screen or pad. Of course, we are still far
away from computer systems which can easily understand written, let alone
spoken natural language. Nor are we close to interacting with a visual
system which can interpret a range of gestures.
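As a minimal sketch of the word-to-action association just described - assuming
some speech recogniser has already turned the utterance into a plain text
string, and with made-up command phrases and program names - the interface can
simply look the recognised phrase up in a table of actions:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    // Sketch only: the recogniser that produces the command string is assumed.
    public class VoiceCommandDispatcher {
        private final Map<String, Runnable> actions = new HashMap<>();

        public VoiceCommandDispatcher() {
            // Hypothetical bindings; the program names are purely illustrative.
            actions.put("start word processor", () -> launch("soffice"));
            actions.put("start web browser", () -> launch("firefox"));
        }

        public void onUtterance(String recognisedText) {
            Runnable action = actions.get(recognisedText.toLowerCase().trim());
            if (action != null) {
                action.run();
            } else {
                System.out.println("No action bound to: " + recognisedText);
            }
        }

        private void launch(String program) {
            try {
                new ProcessBuilder(program).start();
            } catch (IOException e) {
                System.err.println("Could not launch " + program);
            }
        }
    }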
What metaphor should be used for this new class of user interface? The Apple
Macintosh gave us the desktop metaphor for manipulating files and data in what
is otherwise a traditional, character-based environment (even though the
metaphor itself is highly graphical). Will the new multi-modal interface use
the metaphor of an (anthropomorphised) user agent?
Related Link
Surf the Web by Voice
Back
Existing operating systems are designed largely for processing
character-based data. Multimedia systems intrinsically use 2D- and 3D-graphics,
animation, video, and audio, as well as text. Traditional operating systems
have some obvious disadvantages here. The file system is designed for fast
access to small files, whereas multimedia documents are inherently large, so
disk access is relatively slow. File systems are also "small" - calculate the
physical maximum imposed on the file system by traditional 16- and 32-bit
addressing and compare it with the new 64-bit addressing, both for hard disk
space and for RAM. OS interrupts and bus speed are fine if the data being
processed is textual or slow-changing - we read off the screen more slowly
than the OS can put the information there, so interrupts and speed aren't an
issue. However, if we are watching a video or listening to rapidly changing
CD-quality audio, then an interrupt is actually visible and audible, and
consequently spoils our enjoyment. Yet, in a multi-user system, we cannot
simply disable interrupts because one of the users is watching a video - the
other users wouldn't receive a service.
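Going back to the file-size question: a rough worked example of the 16-, 32-
and 64-bit ceilings is sketched below in Java. These are theoretical maxima
only - real file systems impose their own limits well below them - but note
that a single feature-length video can already exceed the 32-bit figure.

    // Theoretical ceilings only: with an n-bit address you can address 2^n bytes.
    public class AddressSpace {
        public static void main(String[] args) {
            System.out.println("16-bit: " + (1L << 16) + " bytes (64 KB)");
            System.out.println("32-bit: " + (1L << 32) + " bytes (4 GB)");
            // 1L << 64 would overflow a Java long, so state 2^64 directly:
            System.out.println("64-bit: 2^64 bytes (16 exabytes)");
        }
    }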
Back
We are all used to so-called multimedia documents that are predominantly
text-based, but which may include a few 2D graphics. We are also used to video
which contains an audio stream. But what about a document which contains
both text and video, where the text is to be read alongside the viewing of
the video? How do you synchronise the text display to the video?
Text-processing systems do not have a time attribute, yet time is an
intrinsic part of multimedia documents.
This leads to the general question
about how multimedia documents can be easily read. Currently, graphics and
video, and even audio bites are "added" to inherently textual content to
make the content more interesting. Give 'em a video clip to watch and
hopefully they'll read the text... but what will happen once the novelty
wears off?
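Returning to the synchronisation question above: one plausible (and purely
illustrative) way to give text a time attribute is to attach each passage to
an interval on the video's timeline, in the spirit of a subtitle cue. A small
Java sketch:

    import java.util.List;

    // Sketch: a passage of text that is valid over an interval of the video's
    // timeline, expressed in milliseconds from the start of the clip.
    public class TimedText {
        final long startMs, endMs;
        final String text;

        TimedText(long startMs, long endMs, String text) {
            this.startMs = startMs;
            this.endMs = endMs;
            this.text = text;
        }

        // Return whichever passage should be on screen at this point in the video.
        static String textAt(List<TimedText> passages, long videoPositionMs) {
            for (TimedText t : passages) {
                if (videoPositionMs >= t.startMs && videoPositionMs < t.endMs) {
                    return t.text;
                }
            }
            return "";
        }
    }

A player would then poll its current playback position and redraw whatever
textAt returns, so the prose keeps pace with the video.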
Back
Imagine a scenario where a multimedia document contains a user-rotatable 3D
object which has a supporting textual description. As the object is
rotated, the text changes to describe what is currently visible... Name one
programming language which supports this level of synchronization.
Typically, most synchronization issues are related to lip-synching an
audio track with a video track, or synchronizing sub-titles with a video
track. Generally, this is achieved through support from the application
(e.g., Adobe Premiere, Macromedia Director, QuickTime) or the compression
scheme (e.g., MPEG-2), and these approaches almost exclusively revolve around
video and animation sequences.
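No mainstream programming language supports the rotatable-object example above
out of the box; in practice it has to be hand-coded as an event handler,
roughly along the lines of this sketch (the angles and descriptions are
invented, and the 3D toolkit that would fire the rotation events is assumed):

    import java.util.NavigableMap;
    import java.util.TreeMap;

    // Sketch: map the object's current rotation (degrees about one axis) to a
    // supporting textual description, updated whenever a rotation event fires.
    public class RotationNarrator {
        private final NavigableMap<Integer, String> descriptions = new TreeMap<>();

        public RotationNarrator() {
            descriptions.put(0,   "Front elevation: the main entrance.");
            descriptions.put(90,  "Side elevation: the loading bay.");
            descriptions.put(180, "Rear elevation: the fire escape.");
            descriptions.put(270, "Side elevation: the car park.");
        }

        // Called by whatever 3D toolkit reports the new orientation.
        public String onRotation(int degrees) {
            int normalised = ((degrees % 360) + 360) % 360;
            return descriptions.get(descriptions.floorKey(normalised));
        }
    }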
Back
An example of where this is an issue has already been discussed in the
section on multimedia documents. This essentially revolves around
event-based programming. Java and other
object-oriented programming languages do handle events, but few, if
any, explicitly handle multimedia data types, although some applications
do (e.g., Macromedia Director and Authorware Professional). What we
want to be able to do is, for example, trigger events based on the
appearance or disappearance of certain characters in a film.
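A hedged sketch of how such an event might look to application code, assuming
a hypothetical recognition layer that can report when a character enters or
leaves the frame (no existing library provides this interface):

    // Sketch: an observer interface for character appearance events in a film.
    // The recogniser that would fire these events is assumed, not real.
    interface CharacterAppearanceListener {
        void characterAppeared(String characterName, long frameNumber);
        void characterDisappeared(String characterName, long frameNumber);
    }

    class TranscriptHighlighter implements CharacterAppearanceListener {
        @Override
        public void characterAppeared(String characterName, long frameNumber) {
            System.out.println(characterName + " enters at frame " + frameNumber
                    + ": highlight their lines in the transcript.");
        }

        @Override
        public void characterDisappeared(String characterName, long frameNumber) {
            System.out.println(characterName + " exits at frame " + frameNumber);
        }
    }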
Back
In HyperLand, the agent quickly picked up on Adams' declared and perceived
interests and preferences, and sought to present information which was relevant to these
interests. Any system which displays this kind of behaviour is known as an
adaptive system.
Adaptive systems are not restricted to multimedia or hypermedia systems;
the term describes any system which is capable of learning about,
distinguishing between, and adapting to different users.
As an example of what I mean by adaptivity, think about your favourite
Web browser for a moment. As it stands, it is adaptable
(which should not be confused with adaptive), because although many
different users can use the same implementation of the interface, it
"remembers" each individual's preferences in terms of bookmarks, which page
we want the Web browser to display when we first launch it, the colour used to
display visited and unvisited links, etc. Now, imagine the Web browser used our
bookmarks and other frequently visited sites to determine what sort of
information we are interested in, and it went away and searched for
related information. Imagine the Web browser was able to spot patterns in our
viewing habits... every Monday morning, as soon as I get into the office, I
read reports on Liverpool and Juventus's week-end matches. Wouldn't it be
nice if the Web browser knew this and downloaded this and other related
information so that it's there, waiting on my desktop, ready to read. And
so on... If the behaviour of the system goes beyond merely doing what the user
told it to do, in that it learns to do what the user would like it to do,
then the system is considered to be adaptive.
In order to achieve adaptivity, two fundamentals are necessary: i) the system
needs to learn about user preferences and interests as unobtrusively as
possible, and ii) the environment within which the adaptive agent works
must be adequately described in order for the agent to reason about what is
and isn't relevant.
User preferences and interests are represented and manipulated by a user
model; the information base needs to be described using a formalism which is
compatible with the user model; and, typically, an information retrieval
system compares the user model with the information representation to
determine the degree of relevance of documents in the information base to
the user. According to Peter Brusilovsky, an
adaptive hypermedia system is one "which reflects some features of the user
and/or characteristics of his system usage in a user model, and utilizes
this model to adapt various behavioral aspects of the system to the
user." (Brusilovsky, P., et. al. (1998). Adaptive Hypertext and
Hypermedia. Kluwer Academic Publishers.)
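As a very small illustration of the comparison just described, both the user
model and a document can be reduced to weighted terms and compared with a
cosine measure. A real adaptive hypermedia system is far richer than this, so
treat the Java sketch below as an assumption-laden toy:

    import java.util.HashMap;
    import java.util.Map;

    // Sketch: a user model as a bag of weighted interest terms, compared against
    // a document represented the same way.
    public class UserModel {
        private final Map<String, Double> interests = new HashMap<>();

        // Strengthen an interest each time the user shows it (e.g., revisits a page).
        public void reinforce(String term, double weight) {
            interests.merge(term, weight, Double::sum);
        }

        // Cosine similarity between the user's interests and a document's terms.
        public double relevance(Map<String, Double> documentTerms) {
            double dot = 0, normUser = 0, normDoc = 0;
            for (Map.Entry<String, Double> e : interests.entrySet()) {
                normUser += e.getValue() * e.getValue();
                dot += e.getValue() * documentTerms.getOrDefault(e.getKey(), 0.0);
            }
            for (double w : documentTerms.values()) {
                normDoc += w * w;
            }
            if (normUser == 0 || normDoc == 0) return 0;
            return dot / (Math.sqrt(normUser) * Math.sqrt(normDoc));
        }
    }

The relevance score would then drive what the system chooses to fetch,
recommend, or present first.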
Back
We're now all used to pointing, clicking and dragging with a pointing
device, such as a mouse. However, earlier, we
discussed using multi-modal user interfaces through which we could use a
mixture of spoken or written language and gesture to communicate with the
system. Does it have to stop here? Imagine playing virtual squash with an
opponent on the other side of the world. We could wear pads through which
our current position on the virtual squash court can be computed and
relayed back to us and to our opponent through light-weight goggles.
Through a glove and a stump the size and shape of a squash racket handle,
the computer system knows where the virtual racket is, and sends electronic
signals which are interpreted by our brain as having made impact with the
virtual squash ball, etc. Well, that, perhaps, is for the distant future,
but certainly, especially if we are interacting with a virtual world, we
need auditory and visual i/o devices, as well as haptic devices which can
relay input from and output to our hands and feet. Can you imagine playing
Doom or Quake in a virtual reality environment, running and turning on a
treadmill, holding an imaginary weapon in a gloved hand, and simulating
squeezing triggers, changing weapons, and so on? Would you still be able to
play it for hours at a time, or would you be exhausted within 30 minutes?
Back
Networked multimedia takes many shapes and forms, from downloading
high-volume multimedia traffic off the Internet, to video conferencing,
interactive games, and internet telephony, to video-on-demand, interactive TV and shared virtual reality.
Essentially, quite apart from the need for distributed multimedia operating
systems, the communications networks connecting users to each other and to
centralised and distributed multimedia file systems can affect the
listening/viewing/participation quality experienced by users. The largest WAN
in the world, the Internet, is not, in its present form, conducive to
interactive distributed multimedia environments. Inherent characteristics of
TCP/IP, bandwidth usage, and reliability all contribute to the disadvantages
of using the Internet as the primary vehicle for distributed multimedia.
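To make the TCP/IP point concrete: TCP will stall an entire stream to
retransmit a lost packet, which is exactly what live video or telephony cannot
tolerate, so continuous media is more usually pushed over UDP (as RTP is) and
late or lost packets are simply skipped. A minimal sketch of that "send and
forget" style, with a placeholder address and stand-in frame data:

    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetAddress;

    // Sketch: pushing media frames over UDP. There is no retransmission and no
    // ordering guarantee, so the receiver must tolerate loss and jitter.
    public class MediaSender {
        public static void main(String[] args) throws Exception {
            DatagramSocket socket = new DatagramSocket();
            InetAddress receiver = InetAddress.getByName("192.0.2.1"); // placeholder address
            for (int frame = 0; frame < 25; frame++) {
                byte[] payload = ("frame " + frame).getBytes(); // stand-in for real media data
                socket.send(new DatagramPacket(payload, payload.length, receiver, 5004));
                Thread.sleep(40); // roughly 25 frames per second
            }
            socket.close();
        }
    }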
Back
Hypermedia is a not-so-new paradigm for navigating through multimedia
documents. (Hypertext is the same paradigm but applied to text-only
documents - however, hypertext and hypermedia are pretty much
interchangeable these days, and I will use them interchangeably). The
principles of hypermedia are not restricted to computer-based information,
either - networked computers just make the navigation seamless.
Anybody who reads a newspaper, an encyclopedia, or a dictionary knows that
they are not designed to be read linearly. Hypermedia is the same -
through networked computers (such as the WWW), documents can be linked to
other relevant material simply by embedding a link anchor in the source
document which refers to the destination document. Is it that simple,
though? How can link anchors be embedded into video or animation? WWW image
maps provide a way of linking regions within a 2-dimensional raster image to
other documents, but is this a solution or just a patch?
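One way to generalise the image-map idea to video - purely a sketch, not an
existing standard - is to make the anchor a screen region that is only "live"
during an interval of the clip:

    import java.awt.Rectangle;

    // Sketch: a link anchor inside a video, active only over a region of the
    // frame and an interval of the timeline. The destination URL is illustrative.
    public class VideoAnchor {
        final Rectangle region;
        final long startMs, endMs;
        final String destination;

        VideoAnchor(Rectangle region, long startMs, long endMs, String destination) {
            this.region = region;
            this.startMs = startMs;
            this.endMs = endMs;
            this.destination = destination;
        }

        // Did the user click inside the anchor while it was active?
        boolean hit(int x, int y, long videoPositionMs) {
            return videoPositionMs >= startMs && videoPositionMs < endMs
                    && region.contains(x, y);
        }
    }

Even so, the authoring problem remains: somebody (or something) still has to
decide which regions of which frames should be anchors.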
Back
What's content-based search? Back in the olden days, when a megabyte of
hard disk space cost more than your house, people began to realise that
a file name wasn't really enough of a description of what the file
contained. Take this web page - it's called mm2.html. I know what's in it
because I'm the author of it. However, will I still remember what's in it
in a month's time? Unlikely. And what about those poor souls out there to
whom the information in the document might be of interest, but who will
never know that the file with the URL
http://www.cs.um.edu.mt/~cstaff/courses/lectures/csa3020/mm2.html even
exists, let alone contains an important nugget of information?
The minute technology enabled the permanent storage of data, people wanted
to be able to locate files automatically by their content, rather
than by their name. And so the field of Information Indexing, Search and
Retrieval (IR for short) was born. Because of the nature of the documents
stored, IR has for a long time focused on text-based document retrieval. And
now we're doing horrible things like storing pictures, audio bites, video
clips and animation sequences... and, of course, we want to be able to locate
them by their content.
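For text, the classic IR building block is an inverted index: map each term to
the set of documents containing it, then match query terms against it. A toy
Java sketch (the document names and contents are made up):

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    // Sketch: a toy inverted index mapping terms to the documents that contain them.
    public class TinyIndex {
        private final Map<String, Set<String>> index = new HashMap<>();

        public void add(String documentName, String content) {
            for (String term : content.toLowerCase().split("\\W+")) {
                index.computeIfAbsent(term, t -> new HashSet<>()).add(documentName);
            }
        }

        public Set<String> search(String term) {
            return index.getOrDefault(term.toLowerCase(), Collections.emptySet());
        }

        public static void main(String[] args) {
            TinyIndex idx = new TinyIndex();
            idx.add("mm2.html", "multimedia hypermedia content based search");
            idx.add("mm1.html", "an introduction to multimedia systems");
            System.out.println(idx.search("hypermedia")); // prints [mm2.html]
        }
    }

The hard part for non-text media is deciding what plays the role of a "term" -
colour distributions, audio fingerprints, detected objects, and so on.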
Back
Tom Baker, in HyperLand, called himself an agent. And now we have the field
of intelligent agents... autonomous chunks of software which are designed
to perform a specific task (or which may learn to perform tasks) on
behalf of human users. What does this entail? Well, pretty much all of the
above! A software agent needs to learn about the user who owns it, needs
to be able to communicate with other specialised agents across networks,
needs to locate and retrieve multimedia information for its owner or perform
the service required of it by its owner, perhaps communicate with its owner
through novel i/o devices (why not cheat at virtual squash by using
a performance-enhancing agent?), and so on...
Related Link
VirtualFriend
Back
Back to the index for this course.
In case of any difficulties or for further information e-mail
[email protected]
Date last amended: 2nd September 2002