Text 2.0 - A Brief Introduction

Text 2.0 is our vision of how reading can evolve on digital devices.

Many eBook readers have been presented lately, and one can assume that in the coming years ever more text will be read on displays rather than on printed paper. At the same time, new input devices have evolved and are getting smaller with every generation. Consider, for example, how far digital cameras have advanced over the last two decades.

With Text 2.0 we focus on one input paradigm that we think has the potential to dramatically influence the way we interact with information, both directly and through the indirect information it delivers: eye tracking.

An eye tracker is, simply speaking, any device that is capable of measuring where someone is looking. Modern eye trackers usually achieve this by illuminating the eye with infrared light whose reflection is in turn recorded by a camera. This enables the device to determine where the user's line of gaze hits the screen and to provide this information to applications.

Using Eye Tracking Data

While eye tracking data can be used in many ways, one very appealing property of real-time data recorded while reading is its implicitness. The user does not have to focus on any sort of explicit control but can simply do what he or she really wants to do, which is to read.

Text 2.0 uses the tracker's information and links it with the textual information presented on the screen to analyze and evaluate the current situation of the user:

  • Is the user paying attention?
  • If yes, to what?
  • Is there an indication of comprehension difficulties or is the reader just skimming the text?
  • Which passages have been read, which haven't been, and which have been read over and over again?

Eye tracking can help to answer these questions, and many more, so that text can become aware of them and respond to them.
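
As an illustration of how gaze data might be linked to the text on screen, the following minimal sketch assigns fixation points to word rectangles and labels each word as unread, read, or reread. It is not the Text 2.0 implementation: the WordBox layout data, the function names, and the reread threshold are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class WordBox:
    """Screen rectangle of one rendered word (hypothetical layout data)."""
    word_id: int
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def count_fixations_per_word(boxes, fixations):
    """Count how many fixation points landed inside each word box."""
    hits = Counter()
    for x, y in fixations:
        for box in boxes:
            if box.contains(x, y):
                hits[box.word_id] += 1
                break  # a fixation is assigned to at most one word
    return hits


def classify_words(boxes, fixations, reread_threshold=2):
    """Label each word 'unread', 'read', or 'reread' (threshold is assumed)."""
    hits = count_fixations_per_word(boxes, fixations)
    labels = {}
    for box in boxes:
        n = hits[box.word_id]
        if n == 0:
            labels[box.word_id] = "unread"
        elif n < reread_threshold:
            labels[box.word_id] = "read"
        else:
            labels[box.word_id] = "reread"
    return labels


# Three words on one line; the last fixation falls outside all of them.
boxes = [
    WordBox(0, 0, 0, 50, 20),
    WordBox(1, 60, 0, 110, 20),
    WordBox(2, 120, 0, 170, 20),
]
fixations = [(25, 10), (80, 12), (85, 9), (300, 300)]
labels = classify_words(boxes, fixations)
```

A real system would also have to cope with tracker inaccuracy, for example by enlarging the rectangles or snapping fixations to the nearest line of text.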

In an early structuring of potential applications we see two principal branches of Text 2.0: an explicit one called Augmented Text and an implicit one called Augmented Reading. Even though from an end user's point of view Text 2.0 is always influenced implicitly through reading, from the author's point of view the following two concepts differ.

Augmented Text

Augmented Text denotes text which has been annotated manually. These annotations contain directives on what should happen when a certain event occurs. Examples of such directives are "play a sound when the user reads here" or "show a theme as long as the user stays there". We believe that Augmented Text has its strengths in the entertainment domain of text.
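
The idea of a directive can be sketched as a small lookup from an annotated region and an event to an action. This is only an illustration under stated assumptions: the region identifiers, the event names "read" and "gaze", and the action strings are hypothetical and do not reflect the actual annotation format.

```python
from dataclasses import dataclass, field


@dataclass
class Directive:
    """One manual annotation: when `event` occurs on `region`, run `action`."""
    region: str  # hypothetical identifier of an annotated text span
    event: str   # hypothetical event name, here "read" or "gaze"
    action: str  # free-form description of the effect to trigger


@dataclass
class AugmentedText:
    directives: list = field(default_factory=list)

    def annotate(self, region, event, action):
        """Attach a directive to a region of the text."""
        self.directives.append(Directive(region, event, action))

    def dispatch(self, region, event):
        """Return the actions triggered by `event` on `region`."""
        return [d.action for d in self.directives
                if d.region == region and d.event == event]


text = AugmentedText()
text.annotate("storm-paragraph", "read", "play sound: thunder")
text.annotate("storm-paragraph", "gaze", "show theme: dark sky")

triggered = text.dispatch("storm-paragraph", "read")
```

Here the author, not an algorithm, decides what happens where, which is what distinguishes this branch from Augmented Reading.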

Augmented Reading

Augmented Reading, on the other hand, subsumes all techniques that influence the interaction with the text and are free of manual declarations. Automatically displayed translations, bookmarks or Wikipedia summaries are examples of Augmented Reading applications which can, in theory, be implemented independently of the specific material being read.

Interoperability of Augmented Text and Augmented Reading

From the end user's point of view, some Augmented Reading features can be emulated with Augmented Text. In certain situations they even have to be, namely where proper Augmented Reading algorithms do not yet exist to a satisfying degree. This includes, for example, translations and explanations on the fly. Both require substantial background knowledge and understanding to detect the correct translation and the most appropriate explanation. In this respect the progress of some Augmented Reading applications is closely related to the progress in the corresponding fields of artificial intelligence.

Our Vision

Text 2.0 is our vision of how text and reading can evolve on digital devices, and it is by no means a specific prediction. We are not stating that the particular applications presented on this site will be widespread at any time in the future. Rather, we see no substantial reason that should prevent them from becoming so, provided certain conditions are met.

Most of the features Text 2.0 offers are bound to eye tracking. While current eye tracking devices are rather bulky and expensive, the development and miniaturization of similar technologies has advanced remarkably in recent years. We think the same advancement can also occur in the development of eye tracking devices.

Augmented Text, as such, already exists today. The eyeBook has proven to work for a majority of its users; our newly created Text 2.0 browser plugin appears to work equally well. The question is therefore not whether this technology will emerge, but rather how widely it will be distributed and accepted among readers.

As Augmented Reading, on the other hand, comprises a larger set of methods and ideas, we are more readily tempted to sketch a possible scenario. If eye tracking devices become as common as web cameras are today, we see everyday benefits especially for knowledge workers. They will benefit directly from having meta information about the short and long term knowledge acquisition process. The benefits range from the direct provision of demanded information items and the automatic annotation of text for later retrieval to collaborative approaches within workgroups.

The features we have discussed so far were mostly user-centric. However, we see benefits on a document level on the way towards Text 2.0 as well. While the acquisition of reading information can help the individual in understanding the information provided, the aggregated meta information of many users can in turn be used to alter and improve the documents themselves. Beyond that, this aggregated meta information has the advantage of being objective and of being obtained for free alongside a multitude of reading processes.
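
To make the document-level idea more concrete, here is a minimal sketch that aggregates per-reader counts into a per-passage share of readers who read the passage more than once. The data layout (one dictionary per reader, mapping a passage id to the number of times it was read) and the function name are assumptions chosen for this example.

```python
from collections import defaultdict


def aggregate_rereads(sessions):
    """Return passage id -> share of its readers who read it more than once.

    `sessions` holds one dict per reader, mapping a passage id to the
    number of times that reader read the passage (hypothetical format).
    """
    readers = defaultdict(int)
    rereaders = defaultdict(int)
    for session in sessions:
        for passage, times_read in session.items():
            if times_read > 0:
                readers[passage] += 1
            if times_read > 1:
                rereaders[passage] += 1
    # Passages nobody read are left out to avoid dividing by zero.
    return {p: rereaders[p] / readers[p] for p in readers}


sessions = [
    {"intro": 1, "proof": 3},
    {"intro": 1, "proof": 2},
    {"intro": 2, "proof": 1},
    {"intro": 0, "proof": 4},
]
reread_share = aggregate_rereads(sessions)
```

A passage with a high share of rereaders might be a candidate for revision, although rereading alone does not prove that a passage is hard to understand.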

From this perspective we think that Augmented Reading as such will, in the long term, have a robust foundation driving its further development, mainly motivated by a set of factors which can be summarized as efficiency gains. Augmented Text, on the other hand, is based on known algorithms. It is thus more or less straightforward to implement and paves the way to research the usefulness of the more advanced Augmented Reading ideas.

As we have already stated, we do not know in which direction reading will evolve. We do, however, have the strong and justified belief that some of the ideas presented here are very promising and are worth being researched and investigated further. We invite you to take part in this journey to discover what the capabilities and limitations, the promises and threats of interactive text and reading are, a journey that has only just begun.

With these ideas in mind we would like to conclude this brief introduction with the statement that we are most sure about:

Reading will stay exciting.

Implementation Details

After the short introduction of the last section we want to shed some light on our implementation of Text 2.0 and describe how it works. The principal components involved are a screen, an eye tracker and an intelligent rendering device. While the details of how an eye tracker and a screen work can be found elsewhere, we will focus on the description of our core Text 2.0 functionality.

In technical terms there is no magic about the aforementioned rendering device. It is a web browser extended with special plugins and algorithms as well as a set of background services, glued together by an extensible backbone infrastructure.

Figure: A very simplified view of our architecture.

All of our latest Text 2.0 applications are, at heart, web pages. They are opened and displayed in a common web browser and contain HTML, CSS and JavaScript. Additionally they contain a few short statements instructing them to load our eye tracking plugin which, in turn, attaches itself to the connected eye tracker.

This plugin contains various sub-modules that handle different parts of the evaluation process. Preprocessing algorithms smooth the measured eye tracking data and determine so-called fixations. On top of that, further algorithms detect reading or skimming, the speed of progress, or indications of comprehension difficulties. A modular extension dock enables application developers to add features like new algorithms and new services. Services already included range from DBpedia (a database containing Wikipedia information) and speech input and output to linguistic and statistical processors.
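
The text does not say which algorithm the plugin uses to determine fixations. As a hedged illustration, the sketch below uses dispersion-threshold identification (I-DT), a common textbook method: a run of samples counts as a fixation if it lasts at least a minimum duration and its points stay within a small spatial spread. The sample format (time in milliseconds, x, y), the function names, and both threshold values are assumptions.

```python
def dispersion(points):
    """Spread of a window of (t_ms, x, y) samples: x range plus y range."""
    xs = [p[1] for p in points]
    ys = [p[2] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples, max_dispersion=30.0, min_duration=100.0):
    """Dispersion-threshold fixation detection (I-DT).

    `samples` is a time-sorted list of (t_ms, x, y) gaze samples.
    Returns a list of (start_ms, end_ms, center_x, center_y) tuples.
    Both thresholds are illustrative values, not tuned constants.
    """
    fixations = []
    i, n = 0, len(samples)
    while i < n:
        # Grow the initial window until it spans at least min_duration.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration:
            j += 1
        if j >= n:
            break  # not enough data left for a minimum-length window
        if dispersion(samples[i:j + 1]) <= max_dispersion:
            # Extend the window while the points stay close together.
            while j + 1 < n and dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            cx = sum(p[1] for p in window) / len(window)
            cy = sum(p[2] for p in window) / len(window)
            fixations.append((window[0][0], window[-1][0], cx, cy))
            i = j + 1
        else:
            i += 1  # window too spread out: drop its first sample
    return fixations


# Synthetic data sampled every 20 ms: the gaze rests near x=100,
# then jumps to x=300 (a saccade) and rests there.
samples = [(t * 20, 100 + (t % 2), 100) for t in range(8)]
samples += [(t * 20, 300 + (t % 2), 100) for t in range(8, 16)]
fixations = detect_fixations(samples)
```

Higher-level detectors, such as those for reading versus skimming, would typically operate on the resulting fixation sequence rather than on raw samples.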

The plugin also handles special HTML attributes related to eye tracking and reading. In the current version we have implemented, for example, the handlers onGaze and onRead. With the exception of some special cases, they can be used like ordinary HTML event handlers.


Related Work

This section is by no means complete. In case you have any additional information, we would be happy to hear from you (drop a comment, for example). Please also have a look at the item "Is all of this your work?" in the FAQ.

  • Back in 2001, Koike et al. described in "Interactive Textbook and Interactive Venn Diagram: Natural and Intuitive Interfaces on Augmented Desk System" the concept of an interactive textbook that allows students to interact with virtual experiments corresponding to a page, which can be projected on a desk beside the book.
  • Bahna and Jacob presented in 2005 a paper on "Augmented Reading: Presenting Additional Information Without Penalty", where peripheral vision is used to present extra information to the user while reading. Please note that their notion of augmented reading differs from ours.
  • In 2006 (first work dates back to 2000), Aulikki Hyrskykari published her PhD thesis. It describes an application called iDict, which analyzes reading behavior in order to provide translation assistance when reading foreign-language texts.
  • In 2008, Nobuhiro Inagawa and Kaori Fujinami published a paper called "Making Reading Experience Rich with Augmented Book Cover and Bookmark", which describes an offline equivalent of Augmented Text that works page-wise, but is (in contrast to the eyeReader) 'bed-readable'.