March 9, 2010 by Josh Wilson

Francois Ragnet Deconstructs the Document


The Internet is not just tearing apart macro-scale structures that drive the media and news industries; it’s also fragmenting the basic unit of commerce in the marketplace of ideas. As data become more easily transferred and transformed, documents begin to lose their status as coherent repositories for data sets. They also cease being static and immutable.

Francois Ragnet has been exploring these outcomes on his “Future of Documents” blog, and in his role as Managing Principal of Technology Innovation for Xerox Global Services. In an email dialogue with We Media, he lays out a virtual blueprint for 3.0 documents and dis-integrated datasets in the decentralized Internet era.

What are the properties of a 3.0 document? What tools does it give the user?

There are many properties to it, and I could linger for hours. Some of the key elements are: accessible, open, evergreen, and social.

Accessible: You will be able to collaboratively access, edit, scan or print a Document 3.0 from anywhere, and it will be rendered on whatever device you use (mobile phone, eReader, computer screen, or other), typically from the cloud.

Open: Based on well known standards, its content will be readily accessible — the document becomes a mash-up of information and documents fetched from other documents.

Evergreen: This mash-up will allow “fresh” updates to the content to be fetched at runtime, thus keeping it evergreen — or retiring itself.

Social: It will become more interactive, the direct result of collaboration between multiple users, while also drawing directly on social network content.

What can news media learn from the idea of “evergreen” documents?

News media can learn from documents and vice versa, as I feel there are lots of similarities between the two worlds. The newspaper is dying, as is pure paper document handling. Both need to evolve along a parallel path, and can learn from each other.

The document has/had to move from a static container of information to a live, evergreen collection of up-to-date data and atomic information elements. Similarly, news media is becoming much more reactive in feeding live data. However, both the document and the news article are still needed — a (normal) human cannot live off RSS or live database feeds. These feeds need to be distilled, validated, prioritized, and synthesized for the average human. Although some experiments are trying to test that, there is definitely the need for a “document” or a news article to aggregate all this information.

How will summary, excerpting and citation evolve functionally in the next five years? Will “subunits” of data within a document be portable and/or extensible?

Extensible: you’re right on it. XML (Extensible Markup Language) is a pillar for that evolution. Not only is it becoming an open standard for interchange, but the granularity at which information is accessed will become much finer. Today, only layout or coarse-grained information is accessible: paragraph, title, etc. However, a number of vertical schemas are appearing (e.g. XBRL for business reporting) to really capture the “semantics” of documents in a specific domain. For a news article, these might be entities such as locations, amounts, company names, person names, and dates, but also relationships between those entities, such as temporal sequences, actions, etc.
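To make the idea of fine-grained, semantically tagged content concrete, here is a minimal sketch in Python. It is an illustration only: the element names (`article`, `person`, `company`, `location`, `date`, `amount`, `relation`) and the sample text are hypothetical, and they do not follow XBRL or any published news schema. The sketch simply shows how entities and a relationship between them could be read out of such a document.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: tag and attribute names are assumptions made for this
# example, not part of XBRL or any real news-industry schema.
ARTICLE = """
<article>
  <title>Acme names new chief</title>
  <p><person id="p1">Jane Doe</person> will join
  <company id="c1">Acme Corp</company> in <location>Paris</location>
  on <date value="2010-04-01">April 1</date>, with a budget of
  <amount currency="USD" value="2000000">$2 million</amount>.</p>
  <relation type="joins" subject="p1" object="c1"/>
</article>
"""

ENTITY_TAGS = {"person", "company", "location", "date", "amount"}


def extract_entities(xml_text):
    """Return (tag, text) pairs for every entity element, in document order."""
    root = ET.fromstring(xml_text.strip())
    return [(el.tag, el.text) for el in root.iter() if el.tag in ENTITY_TAGS]


def extract_relations(xml_text):
    """Resolve each <relation> to (subject text, relation type, object text)."""
    root = ET.fromstring(xml_text.strip())
    # Map every element carrying an id attribute to its text content.
    by_id = {el.get("id"): el.text for el in root.iter() if el.get("id")}
    return [
        (by_id[rel.get("subject")], rel.get("type"), by_id[rel.get("object")])
        for rel in root.iter("relation")
    ]


if __name__ == "__main__":
    for tag, text in extract_entities(ARTICLE):
        print(f"{tag}: {text}")
    for subject, kind, obj in extract_relations(ARTICLE):
        print(f"{subject} --{kind}--> {obj}")
```

Once entities carry their own tags, a downstream reader or aggregator can pull out just the people, amounts, or dates without parsing the surrounding prose, which is the finer granularity described above.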

As for summaries, excerpts and citations: they will evolve, for sure. First, as we said earlier, they will be “tagged” through some special XML tags. But they can also be reconstructed on the fly. Excerpts and citations have made progress in the past years owing to linguistic and statistical processing, and in the next five years they will benefit from social tagging (a la PageRank) to refine these results. Summaries will remain an elusive target though, as a real summary entails some advanced techniques that only a human masters and that linguistics has not yet solved.

How will search change?

Search will become social, but also contextual and semantic. Other people’s searches, and not just incoming links, will be used for refining search ranks, and this will be further refined by “social affinities” collected through the various social networks. Search will not be keyword-driven but expressed in natural language, and will allow searching for facts or complex figures. “What did the president announce yesterday on Healthcare?” will be a natural query in a few years. The context will depend on location, date, and hour, as well as many other parameters that are processed in a huge database.

How will collaboration change?

Collaboration will become real-time, and use many different channels. See Google Wave, for example, for where the future of collaboration should be headed: real-time collaboration and feedback, multi-channel, multimedia, leveraging Web 2.0 technologies. Wave is actually quite extreme, but it is definitely the direction collaboration is headed.

At the other end of the spectrum though, paper will still play a quite prevalent role in some forms of collaboration. Typically, paper does retain affordances that lend themselves very well to annotation, review, drawing and sharing while in a single physical place.

Is the future of documents hosted or distributed? In other words, will the extensibility of documents be driven by large providers such as Google, or by widespread protocols and formats such as HTTP and PDF? Is this an either/or situation?

It will be mostly distributed, hosted on clouds or grid infrastructures, to allow for ubiquitous access to documents. The cloud will provide storage, but also processing power, and even a front end for any document management task: edit, modify, share, distribute, print, etc. This will be possible from any device (mobile phone, eReader, PC, TV …), and we can see the first instances in tools such as Google Docs, Zoho Docs or Office 2010 online.

Two aspects will, however, impede full cloud adoption. The first is security and privacy concerns. Where is my document going? Can I trust my cloud provider with those documents and what they contain? The second is reliability and long-term retention of these vital records. Will my cloud provider still be around 20 years from now? Will I still be able to read these documents?

If you think of it, paper was great in that role of long-term archival. Your piece of paper would be there, 20 to 50 years from now. But cloud, or even electronic for that matter, is not there yet. Some people might turn to formats like PDF/A, but let’s hope the physical archival medium will remain — unlike floppy disks or tape backups.

So it is definitely not an either/or situation, more of a gradual transition.

What is the role and future of the handwritten or printed document?

They will continue to be around for a while, although they will gradually be replaced by technologies as these catch up with some of the aspects of paper. The “Paperless Office” is not here; that’s why we are talking about the “Less Paper Office”. Paper still has some unique affordances and will be around for a while, but we’ll gradually use it more responsibly and sustainably.

For example, color and personalized documents are much more powerful — take the example of the transpromo (“transactional + promotional”) document — personalized ads, personalized images, based on customer knowledge. So casual, short-lived printing will decline, while specific printing areas will continue to grow (where paper has strong impact).

Handwriting will gradually disappear as interfaces become richer (multi-touch, speech, etc.). But it will take a while, and in the meantime we will probably see some “augmented” paper readers that support handwriting.

In the movie “Avatar,” a lab technician is seen using a large, transparent, wall-mounted computer screen. She waves her hand across the screen, brushes a cluster of text and images off it, and onto a portable unit — little more than a square of plexiglass. She then walks off with the portable unit carrying the transferred data. That little square of computerized plexiglass is a long way from the iPad. What’s the ultimate document reader/interface of the future? What will it be able to do?

How about … a piece of paper, flexible, foldable, lightweight, low to no power consumption, but that would have all the current and future affordances of a mobile device (wireless/motion sensing/full color and video/write capabilities with handwriting, shape or gesture recognition/multi-touch/tactile feedback/projection/augmented reality?).

There is a reason why paper has survived so many years as our “preferred” document format, but it does have many shortcomings. The best of both worlds would definitely enable the ultimate document reader.

Is data visualization a stepchild of object-oriented programming? Will documents become more fluid as data become more portable and flexibly represented? (For example: A newspaper article with a set of statistics that can be viewed as pie and bar graphs, maps, or flow charts, with each type of visualization able to be independently bookmarked, excerpted, cited in other documents, and maximized to access more detail.)

Partly addressed in my previous statement. The key here is customizing how data is represented to the user, based on their reading device and preferences.
