For developers of RIAs (rich Internet applications), Adobe's announcement that Google and Yahoo will soon be able to index text within Flash movies should come as welcome news. Until now, Flash files have been black boxes: search indexers could no more extract textual information from these binary files than they could from JPEGs or PNGs.
This first stab at Flash search still sounds somewhat primitive, but it raises an issue of importance to all Internet application developers. Given the growing number of data types and file formats being transmitted over HTTP and the increasing complexity of the applications that make use of them, is today's Web really still the Web? Or is it morphing into something else? How can we ensure that today's Web apps offer enough capabilities and flexibility to make Web 2.0 worthy of its name?
When Tim Berners-Lee first envisioned the Web in the 1980s, he saw it primarily as an information storage and retrieval system, based on the concept of hypertext. Web documents were fundamentally text, embellished with a markup language (HTML) that described how the text should be formatted and how each document was linked to others elsewhere on the Web.
Those embellishments were really only recommendations, however. How a page actually looked depended on the browser, client, or device on which you viewed it. Each document had a unique URL that described where it could be found, and not much else. Once you'd retrieved it, what you did with it was up to you and your software. If you wanted to, you could even view the raw source to see how its author encoded it.
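That openness is easy to demonstrate. Because an HTML page is plain text, a few lines of standard-library code can recover both its readable content and its outbound links; this is, in essence, all a search crawler needs. The sketch below uses Python's built-in `html.parser`; the sample page is invented for illustration:

```python
from html.parser import HTMLParser

class TextAndLinks(HTMLParser):
    """Collect visible text and hyperlink targets from an HTML document."""
    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Every <a href="..."> is a navigable, deep-linkable URL.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Everything outside the tags is human-readable, indexable text.
        if data.strip():
            self.text.append(data.strip())

# A hypothetical Web 1.0-style page: text, markup, and a link.
page = """<html><body>
<h1>Hypertext</h1>
<p>See the <a href="http://example.org/spec">specification</a>.</p>
</body></html>"""

parser = TextAndLinks()
parser.feed(page)
print(parser.text)   # the indexable text
print(parser.links)  # the link graph
```

The same property that makes the page viewable as raw source makes it machine-readable to anyone, with no cooperation from its author required.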
Today, that early vision is gradually being replaced by a much more complex model. The static HTML document is largely a thing of the past. In its place is a diverse range of technologies, each of which falls somewhere along a continuum that spans from the flexibility and openness of Web 1.0, all the way to a closed, binary-only paradigm that's more akin to traditional desktop software.
This trend reaches its extreme with content delivered for plug-ins such as Flash or Microsoft Silverlight. Applications written for these platforms don't resemble HTML in the slightest. They are binary blobs, little different from executable programs built for a desktop OS. You cannot view the underlying code in its raw form. Though Google and Yahoo may be able to extract text data from those binaries, deep linking to specific content within them remains impossible.
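The contrast with HTML is visible at the byte level. A SWF file announces itself with a three-byte signature (`FWS` for uncompressed movies, `CWS` for zlib-compressed ones) and stores its content in packed binary tags rather than readable markup. A rough sketch of how an indexer might triage the two kinds of payload (the classification scheme and sample bytes are invented for illustration):

```python
def classify_payload(data: bytes) -> str:
    """Very rough classification of an HTTP payload for indexing purposes."""
    if data[:3] in (b"FWS", b"CWS"):
        # Flash movie: a binary container. Any text is buried inside
        # packed tags, with no raw source to view and no URLs to follow.
        return "flash-binary"
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid text at all (e.g. an image or other binary format).
        return "opaque-binary"
    if "<html" in text.lower():
        # Markup: human-readable source, indexable text, linkable URLs.
        return "html"
    return "plain-text"

print(classify_payload(b"CWS\x09..."))                    # a Flash movie
print(classify_payload(b"<html><body>Hi</body></html>"))  # an HTML page
```

For the HTML branch, a crawler can go on to extract text and links; for the Flash branch, it needs format-specific support from the vendor, which is precisely what Adobe's announcement supplies.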
These distinctions invite important questions. Is it still the Web if it's not really hypertext? Is it still the Web if you can't navigate directly to specific content? Is it still the Web if the content can't be indexed and searched? Is it still the Web if you can only view the application on certain clients or devices? Is it still the Web if you can't view source?
Equally important, if today's RIAs no longer resemble what we would call the Web, then is shoehorning those applications into the Web's infrastructure really the right way to go? If application developers feel limited by the constraints of standards-compliant browser technologies, should they really be targeting their applications for the browser? Or is the problem that the client platforms simply aren't evolving fast enough to meet our needs? The debate on these issues is only just beginning.