In the collective imagination, computers are busy merging into one grand, expansive database filled with minutiae about those pesky, emotive humans so that the machines will be ready for Sarah Connor. Database administrators and programmers know that the reality is a good deal creakier than this image, even if they might invoke it to pry loose some funding when they spot a glint of malice in the eyes of the pointy-haired bosses.
Denodo is a Java-based collection of tools aimed at making it easier to start building Skynet with just a few clicks on some Web forms. Its tendrils reach out over the Net, suck in information in a wide range of formats, and then reformat it for an equally wide range of storage options. Denodo's literature calls this a "mashup" because the term is trendy, but the tool was born long before the word, and it does much more than people usually associate with the term. The system speaks basic XML and Web services, but it will also reach into e-mail boxes and try to parse and understand the text inside them.
The main market for the product is the enterprise developer who needs to synthesize something slick from a collection of legacy systems that probably sit under the control of data barons on opposite sides of interdepartmental feuds that may go back centuries. Denodo can pull apart HTML and suss out information in e-mails, all without waiting for a recalcitrant team to find the resources to help you.
Mix and mash
There are already a number of interesting products on the market for creating these mashups. (See InfoWorld's reviews of JackBe Presto and Nexaweb Enterprise Web 2.0 Suite.) Denodo emphasizes that it can do much more than just suck in Web services, mash them up, and spit them out as XML. It can also parse other kinds of data, take HTML apart, and even clean up the results a bit along the way, in what Denodo calls a "transformation and enrichment layer."
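Denodo does this through its own tools, and the review doesn't show how. Purely as an illustration of the general idea, here is a minimal Python sketch using the standard library's `html.parser`: it pulls the cells out of a hypothetical scraped HTML table, collapses stray whitespace as a simple stand-in for "clean-up," and keys each row by the header row. The page contents, class name, and function name are all hypothetical.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. newlines between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Simple clean-up: collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def to_records(html):
    """Parse an HTML table and key each data row by the header row."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical scraped page; note the messy whitespace and inline <b> tag.
PAGE = """<table>
<tr><th>Product</th><th>Price</th></tr>
<tr><td>  Widget\n </td><td>$4.50</td></tr>
<tr><td>Gadget <b>Pro</b></td><td> $12.00 </td></tr>
</table>"""

for record in to_records(PAGE):
    print(record)
```

A real scraper would also have to cope with malformed markup, nested tables, and pages that change layout, which is exactly the drudgery a product like this aims to hide.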
Denodo feels like a product that has earned the 4.1 in its version number. It is a fully functional collection of data-moving tools that has accumulated over several generations, and over the years someone had the bright idea to add a number of smart extensions to the platform. At some point, a programmer needed to mash up data from a tab-delimited file with calls to a JDBC server and then store the result in a text-searchable database. Connectors for all of these, and more, are already implemented and ready to load in the Virtual DataPort (VDP) layer, the tool for taming the various data sources, which lets you page through the resulting tables of data.
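Denodo sets up this kind of job through its Web forms, so the following is not its API. As a rough, hypothetical Python sketch of what such a mashup involves: parse a tab-delimited file, query a relational source (an in-memory SQLite database stands in for the JDBC server), join the two on a shared key, and build a toy inverted index as a stand-in for the text-searchable store. All table, column, and function names are made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical tab-delimited export, standing in for a flat-file source.
TSV_DATA = "cust_id\tregion\n1\tNorth\n2\tSouth\n3\tWest\n"


def load_tsv(text):
    """Parse tab-delimited text into a dict mapping cust_id -> region."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {int(row["cust_id"]): row["region"] for row in reader}


def load_relational(conn):
    """Query the relational source (SQLite here, standing in for JDBC)."""
    return list(conn.execute("SELECT cust_id, name FROM customers ORDER BY cust_id"))


def mash_up(regions, customers):
    """Inner-join the two sources on cust_id."""
    return [(cid, name, regions[cid]) for cid, name in customers if cid in regions]


def build_index(records):
    """Toy full-text index: map each lowercase word to the ids containing it."""
    index = {}
    for cid, name, region in records:
        for word in f"{name} {region}".lower().split():
            index.setdefault(word, set()).add(cid)
    return index


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Acme Corp"), (2, "Globex"), (4, "Initech")],
)

records = mash_up(load_tsv(TSV_DATA), load_relational(conn))
index = build_index(records)
print(records)                # [(1, 'Acme Corp', 'North'), (2, 'Globex', 'South')]
print(sorted(index["south"]))  # [2]
```

Customer 4 exists only in the database and customer 3 only in the file, so the inner join drops both; a production tool would let you choose other join semantics.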
At the same time, the product has grown a bit shaggy with all of these clever additions, leaving a nomenclature that can be confusing. Most data from traditional sources comes through the VDP, but many Web-based sources are scraped by ITPilot, a collection of tools for reaching out to Web sites and pulling their content into the service. The data ends up in the Aracne indexing and search engine. All these names for the tendrils get confusing, and a unified naming scheme might make sense if it could be introduced without annoying existing users who want to upgrade.
The ITPilot layer is elaborate and powerful, offering a pool of browsers that will suck in information on a schedule. The data comes in as HTML and leaves as entries in a local database. Much of the work is specified using a visual programming language filled with icons for tasks like looping through a set of