Yahoo's reported plan to spin off its Hadoop engineering unit into a separate company, should it happen, could spur more competition in the already-growing field of providing support for this data processing framework, observers said.
On Wednesday, the Wall Street Journal reported that Yahoo is mulling the idea of setting up its Hadoop development team as a new standalone company, one that would continue to develop the software and offer Hadoop-related consulting services.
"This is another good sign that the Hadoop space continues to grow," said Justin Borgman, co-founder and CEO of Hadapt, which provides tools for bridging Hadoop with data warehouses. "This will put the pressure on other Hadoop companies. The more competition the better."
Hadoop is "the biggest movement in enterprise software in years," Benchmark Capital partner Rob Bearden told the Wall Street Journal. Forrester Research senior analyst James Kobielus has estimated that the market for Hadoop products and services could grow to US$1 billion per year.
Hadoop can process data that resides across multiple servers, making it suitable for analyzing larger amounts of information than the typical data warehouse can handle.
Yahoo has been instrumental in developing the Hadoop data processing platform, and it devotes considerable engineering resources to the project. Yahoo uses the technology for the data processing and analysis needed to personalize content and advertisements for users. Other giant Internet services, such as eBay, Facebook and Twitter, also use the technology.
"Hadoop is the platform we run the company on. It's at the core of what we do," said Todd Papaioannou, Yahoo's vice president of cloud computing, in an interview with IDG last month. Yahoo did not immediately respond to a request for comment for this story.
Doug Cutting, the creator of Hadoop, joined Yahoo in 2006 to help the company develop the technology. He is now with Cloudera, which distributes a version of Hadoop. The Apache Software Foundation oversees Hadoop, which is an open-source project, though Yahoo contributes a sizeable portion of the new code for the project.
Designating its Hadoop team as its own entity makes sense for Yahoo, Hadapt's Borgman noted. Yahoo is primarily an Internet media company, and the market for Hadoop would be enterprises, a market Yahoo does not currently cater to.
"Maybe they feel that by spinning it out in a different company, it will have a better opportunity to pursue a different line of business," Borgman said.
Should the spin-off take place, the new entity would be one of a growing number of companies supporting Hadoop. Cloudera offers its own distribution of Hadoop. DataStax offers a version paired with its Cassandra data store. IBM has packaged Hadoop into its analytics offerings, and even used the framework as part of its Watson Jeopardy-playing supercomputer.
Kobielus predicted that the spin-off entity would not create its own distribution of Hadoop, given that Yahoo discontinued its own distribution earlier this year. Instead the new company would focus on consulting, professional services and systems integration for enterprises.
The possible spin-off could also trigger further investment in the technology, Kobielus said. Venture capitalists might fund more Hadoop-based startups, which traditional enterprise data warehouse (EDW) vendors might snap up for themselves.
"Forrester strongly expects that many of the startups already in the Hadoop market will be acquired by established EDW and analytics vendors who need to bootstrap their efforts in this new arena," Kobielus said.
Companies such as Oracle, Teradata, Microsoft and Hewlett-Packard could be taking a hard look at Hadoop-focused startups such as Cloudera, Karmasphere, Datameer, Hadapt and HStreaming, Kobielus wrote.
Despite all this interest, work still needs to be done to bring Hadoop to enterprise IT departments, which may not have the engineering expertise to implement the technology and run the software in-house, Kobielus noted. Hadoop has no reference framework providing the core specifications, APIs (application programming interfaces) or code base. It still lacks some basic features, such as real-time analysis. Also, there are not yet enough modeling tools that could translate the data integration logic in Hadoop components such as MapReduce and Pig into terms that business intelligence analysts can understand.
Despite these limitations, Hadoop is "extremely close" to being ready for the enterprise, Borgman said. "We're on the cusp of a really interesting new world of big data analytics."