One of the more prosaic parts of data warehousing is, well, getting the data into the warehouse.
This has long been handled by vendors specializing in extract, transform and load (ETL). Even there, innovation focused more on the problem of transforming the data. Loading it seemed a piece of cake by comparison.
That is, until business intelligence (BI) and analytics started becoming a round-the-clock affair. Also, today's biggest BI users -- banks, telecommunications providers, Web advertisers -- operate data warehouses larger than a petabyte and ingest huge volumes of data -- as much as 50TB per day, in the case of one Teradata Inc. customer.
BI and ETL vendors are responding. The past several months have seen a number of start-ups and lesser-known firms touting screaming-fast data-loading speeds, both in the lab and in the field.
Database start-up Greenplum Inc. said it has a customer routinely loading 2TB of data in half an hour, for an effective throughput of 4TB per hour.
Rival database start-up Aster Data Systems Inc. claimed that its nCluster technology lets customers load almost 4TB per hour -- 3.6TB, to be precise.
Data-integration vendor Syncsort Inc. said third-party-validated lab tests show its software can load 5.4TB of data into a Vertica Systems Inc. columnar data warehouse in under an hour.
Not to be outdone, semantic data integration start-up Expressor Software Corp. claimed that in-house tests show its data-processing engine scaling to nearly 11TB per hour.
"If they are really performing at this rate, it's quite significant and really impressive," said Jim Kobielus, an analyst at Forrester Research Inc., since "anything above a terabyte per hour is good."