French open-source data integration vendor Talend on Monday unveiled a data-profiling application that lets companies assess the quality of their data as a key part of data integration projects.
In the announcement, the company claimed that its Open Profiler application is the first open-source data profiler to be released to the marketplace.
A data profiler allows users to clean up data by getting rid of duplicate entries that differ only slightly, as well as by resolving incomplete or incorrect data, such as missing zip codes, incomplete addresses or wrong phone numbers, that can lead to multiple mailings to the same customer, the company said.
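As a rough illustration of the duplicate-entry problem described above, the following Python sketch groups customer records whose fields match after simple normalization. It is not Talend's method; the function names, the `name` and `address` fields, and the normalization rules are assumptions made for this example.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9 ]", " ", text)
    return " ".join(text.split())


def find_duplicates(records, fields=("name", "address")):
    """Return lists of record indexes that share the same normalized fields.

    Hypothetical helper: real tools usually add fuzzy matching on top of
    exact matching of normalized values.
    """
    groups = defaultdict(list)
    for index, record in enumerate(records):
        key = tuple(normalize(record.get(field) or "") for field in fields)
        groups[key].append(index)
    # Keep only the groups that contain more than one record.
    return [indexes for indexes in groups.values() if len(indexes) > 1]
```

Two entries such as "John Smith, 12 Main St." and "john  smith, 12 main st" would land in the same group, which is the kind of near-duplicate that leads to repeated mailings.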
Yves de Montcheuil, vice president of marketing for Talend, said the company built an open-source profiler to fill a void in the marketplace. For data-intensive businesses, an open-source profiler allows a company to more easily customize and modify the code to meet its own needs, compared to proprietary products. And because it is free to download and use, "You can start looking at this without having a budget and see how it works," he said.
Talend will release related data-cleansing products later in the US summer, he said. "Our customers doing this integration need that data quality" provided by a data-profiling application, he said. "If you do that integration without knowing what you have, it's like driving blind in the snow."
David Loshin, principal analyst at Knowledge Integrity, said Talend's new application is aimed at an increasingly popular niche in data integration work. Many larger companies, including Informatica, IBM, Business Objects and Oracle, have acquired data-profiling vendors or built their own profilers in the past few years, he said.
"It's about time that we're getting some activity in the open-source community with respect to the kinds of tools they're putting out," Loshin said. "It is a boon to the data community to have access to an open-source data profiler."
Data profiling is an empirical analysis of a data set that relies on frequency distribution analysis to find anomalies and validate data, and that looks for patterns, he said. "If you have a data set and don't know what's in there, you can profile it and learn more about what you have," he said, by highlighting anomalies, or errors, in the data. This helps data quality management because it enables the analyst to focus on what might be a deviation and then sort it out.
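The frequency distribution analysis Loshin describes can be sketched in a few lines of Python. This is a minimal illustration, not Talend Open Profiler's implementation; the function names, the pattern notation (`9` for a digit, `A` for a letter) and the `rare_threshold` parameter are assumptions chosen for the example.

```python
from collections import Counter


def to_pattern(value):
    """Map each digit to '9' and each letter to 'A'; keep other characters."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values, rare_threshold=0.2):
    """Profile one column: missing count, value and pattern frequencies.

    Patterns seen in less than `rare_threshold` of the non-empty values
    are reported as possible anomalies for an analyst to review.
    """
    non_empty = [v for v in values if v not in (None, "")]
    value_counts = Counter(non_empty)
    pattern_counts = Counter(to_pattern(v) for v in non_empty)
    total = len(non_empty)
    rare_patterns = sorted(
        pattern
        for pattern, count in pattern_counts.items()
        if total and count / total < rare_threshold
    )
    return {
        "rows": len(values),
        "missing": len(values) - total,
        "distinct": len(value_counts),
        "value_counts": value_counts,
        "pattern_counts": pattern_counts,
        "rare_patterns": rare_patterns,
    }
```

Run on a column of postal codes, a profile like this would show one dominant pattern (five digits, for instance) and surface the few values that deviate from it, such as a code with a missing digit or a letter typed in place of a number.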
Talend's Open Profiler "provides the initial piece of critical technology that anybody doing data integration needs," Loshin said. "Their long-term development plan looks to bring it up to snuff with best-in-class proprietary data profiling." Talend Open Profiler is available as a free download under the GPL at the company's Web site. Support is available under fee-based contracts.
Talend's core open-source data-integration application is Talend Open Studio. The vendor also offers Talend Integration Suite, a subscription-based service using Talend Open Studio, and Talend On Demand, a software-as-a-service (SaaS) open data integration product.