No matter how big data gets, it's not very useful if it's inaccurate, that is, if information is invalid, incomplete or just plain wrong. In an attempt to address this issue, open-source Big Data vendor Talend and Melissa Data, which develops data-quality solutions, have partnered to improve data validity. Could this be the next big trend in the Big Data world?

By now, it's easy enough for most organizations to establish Big Data infrastructure. There's no shortage of products, both open- and closed-source, that simplify the deployment of the hardware and software needed to store and manage large amounts of information. Talend's Open Studio for Big Data and Platform for Big Data are two examples.

But since ensuring the validity and accuracy of large amounts of data is often trickier than simply storing the information, data quality has become a significant area of focus for Talend lately. A couple of months ago, the company introduced data-profiling tools designed to detect data redundancy, incompleteness and inconsistency within Hadoop-based storage systems.
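To make those three problems concrete, here is a minimal Python sketch of what a profiling pass might count over a handful of contact records. It is not Talend's implementation and does not touch Hadoop; the `profile` function, the field names (`id`, `email`, `zip`) and the rule that one `id` should map to one email are all assumptions made up for illustration.

```python
from collections import Counter


def _blank(value):
    """True when a field is missing or contains only whitespace."""
    return value is None or not str(value).strip()


def profile(records, required_fields):
    """Count redundant, incomplete and inconsistent records (illustrative only)."""
    # Redundancy: identical records that appear more than once.
    seen = Counter(tuple(sorted(r.items())) for r in records)
    duplicates = sum(n - 1 for n in seen.values() if n > 1)

    # Incompleteness: any required field is missing or blank.
    incomplete = sum(
        1 for r in records if any(_blank(r.get(f)) for f in required_fields)
    )

    # Inconsistency (assumed rule): one id should map to exactly one email.
    emails_by_id = {}
    for r in records:
        if not _blank(r.get("id")) and not _blank(r.get("email")):
            emails_by_id.setdefault(r["id"], set()).add(r["email"].lower())
    inconsistent = sum(1 for emails in emails_by_id.values() if len(emails) > 1)

    return {
        "duplicates": duplicates,
        "incomplete": incomplete,
        "inconsistent": inconsistent,
    }


# Hypothetical sample data: one exact duplicate, one blank ZIP code,
# and one id that carries two different email addresses.
records = [
    {"id": 1, "email": "a@example.com", "zip": "92688"},
    {"id": 1, "email": "a@example.com", "zip": "92688"},
    {"id": 2, "email": "b@example.com", "zip": ""},
    {"id": 2, "email": "bee@example.com", "zip": "10001"},
]
report = profile(records, ["id", "email", "zip"])
```

A real profiling tool would run checks like these in a distributed fashion and against far richer rules, but the categories it reports on are the same three.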

Now, the agreement with Melissa Data will integrate code for validating mailing and email addresses and catching fraud attempts into Talend's enterprise data-quality platform. Melissa Data, which is headquartered in California, focuses on the seemingly narrow but nonetheless vital task of ensuring the accuracy of contact information.

According to Talend representatives, "this partnership couples the extensive address resolution service of Melissa Data with the comprehensive and easy-to-use profiling, matching and cleansing technology of Talend, providing address data quality that is essential to almost every enterprise business."

Big Data Trends

On its face, address validation is hardly the most interesting topic in the world of Big Data. Making sure that ZIP codes match street names, or catching typos in email addresses, is not the type of thing I'd want to spend my day thinking about. But that's the point of the partnership between Talend and Melissa Data, which will make it easier to automate and monitor address validation across large sets of information.

And the task does have implications for areas that are more interesting and relevant, like security. Big Data deployments and platforms arguably are evolving more rapidly than the tools and practices needed to keep information secure and resilient against misuse, including misuse by parties engaged in fraud. In that environment, address validation, tedious though it might be, is an important factor for organizations to consider.

More broadly, data quality is likely to emerge as a significant point of focus as Big Data technology matures. Now that many organizations have established Big Data infrastructures, making sure data is as accurate and usable as possible will become a major part of increasing the value of Big Data solutions. That, in turn, should provide further partnership opportunities for vendors throughout the channel.