Expert Opinion 06: 11 features to look for in a data quality management tool

It’s crazy to think, but 90 percent of the world’s data has been generated in the last two years alone. And, with 2.5 quintillion bytes of data currently being produced each day, it is unsurprising that data quality has become an issue that businesses can no longer ignore.

This has subsequently led to an unprecedented increase in demand for data quality tools with ever-evolving capabilities.

In the early days of data warehousing, acute data quality issues drove the need to standardise and cleanse inconsistent, inaccurate or missing data values. At the time, the state-of-the-art data quality management tools mostly performed address standardisation and correction coupled with data de-duplication.

Conventional wisdom has since transitioned from trying to correct data to proactively preventing errors from being introduced. Instead of focusing on data cleansing, today’s tools support practices for data governance and stewardship while operationalising data quality assessment.

David Loshin, recognised globally as an information management industry thought leader, highlights the 11 most important capabilities to look for when considering investing in a data quality management tool for your organisation:

1. Collaboration features: Team members may need to share data element definitions and data domain specifications, as well as agree on semantic meanings, so that the business logic pertinent to your own organisation and vertical(s) is the fundamental basis of your tool.

2. Data lineage management: Data quality management tools can enable the organisation to map the phases of the information flow and document the transformations applied to data instances along the flow. (EiB AppStudio is our tool which empowers you to create such data flows.)

3. Metadata management: The tool should provide a repository to document both structural and business metadata, including data element names, types, data domains, shared reference data and contextual data definitions that scope the semantics of data values and data element concepts. This metadata environment should allow multiple data professionals to submit their input about these definitions to facilitate harmonious utilisation across the enterprise.

4. Data profiling for assessment: Data profiling capabilities in data quality management tools can perform statistical analysis of data values to evaluate the frequency, distribution and completeness of data, as well as the data’s conformance to defined data quality rules. Data profiling tools can also be used to assess data quality.

5. Data profiling for monitoring: Data profiling tools enable users to define the validity rules against which data sets can be tested. Check that the data profiling facility is flexible enough to integrate proactive data quality validation that monitors compliance with defined expectations.

6. Implementation of data controls: Data controls are operational objects that can be integrated directly into the information production flow to monitor and report on data compliance or the violation of defined rules. Many tools can automate the generation of data quality controls for application integration.

7. Data quality dashboarding: Data stewards are tasked with monitoring data quality, taking action to determine the root cause of flagged issues and recommending remediation actions. A data quality dashboard can aggregate the status of continuously monitored data quality rules, as well as generate alerts to notify data stewards when they need to address an issue.

8. Connectivity to many data sources: As the breadth, variety and volume of data sources expand, the need to assess and validate data sets originating from both inside and outside the organisation has become acute. Look for data quality management tools that can connect to a wide selection of data source types, in terms of both system (e.g., RDBMS vs. NoSQL database) and platform (e.g., on premises vs. cloud).

9. Data lineage mapping: Data lineage mapping scans the application environment to derive a mapping of the data production flow and to document the transformation.

10. Identity resolution: Identity resolution is the process of linking various records and is the main engine for record de-duplication, which can enable some aspects of data cleansing.

11. Parsing, standardisation and cleansing: Last, but not least, there is always going to be a need for the traditional aspects of data quality, such as address standardisation and the correction of known invalid values.
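
To make the profiling capabilities in points 4 and 5 a little more concrete, here is a minimal Python sketch of what such a tool does under the hood: it measures completeness and value frequency for a field, then tests each record against named validity rules. The sample records, field names, rule names and the deliberately simple email check are all hypothetical and chosen for illustration; a real data quality management tool would offer far richer rules and reporting.

```python
from collections import Counter

# Hypothetical sample records; None marks a missing value.
records = [
    {"customer_id": 1, "country": "UK", "email": "ann@example.com"},
    {"customer_id": 2, "country": "UK", "email": None},
    {"customer_id": 3, "country": "DE", "email": "bob@example"},
    {"customer_id": 4, "country": None, "email": "cy@example.com"},
]


def profile(rows, field):
    """Return completeness, distinct count and value frequencies for one field."""
    values = [row.get(field) for row in rows]
    present = [value for value in values if value is not None]
    return {
        "completeness": len(present) / len(values) if values else 0.0,
        "distinct": len(set(present)),
        "frequencies": Counter(present),
    }


def is_valid_email(value):
    """Deliberately simple shape check (an assumption, not a full validator)."""
    if value is None:
        return False
    local, separator, domain = value.partition("@")
    return bool(local) and separator == "@" and "." in domain


# Named validity rules: each takes a record and returns True when it passes.
RULES = {
    "country_is_present": lambda row: row["country"] is not None,
    "email_has_valid_shape": lambda row: is_valid_email(row["email"]),
}


def validate(rows, rules):
    """Map each rule name to the customer_ids of the records that violate it."""
    return {
        name: [row["customer_id"] for row in rows if not check(row)]
        for name, check in rules.items()
    }


country_profile = profile(records, "country")
violations = validate(records, RULES)
print(country_profile)
print(violations)
```

Running the same rules on a schedule and counting the violations per rule is, in essence, what feeds the monitoring and dashboarding capabilities described in points 5 and 7.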

Don’t limit the evaluation to only these features, though. As data quality management becomes more collaborative, it will be important to document processes and allow for a collaborative review to ensure the auditability of the data quality procedures.

Good data quality management tools not only support the operational aspects of data stewardship; they can also increase awareness of the value of high-quality information and motivate good data quality practices across the organisation.

Author: David Loshin
Original Source: TechTarget