
Today’s LinkedIn Discussion Thread: Enterprise Data Quality April 28, 2009

Posted by Peter Benza in Data Analysis, Data Elements, Data Governance, Data Optimization, Data Processes, Data Profiling, Data Quality, Data Sources, Data Standardization, Data Synchronization, Data Tools, Data Verification.

Here is the question I just added to my LinkedIn discussion group, Enterprise Data Quality.

QUESTION: What master data or existing “traditional” data management processes (or differentiators) have you identified to be useful across the enterprise regarding data quality?

MY INSIGHTS: Recently, I was able to demonstrate (and quantify) the impact of using an NCOA (National Change of Address) updated address on match/merge accuracy when two or more customer “names and addresses” from three disparate source systems were present. This test approach warrants consideration, especially when the volume of customer records at big companies today runs to “hundreds” of millions. It is ideal to apply this test to the entire file, not just a sample set. But we all know that today it’s about money, time, value, resources, etc.

For testing purposes, I advised that all individual customer address attributes be replaced (where information was available) with NCOA updated addresses and then loaded and processed through the “customer hub” technology. If you are not testing a piece of technology, then constructing your own match key or visually checking sample sets of customer records before and after is an alternative. Either way, inventory the matches and non-matches from the two different runs: once with addresses as-is and once with addresses that leverage the NCOA information.
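
The two-run comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the test that was actually run: the match key (normalized name plus normalized address), the record layout, and the sample records are all hypothetical, and a real customer hub would use far richer matching logic.

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Upper-case and collapse whitespace (a deliberately simple normalization)."""
    return " ".join(text.upper().split())


def match_key(name: str, address: str) -> str:
    """Hypothetical deterministic match key: normalized name plus address."""
    return f"{normalize(name)}|{normalize(address)}"


def count_matched_records(records, ncoa=None):
    """Count records that share a match key with at least one other record.

    records: iterable of (record_id, name, address) tuples.
    ncoa: optional dict mapping record_id to an NCOA updated address.
    """
    ncoa = ncoa or {}
    groups = defaultdict(list)
    for record_id, name, address in records:
        address = ncoa.get(record_id, address)  # replace only where available
        groups[match_key(name, address)].append(record_id)
    return sum(len(ids) for ids in groups.values() if len(ids) > 1)


# Illustrative records from three source systems (made-up data).
records = [
    ("A1", "Pat Smith", "12 Oak St"),    # source A, address before the move
    ("B7", "Pat Smith", "98 Elm Ave"),   # source B, address after the move
    ("C3", "Lee Jones", "5 Pine Rd"),    # source C, unrelated customer
]
ncoa_updates = {"A1": "98 Elm Ave"}      # hypothetical NCOA move record

as_is = count_matched_records(records)
with_ncoa = count_matched_records(records, ncoa_updates)
print(as_is, with_ncoa)
```

Running both passes over the same records keeps everything constant except the address, so any change in the matched count can be attributed to the NCOA update.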

My goal was to establish a business process that focused on “pre-processing customer records” using a reliable third-party source (in this case NCOA) instead of becoming completely dependent on a current or future piece of technology that may offer the same results, especially when the methodology (the matching algorithms) is probabilistic. My approach reduces your dependency as well, and lets you focus on the “lift” the technology may offer if you are comparing two or more products.

Inside a deterministic matching utility (or off-the-shelf solution), by contrast, adding extra space or columns of data to the end of your input file to store the NCOA addresses will allow you to accomplish the same results. But for test purposes, the easier way may be to replace addresses where an NCOA record is available.
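
As a rough sketch of the extra-column option, the Python below appends an NCOA column to each input row and lets the matching step prefer it when it is populated. The column names (`customer_id`, `address`, `ncoa_address`) and the sample rows are assumptions made for illustration; a real utility will have its own input layout.

```python
import csv
import io

NCOA_COLUMN = "ncoa_address"  # hypothetical name for the appended column


def append_ncoa_column(rows, ncoa_by_id, id_field="customer_id"):
    """Return copies of the rows with an NCOA column added at the end.

    The original address is left untouched; the new column is empty
    when no NCOA record exists for the customer.
    """
    extended_rows = []
    for row in rows:
        extended = dict(row)
        extended[NCOA_COLUMN] = ncoa_by_id.get(row[id_field], "")
        extended_rows.append(extended)
    return extended_rows


def matching_address(row):
    """Prefer the NCOA address for matching, falling back to the original."""
    return row[NCOA_COLUMN] or row["address"]


# Made-up input file with two customers.
input_file = io.StringIO(
    "customer_id,name,address\n"
    "A1,Pat Smith,12 Oak St\n"
    "C3,Lee Jones,5 Pine Rd\n"
)
rows = list(csv.DictReader(input_file))
extended_rows = append_ncoa_column(rows, {"A1": "98 Elm Ave"})

for row in extended_rows:
    print(row["customer_id"], matching_address(row))
```

Keeping the original address alongside the NCOA one preserves the as-is value for audit, which replacing the address in place does not.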

Remember, based on the volume of records your client may be dealing with, a pre-process (business process) may be ideal, rather than loading all the customer names and addresses into the third-party customer hub technology and processing them there. Caution: this all depends on how the business is required (e.g., for compliance) to store information from cradle to grave. But the rule of thumb for the MDM customer hub is to store the “best/master” record (the single customer view), with the exception of users with extended search requirements. The data warehouse (vs. MDM solutions) now becomes the next challenge: what to keep where, and how much. But that is another discussion.

The improvement realized from using the updated customer address was substantial: over 10% on average across all the sources factored into the analysis. This means several tens of millions of customer records will match/merge more effectively (and efficiently), followed by the incremental lift from whatever the “customer hub” technology enables using its proprietary tools and techniques. This becomes the real differentiator!
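
For readers who want to reproduce this kind of figure, here is one way the per-source lift and its average could be computed. The counts below are placeholders invented for the example, not the results from the analysis, and the definition of lift (difference in matched records as a share of all records) is an assumption; other definitions are equally defensible.

```python
def lift_in_points(as_is_matched: int, ncoa_matched: int, total: int) -> float:
    """Percentage-point gain in match rate between the two runs."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * (ncoa_matched - as_is_matched) / total


# Placeholder counts per source: (matched as-is, matched with NCOA, total).
runs = {
    "source_a": (600, 720, 1000),
    "source_b": (450, 500, 500),
    "source_c": (300, 355, 500),
}

lifts = {name: lift_in_points(*counts) for name, counts in runs.items()}
average_lift = sum(lifts.values()) / len(lifts)

for name, lift in lifts.items():
    print(f"{name}: {lift:.1f} points")
print(f"average: {average_lift:.1f} points")
```

Reporting the lift per source before averaging shows whether one system is driving the result.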

Human Inference – an international data quality solution provider February 11, 2008

Posted by Peter Benza in Data Governance, Data Hygiene, Data Integrity, Data Management, Data Metrics, Data Processes, Data Profiling, Data Quality.

From the website:

Human Inference discovered that to reach the desired results, mathematical logic is not sufficient. The knowledge about the language and culture of a country was necessary as well. Human Inference proved to be right, since today the largest companies of the world are using our knowledge-based software to improve the quality of their data.


Innovative Systems, Inc – data quality assessment tool January 14, 2008

Posted by Peter Benza in Data Assessment, Data Errors, Data Governance, Data Hygiene, Data Metrics, Data Processes, Data Profiling, Data Tools.


The Innovative Data Quality Assessment provides a quick and economical evaluation of the quality of your customer information. It identifies areas where your information may be enhanced or improved, and quantifies the impact of the defined data quality issues in terms of costs, customer service, lost revenues, etc. It also benchmarks your organization’s data quality against industry standards, showing how your data quality compares to others in your industry.

Summit 2008 – San Francisco January 10, 2008

Posted by Peter Benza in Data Accuracy, Data Governance, Data Integrity, Data Metrics, Data Processes, Data Quality, Data Stewardship, Data Strategy, Data Templates, Data Verification, Data Warehouse.

If you have not attended a Summit then mark your calendars for:

CDI-MDM Summit Spring 2008

Please post and share your comments about this upcoming summit. If you have not attended and want to learn more, follow the reference above.

MDM Accelerator® by Zoomix January 9, 2008

Posted by Peter Benza in Data Accuracy, Data Aggregates, Data Analysis, Data Assessment, Data Consolidation, Data Dictionary, Data Formats, Data Governance, Data Hygiene, Data Integration, Data Management, Data Metrics, Data Processes, Data Profiling, Data Quality, Data References, Data Sources, Data Standardization, Data Stewardship, Data Synchronization, Data Templates, Data Tools.

To learn more about, or post your comments about, MDM Accelerator® by Zoomix.


Teradata – Master Data Management January 9, 2008

Posted by Peter Benza in Data Assessment, Data Consolidation, Data Dictionary, Data Governance, Data Hygiene, Data Integration, Data Management, Data Metrics, Data Processes, Data Profiling, Data Quality, Data Standardization, Data Stewardship, Data Strategy, Data Templates, Data Tools, Data Types.

To learn more about Teradata and their MDM solution offering:


What data variable(s) are useful to determine when a customer record should be classified as active or inactive? August 25, 2007

Posted by Peter Benza in Data Analysis, Data Management, Data Metrics, Data Mining, Data Processes, Data Research, Data Variables.

Be the first to author a comment on this subject.

SOA triggers innovation for your enterprise August 18, 2007

Posted by Peter Benza in Data Architecture, Data Processes, Data Tools.

Say goodbye to those big applications and hello to function-specific services that IT departments can incorporate into their operating architecture with greater ease. 

This flexibility, and the redesign by vendors to package their solutions into bite-sized services, also triggers innovation. For example, as business requirements change, both IT departments and vendors can be more responsive by offering a new service instead of waiting until the next product release. IT can also win by telling management they are leveraging their current software asset (investment) with vendor XYZ.

How can data profiling and a meta data repository be leveraged together? August 17, 2007

Posted by Peter Benza in Data Modeling, Data Processes, Data Profiling.

Many times the same term used by one department means something totally different to another department. This can prove to be a challenge as organizations continue to centralize all their customer data into one location. You may not be able to resolve all the different name variations used by each department, but assembling all the pieces and documenting them in one place is a must. It may become necessary to follow up with the appropriate decision makers to resolve any discrepancies.

So, it becomes mission-critical to compile “data about your data” and store it in a meta data repository, including key attributes about each data source and each variable: its range of values, record length, and so on. Ultimately, the data elements need to be analyzed and merged into a single classification system based on all relevant data sources from across the enterprise.
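
A minimal sketch of how profiling can feed such a repository is shown below: profile one column and keep the result as a repository entry. The field names in the returned dictionary and the sample values are assumptions for illustration; an actual repository would define its own schema.

```python
def profile_column(source, column, values):
    """Build a simple meta data entry for one column of one data source.

    Empty strings and None are treated as missing values.
    """
    present = [v for v in values if v not in (None, "")]
    return {
        "source": source,
        "column": column,
        "row_count": len(values),
        "missing_count": len(values) - len(present),
        "distinct_count": len(set(present)),
        "min_value": min(present) if present else None,
        "max_value": max(present) if present else None,
        "max_length": max((len(str(v)) for v in present), default=0),
    }


# Made-up sample: a state code column from a hypothetical billing system.
entry = profile_column("billing", "state", ["OH", "NY", "", "OH", "CA"])
print(entry)
```

Collecting one entry per column per source makes it easier to spot where two departments use the same name for fields with different value ranges.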

This (meta data) information will also become a valuable guide for validating other data-specific activities, such as customer data modeling, match/merge logic, and even QC during the integration/execution phase of storing the resulting customer information in one location.

Meta Data: http://en.wikipedia.org/wiki/Metadata

What are some examples of how a data system can become unstable? August 17, 2007

Posted by Peter Benza in Data Errors, Data Governance, Data Management, Data Processes, Data Synchronization.

Be the first to author an article on this subject.

How do you prevent data errors in your database today? August 16, 2007

Posted by Peter Benza in Data Errors, Data Hygiene, Data Processes, Data Sources, Data Templates, Data Verification.

Data errors can be reduced but not totally eliminated, so be realistic. First consideration must be given to the point of entry and, depending on the size of your organization, there could be many of these. Once your data is consumed, a number of other places should be monitored for data errors, such as data conversion, data preparation, data migration, data integration, data reporting, data analysis, and finally the point where the data is displayed for use in a dashboard.

Collectively, once you document where most of these errors are originating, then and only then will you be able to classify data errors across the entire end-to-end process, from point of entry to use of the data in its original or transformed state in a report, analysis, or dashboard.
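
One simple way to document where errors originate is to log each error with the stage where it was found and then tally by stage. The sketch below assumes a hypothetical log of (stage, description) pairs; the stage names and the sample entries are illustrative only.

```python
from collections import Counter

# Stages of the end-to-end process, from point of entry to dashboard.
STAGES = (
    "entry", "conversion", "preparation", "migration",
    "integration", "reporting", "analysis", "dashboard",
)


def tally_errors_by_stage(error_log):
    """Count logged errors per stage, most frequent stage first.

    error_log: iterable of (stage, description) pairs.
    """
    counts = Counter(stage for stage, _ in error_log)
    unknown = set(counts) - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    return counts.most_common()


# Made-up error log.
error_log = [
    ("entry", "ZIP code typed into phone field"),
    ("entry", "misspelled street name"),
    ("conversion", "date parsed as MM/DD instead of DD/MM"),
    ("entry", "missing apartment number"),
    ("integration", "duplicate customer created on merge"),
]

tally = tally_errors_by_stage(error_log)
print(tally)
```

A tally like this shows which stage deserves attention first.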

Now that you have compiled all these data errors (specific to your organization), you can begin to feed some or most of these findings back into your data quality, data governance, and data management frameworks.