
Data visualization tool – a must-watch video! August 19, 2007

Posted by Peter Benza in Data Sources, Data Tools, Data Visualization.

Wow, check out this data visualization tool created by a not-for-profit organization.  I think we will all be seeing more of this tool used to illustrate global data and other publicly available data sources.

Take 30 minutes to watch this demo and see these key data trends unfold before your eyes.

http://sjamthe.wordpress.com/2007/06/14/gapminderorg-data-visualization-techniques/

www.gapminder.org

Data quality and plotting customer address data on a map August 19, 2007

Posted by Peter Benza in Data Analysis, Data Hygiene, Data Integration, Data Metrics, Data Profiling, Data Quality, Data Tools.

Consider the insights and knowledge your organization will gain about the quality of its customer name/address data prior to centralizing all the disparate data sources into one location.  Here is an actual slide deck I prepared a few years ago using the output from my analysis to illustrate how maps and data profiling can assist in assessing data quality.

Great definition: about data integrity August 18, 2007

Posted by Peter Benza in Data Hygiene, Data Integrity.

I came across this definition from the Data Lever website about data integrity.

“Data integrity involves more than just data cleansing, deduplication, householding, CASS certification, and geocoding. You need information that is consistent, complete, accurate and relevant. To get there, you need an enterprise-class data quality solution that can handle correction, validation, and enhancement of all your data—no matter where it comes from. You need it to be easy to implement, efficient to operate, at a price you can afford.”

www.datalever.com 

SOA triggers innovation for your enterprise August 18, 2007

Posted by Peter Benza in Data Architecture, Data Processes, Data Tools.

Say goodbye to those big applications and hello to function-specific services that IT departments can incorporate into their operating architecture with greater ease. 

This flexibility and redesign by vendors to package their solutions into bite-sized services also triggers innovation.  For example, as business requirements change, both IT departments and vendors can be more responsive by offering a new service instead of waiting until the next product release.  IT can also win by telling management they are leveraging their current software asset (investment) with vendor XYZ.

How can data profiling and a meta data repository be leveraged together? August 17, 2007

Posted by Peter Benza in Data Modeling, Data Processes, Data Profiling.

Many times the same term used by one department means something totally different to another department.  This can prove to be a challenge as organizations continue to centralize all their customer data into one location.  You may not be able to resolve all the different name variations used by each department, but assembling all the pieces and documenting them in one place is a must.  It may become necessary to follow up with the appropriate decision makers to resolve any discrepancies.

So, it becomes mission-critical to compile “data about your data” and store it in a meta data repository, along with other key attributes about each data source and each variable, such as its range of values, record length, and so on.  Ultimately, the data elements need to be analyzed and merged into a single classification system based on all relevant data sources from across the enterprise.

This (meta data) information will also become a valuable guide for validating other data-specific activities, such as customer data modeling, match/merge logic, and even QC during the integration/execution phase of storing the resulting customer information in one location.
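
As a rough sketch of what one repository entry could look like (the field names and sample records below are hypothetical, not a prescribed schema), each element can carry its source, the department-specific term, the agreed enterprise term, and the profiling attributes mentioned above, which makes it easy to group name variations under a single classification:

```python
from dataclasses import dataclass

@dataclass
class MetadataEntry:
    """One "data about your data" record for the repository (illustrative fields only)."""
    source_system: str    # where the data element originates
    department_term: str  # the name this department uses
    standard_term: str    # the agreed enterprise-wide name
    data_type: str        # e.g. "char(9)", "integer"
    value_range: str      # observed range of values from profiling
    record_length: int    # record length of the source file

# Hypothetical example: two departments naming the same element differently.
repository = [
    MetadataEntry("CRM", "cust_zip", "postal_code", "char(9)", "00501-99950", 420),
    MetadataEntry("Billing", "zip_cd", "postal_code", "char(5)", "00501-99950", 310),
]

# Group entries under the single enterprise classification (standard_term).
by_standard_term = {}
for entry in repository:
    by_standard_term.setdefault(entry.standard_term, []).append(entry)

for term, entries in by_standard_term.items():
    print(term, "->", [e.department_term for e in entries])
```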

Meta Data: http://en.wikipedia.org/wiki/Metadata

What kind of data references are being bolted-on to enhance record matching inside the customer database? August 17, 2007

Posted by Peter Benza in Data Elements, Data References, Data Strategy, Data Verification.

Organizations are turning to compiled reference data to complement the match/merge/optimize routines inside their customer data hub.  A score/rank is also pre-assigned (appended) to each customer record to make building the match logic used during the file build process easier.

A good example of this is aggregating surname counts at various geographic levels – block group, census tract, zip code, county, and so on.  The resulting surname statistics by geography are used as part of the overall algorithm applied during the integration/update process to improve the decision making that brings two or more customer records together, referred to as a household.

Note: Surname is only one data element – others exist, and vendors in the information services industry have packaged this concept into licensed modules for use in an organization's master data management landscape.
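
To make the idea concrete, here is a minimal sketch (the sample records and the scoring rule are hypothetical and not any vendor's algorithm) of aggregating surname frequency by geography and turning it into a match weight – the rarer a surname is within a geography, the stronger the evidence that two records sharing it belong to the same household:

```python
from collections import Counter

# Hypothetical customer records: (surname, zip_code)
records = [
    ("SMITH", "45202"), ("SMITH", "45202"), ("BENZA", "45202"),
    ("SMITH", "10001"), ("GARCIA", "10001"),
]

# Aggregate surname frequency within each geography (here, ZIP code).
surname_by_zip = Counter((zip_code, surname) for surname, zip_code in records)
total_by_zip = Counter(zip_code for _, zip_code in records)

def surname_match_weight(surname: str, zip_code: str) -> float:
    """Illustrative scoring rule: rarer surnames within a geography carry more weight."""
    freq = surname_by_zip[(zip_code, surname)] / total_by_zip[zip_code]
    return 1.0 - freq

print(surname_match_weight("SMITH", "45202"))  # common in this ZIP -> lower weight
print(surname_match_weight("BENZA", "45202"))  # rare in this ZIP   -> higher weight
```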

What kind of long-term strategy to data quality can organizations fund? August 17, 2007

Posted by Peter Benza in Data Governance, Data Management, Data Quality, Data Strategy.

Be the first to author an article on this subject.

How does your data quality tool handle unstructured data? August 17, 2007

Posted by Peter Benza in Data Mining, Data Quality, Data Tools.

Be the first to author an article on this subject.

What are some examples of how a data system can become unstable? August 17, 2007

Posted by Peter Benza in Data Errors, Data Governance, Data Management, Data Processes, Data Synchronization.

Be the first to author an article on this subject.

What non-for-profit associations are there to learn more about global data quality issues? August 17, 2007

Posted by Peter Benza in Data Hygiene, Data Quality, Data References.

Be the first to author an article in this area!

What kinds of international data templates exist today – out of the box? August 17, 2007

Posted by Peter Benza in Data Templates.

Be the first to author an article on this topic!

What are some different ways your organization's data architecture can be illustrated? August 16, 2007

Posted by Peter Benza in Data Architecture, Data Security.

One of the more common ways is with a data flow diagram.  A data flow diagram allows the end-user to visualize the flow of data through four major stages – when it enters the system, when it is processed, when it is stored, and finally when it is used.
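
As a rough illustration (the system names below are hypothetical), a single flow can be recorded as a list of edges tagged with those four stages, which can later be rendered as a diagram:

```python
# Each hop of one data flow is tagged with one of the four stages named above.
data_flows = [
    {"data": "web order", "stage": "enters",    "source": "storefront",   "target": "order API"},
    {"data": "web order", "stage": "processed", "source": "order API",    "target": "ETL job"},
    {"data": "web order", "stage": "stored",    "source": "ETL job",      "target": "customer hub"},
    {"data": "web order", "stage": "utilized",  "source": "customer hub", "target": "marketing dashboard"},
]

for flow in data_flows:
    print(f'{flow["stage"]:9s} {flow["source"]} -> {flow["target"]} ({flow["data"]})')
```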

This collection of diagrams represents the overall data architecture of your organization and is useful when consolidating into a customer-centric single view.  Data architects are typically the creators or owners of this kind of information.  As a side note, the knowledge represented in a data flow diagram is sometimes summarized in a concept map.  A concept map is basically a data model of a data model.

You can also begin to see, during this exhaustive exercise, your organization's intellectual capital being assembled, and maybe even the foundation for some key data security policies and procedures for handling the same data.

Deciphering between data variables and data elements? August 16, 2007

Posted by Peter Benza in Data Consistency, Data Consolidation, Data Elements, Data Formats, Data Standardization, Data Templates.

Here are two data variables that require some special attention or you just might “age” your customers too soon, too late, or not at all. 

Exact age is a data variable and is typically stored as a whole number representing a customer's age.  In this form it is a very powerful (and predictive) data variable and one of the more commonly used variables for discriminating responders from non-responders.

Exact age in this case can't be broken down into any smaller data elements.  Okay, so now you understand the difference, but is this good enough given how you plan to use this data variable for target marketing purposes?

Exact age does have some limitations.  What about maintaining this particular variable in your customer data warehouse?  If left alone in its current format, exact age becomes an operational nightmare.  A more common and efficient approach is to create a second data variable, date of birth, with three data elements: month, day, and year of birth.
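
A minimal sketch of that derived-variable approach (the helper name and sample customer are hypothetical): store the three date-of-birth elements and compute exact age on demand, so the stored value never goes stale the way a hard-coded age would:

```python
from datetime import date
from typing import Optional

def exact_age(birth_year: int, birth_month: int, birth_day: int,
              as_of: Optional[date] = None) -> int:
    """Derive exact age from the month, day, and year of birth elements."""
    as_of = as_of or date.today()
    had_birthday = (as_of.month, as_of.day) >= (birth_month, birth_day)
    return as_of.year - birth_year - (0 if had_birthday else 1)

# Hypothetical customer born August 20, 1960, evaluated on August 16, 2007 -> 46
print(exact_age(1960, 8, 20, as_of=date(2007, 8, 16)))
```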

Remember, some data variables may have specific data elements within them – such as a phone number, street address, zip code, etc.  The more you examine each data variable in your database, the more potential options you will begin to uncover.

Is your data accurate? August 16, 2007

Posted by Peter Benza in Data Accuracy.

Data accuracy is the result of ensuring that the values for each data element being canvassed are true.  The level of precision associated with a variable is another indicator used to determine data accuracy.  And at a more system or enterprise level, it means no system or application errors have been logged or reported.
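
One way to operationalize the first part of that definition (the validation rules and sample rows below are hypothetical) is to test each canvassed value against a rule and report the share that passes per data element:

```python
import re

# Hypothetical validation rules: each data element maps to a predicate that
# returns True when the canvassed value is plausible/true.
rules = {
    "state":    lambda v: v in {"OH", "NY", "CA"},                  # toy reference set
    "zip_code": lambda v: re.fullmatch(r"\d{5}", v or "") is not None,
}

rows = [
    {"state": "OH", "zip_code": "45202"},
    {"state": "XX", "zip_code": "452"},   # two inaccurate values
]

for element, rule in rules.items():
    passed = sum(rule(row.get(element)) for row in rows)
    print(f"{element}: {passed}/{len(rows)} values pass ({passed / len(rows):.0%})")
```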

What data quality dimensions can statisticians impact by using a data profiling tool? August 16, 2007

Posted by Peter Benza in Data Accuracy, Data Completeness, Data Profiling, Data Quality, Data Tools.

Two data quality dimensions that statisticians can play a role in are data accuracy and data completeness.  A data profiling tool comes in handy to facilitate the actual research required by the organization.

What data profiling tool does your organization use?

What other data quality dimensions can be analyzed? 

How are data governance and compliance being incorporated into your overall data management practices? August 16, 2007

Posted by Peter Benza in Data Governance, Data Management.

Be the first to author an article on this topic.

How do you prevent data errors in your database today? August 16, 2007

Posted by Peter Benza in Data Errors, Data Hygiene, Data Processes, Data Sources, Data Templates, Data Verification.

Data errors can be reduced but not totally eliminated, so be realistic.  First consideration must be given to the point of entry, and depending on the size of your organization there could be many.  Once your data is captured, a number of other places should be monitored for data errors, such as data conversion, data preparation, data migration, data integration, data reporting, data analysis, and finally when the data is consumed and displayed in a dashboard.

Collectively, once you document where most of these errors are originating from – then and only then will you be able to classify data errors across the entire end-to-end process, from point of entry to using the data in its original or transformed state in a report, analysis, or dashboard.

Now that you have compiled all these data errors (specific to your organization), you can begin to feed some or most of these findings back into your data quality, data governance, and data management frameworks.
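
A simple sketch of that classification step (the stages and error types below are made up for illustration): tally logged errors by the stage where they originate, so the worst offenders surface first and can be fed back into those frameworks:

```python
from collections import Counter

# Hypothetical error log entries: (stage, error_type)
error_log = [
    ("point of entry",   "missing zip code"),
    ("data conversion",  "truncated name"),
    ("data integration", "duplicate customer id"),
    ("point of entry",   "invalid email"),
]

# Classify errors by the stage where they originate.
errors_by_stage = Counter(stage for stage, _ in error_log)
for stage, count in errors_by_stage.most_common():
    print(f"{stage}: {count}")
```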

How complete is your data? August 15, 2007

Posted by Peter Benza in Data Completeness, Data Governance, Data Quality, Data Research.

Data completeness is contingent upon first knowing the target population* and then measuring the number of missing (bad) values relative to good values for each data element.

Consider setting up missing value reports (or, better yet, aggregate datasets) on a scheduled basis to study data completeness patterns over time.  These findings might also suggest data governance processes, policies, and standards for your organization to consider.

It is advised to include a statistical analyst early on in outlining this process in order to help define data completeness specific to your organization – past, present, and future.

*target population could be anything from your customer name/address master database to product-specific datasets and all their associated attributes.
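
A minimal completeness report along these lines (the sample records and elements are hypothetical) just counts good values per data element against the target population; run on a schedule, the output can be stored and trended over time:

```python
# Hypothetical target population: every customer master record should carry these elements.
records = [
    {"name": "A. Smith", "zip_code": "45202", "email": None},
    {"name": "B. Jones", "zip_code": None,    "email": "b@example.com"},
    {"name": None,       "zip_code": "10001", "email": ""},
]

elements = ["name", "zip_code", "email"]
target_population = len(records)

for element in elements:
    good = sum(1 for r in records if r.get(element))  # treat None/"" as missing (bad) values
    print(f"{element}: {good}/{target_population} complete ({good / target_population:.0%})")
```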

What is a good approach to start measuring data currency as it relates to my organization? August 13, 2007

Posted by Peter Benza in Data Currency, Data Governance, Data Metrics, Data Sources, Data Synchronization.

Frequency is a good start.  Compile the lag time between updates for all the different “content” (data) that accompanies each software application across your organization, both internally generated and externally supplied by third parties.

Compare this to your overall file build process, make adjustments, and update your data synchronization standards to reflect any new data sources.  Warning: Be sure to consider the impact these changes may have on other departments, especially marketing.
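
As a sketch of that frequency metric (the sources and update dates below are hypothetical), compute the lag in days between successive updates for each data source and compare the averages against your file build schedule:

```python
from datetime import date

# Hypothetical update history per data source (internal and third-party).
update_history = {
    "internal CRM extract":     [date(2007, 6, 1), date(2007, 7, 1), date(2007, 8, 1)],
    "third-party demographics": [date(2007, 2, 15), date(2007, 8, 15)],
}

# Lag time between updates is a simple data currency metric.
for source, dates in update_history.items():
    dates = sorted(dates)
    lags = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    print(f"{source}: average lag {sum(lags) / len(lags):.0f} days")
```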

Slick website that links you to data sources August 13, 2007

Posted by Peter Benza in Data Sources.

Check out this website: http://graduateresearch.wordpress.com/tag/data-sources/

Go ahead and post other data source websites you might like to share with other enterprise data quality enthusiasts.

What type of statistical modeling approach would you recommend for those times when your response file is under 1,000 buyers? August 13, 2007

Posted by Peter Benza in Data Analysis.

Be one of the first to author an article in this category!

Do you know any data hygiene solution providers that include non-postal address ranges with their reference data? August 13, 2007

Posted by Peter Benza in Data Hygiene.

Group 1 Software is a good example of a company that licenses an advanced geocoding solution that goes beyond using just the USPS-supplied (Zip+4) street ranges.

A good example of a non-postal street address is a street in a rural area of America that has not yet been converted to meet 911-compliance standards.  In other words, not all addresses in rural areas have a house number associated with them.  Also, note that a PO Box-style address is typically used in small communities where individuals and businesses go to pick up their mail daily.

Other companies also providing non-postal street addresses are: Trillium Software, First Logic, and Proxix.

What are some of the most popular business related data elements used today for data quality purposes? August 13, 2007

Posted by Peter Benza in Data Elements.

Be one of the first to author an article in this category!

Are Data Migration and ETL the same thing? August 13, 2007

Posted by Peter Benza in Data Migration.

Be one of the first to author an article in this category!

Peter Benza – 1984 graduate of the Direct Marketing Educational Foundation – creates enterprise data quality weblog August 13, 2007

Posted by Peter Benza in Data Elements, Data Governance, Data Integrity, Data Management, Data Mining, Data Optimization, Data Profiling, Data Quality, Data Stewardship, Data Strategy, Data Tools, Data Variables, Data Visualization.

What topics comprise Data Governance? August 13, 2007

Posted by Peter Benza in Data Governance.

Be one of the first to author an article in this category!

Do data aggregation and data consolidation mean the same thing? August 13, 2007

Posted by Peter Benza in Data Consolidation.

Does anyone have a point of view on the similarities or differences between data consolidation and data aggregation?

Data Quality: Customer, Product – What Else? August 12, 2007

Posted by Peter Benza in Data Quality.

At first, data quality tools concentrated primarily on customer name/address elements.  Now, vendors and organizations are using data quality tools to manage their product-specific data as well.  What else do you see vendors offering besides customer- and product-related data quality techniques?