The 6 Common Ways Dirty Data is Created

By Kristian Kalsing on May 19, 2014

Photo by Duncan Hull [CC Attribution 2.0]

Dirty data costs companies millions of dollars each year. Errors and omissions in master data in particular are notorious for causing costly business interruptions. When considering ways to improve your data quality, it helps to understand the types of dirty data that commonly creep into enterprise systems.

Here are my six most common types of dirty data:

  • Incomplete data: This is the most common occurrence of dirty data. Important fields on master data records, useful to the business, are often left blank. For example, if you haven’t classified your customers by industry, you cannot segment your sales and marketing initiatives by industry.
  • Duplicate data: Another very common culprit is duplicate data. Most companies struggle with duplicate customer records, but duplicate materials are also very common. Duplicates can be costly because they lead to excess inventory and sub-optimal procurement decisions.
  • Incorrect data: Incorrect data can occur when field values are created outside of the valid range of values. For example, the value in a month field should range from 1 to 12 or a street address should be a real address.
  • Inaccurate data: It is possible for data to be technically correct but inaccurate given the business context. Costly business interruptions are often rooted in inaccurate data. For example, minor errors in customer addresses can result in deliveries to the wrong locations even though the addresses are real addresses.
  • Business rule violations: Master data often comes with a large collection of poorly documented business rules that are specific to the industry or business context. For example, beverage products should have a Unit of Measure in ‘fl. oz.’ or payment terms for a certain type of customer should always be ‘Net 30.’
  • Inconsistent data: Data redundancy (i.e., the same field values stored in different places) often leads to inconsistencies. For example, most companies have customer information in multiple systems and the data is often not kept in sync.
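
Several of the types above can be caught with simple rule checks. The following Python sketch is only an illustration under stated assumptions: the records, the field names (`industry`, `signup_month`, `payment_terms`, `type`), and the rule that wholesale customers must be on ‘Net 30’ are all hypothetical and do not come from any particular system or product.

```python
from collections import Counter

# Hypothetical customer master records; all field names are illustrative.
customers = [
    {"id": 1, "name": "Acme Corp", "industry": "Manufacturing",
     "signup_month": 3, "payment_terms": "Net 30", "type": "wholesale"},
    {"id": 2, "name": "acme corp ", "industry": "",
     "signup_month": 13, "payment_terms": "Net 60", "type": "wholesale"},
    {"id": 3, "name": "Globex", "industry": "Retail",
     "signup_month": 7, "payment_terms": "Net 60", "type": "retail"},
]

# Hypothetical second system holding the same customers (id -> name).
billing_names = {1: "Acme Corp", 3: "Globex Inc"}


def normalize(name):
    """Crude normalization so trivially different spellings compare equal."""
    return name.strip().lower()


def find_issues(records):
    """Return (record id, issue type) pairs for four rule-checkable types."""
    issues = []
    name_counts = Counter(normalize(r["name"]) for r in records)
    for r in records:
        # Incomplete: an important field was left blank.
        if not r["industry"]:
            issues.append((r["id"], "incomplete"))
        # Duplicate: another record shares the same normalized name.
        if name_counts[normalize(r["name"])] > 1:
            issues.append((r["id"], "duplicate"))
        # Incorrect: value outside the valid range for a month.
        if not 1 <= r["signup_month"] <= 12:
            issues.append((r["id"], "incorrect"))
        # Business rule violation (assumed rule): wholesale means Net 30.
        if r["type"] == "wholesale" and r["payment_terms"] != "Net 30":
            issues.append((r["id"], "business rule violation"))
    return issues


def find_inconsistencies(records, other_names):
    """Return ids whose name differs between the two systems."""
    return [
        r["id"]
        for r in records
        if r["id"] in other_names
        and normalize(r["name"]) != normalize(other_names[r["id"]])
    ]


issues = find_issues(customers)
inconsistent_ids = find_inconsistencies(customers, billing_names)
```

Inaccurate data is deliberately left out of the sketch: a value that is valid but wrong for the business context usually cannot be detected by a rule alone and needs checking against a trusted reference or the real world.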

Winshuttle for Master Data is attacking dirty data head-on by providing a platform of capabilities for improving data management. All six types of dirty data above can be addressed by implementing active data governance solutions that enforce validation at the point of entry. Click here to watch an example of a Customer Master solution with different types of validation.


Questions or comments about this article?

Tweet @kalsing to continue the conversation!


About the author

Based in Seattle, Kristian is part of the product management team at Winshuttle where he is responsible for solutions that help companies to perform better by improving their management and governance of master data. Kristian has 15 years of experience with enterprise solutions in a broad range of industries across Europe, Australia and North America. When not at work, Kristian spends most of his time climbing in the mountains of the Pacific Northwest and elsewhere.

