The 6 Common Ways Dirty Data is Created

By Kristian Kalsing on May 19, 2014

Photo by Duncan Hull [CC Attribution 2.0]

Dirty data is costing companies millions of dollars each year. Errors and omissions in master data in particular are notorious for causing costly business interruptions. When considering ways to improve your data quality, it helps to understand the different types of dirty data that commonly creep into enterprise systems.

Here are the six types of dirty data I see most often:

  • Incomplete data: This is the most common occurrence of dirty data. Important fields on master data records, useful to the business, are often left blank. For example, if you haven’t classified your customers by industry, you cannot segment your sales and marketing initiatives by industry.
  • Duplicate data: Another very common culprit is duplicate data. Most companies struggle with duplicate customer records, but duplicate materials are also very common. This can be costly to companies due to excess inventory and sub-optimal procurement decisions.
  • Incorrect data: Incorrect data occurs when field values fall outside the valid range of values. For example, the value in a month field should range from 1 to 12, and a street address should be a real address.
  • Inaccurate data: It is possible for data to be technically correct but inaccurate given the business context. Costly business interruptions are often rooted in inaccurate data. For example, minor errors in customer addresses can result in deliveries to the wrong locations even though the addresses are actual addresses.
  • Business rule violations: Master data often comes with large collections of poorly documented business rules that are specific to the industry or business context. For example, beverage products should have a Unit of Measure in ‘fl. oz.’, or payment terms for a certain type of customer should always be ‘Net 30.’
  • Inconsistent data: Data redundancy (i.e., the same field values stored in different places) often leads to inconsistencies. For example, most companies have customer information in multiple systems and the data is often not kept in sync.
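
To make these categories concrete, here is a minimal Python sketch of how some of them might be detected in a set of customer records. Everything in it is an assumption made for illustration: the field names (`industry`, `created_month`, `customer_type`, `payment_terms`, `city`), the sample records, the "wholesale customers are always Net 30" rule, and the two systems labeled `crm` and `erp`. It is not how any particular product implements validation. Inaccurate data is deliberately left out, because a value that is valid but wrong for the business context generally cannot be caught by format checks alone and needs an external reference to compare against.

```python
from collections import Counter

# Hypothetical required fields for a customer master record.
REQUIRED_FIELDS = ("name", "industry", "payment_terms")


def find_issues(record):
    """Return issue labels found by looking at a single record."""
    issues = []
    # Incomplete data: a required field is missing or blank.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            issues.append(f"incomplete:{field}")
    # Incorrect data: a month value outside the valid range 1 to 12.
    month = record.get("created_month")
    if month is not None and not 1 <= month <= 12:
        issues.append("incorrect:created_month")
    # Business rule violation (assumed rule): wholesale customers are Net 30.
    if (record.get("customer_type") == "wholesale"
            and record.get("payment_terms") != "Net 30"):
        issues.append("rule:payment_terms")
    return issues


def find_duplicates(records):
    """Return normalized customer names that occur more than once."""
    counts = Counter(
        r["name"].strip().lower() for r in records if r.get("name")
    )
    return sorted(name for name, n in counts.items() if n > 1)


def find_inconsistencies(system_a, system_b, field):
    """Return ids whose value for `field` differs between two systems."""
    b_by_id = {r["id"]: r for r in system_b}
    return sorted(
        r["id"] for r in system_a
        if r["id"] in b_by_id and r.get(field) != b_by_id[r["id"]].get(field)
    )


# Illustrative sample data only.
crm = [
    {"id": "C1", "name": "Acme Corp", "industry": "Beverage",
     "customer_type": "wholesale", "payment_terms": "Net 30",
     "created_month": 5, "city": "Seattle"},
    {"id": "C2", "name": "acme corp ", "industry": "",
     "customer_type": "retail", "payment_terms": "Net 15",
     "created_month": 13, "city": "Portland"},
    {"id": "C3", "name": "Globex", "industry": "Retail",
     "customer_type": "wholesale", "payment_terms": "Net 60",
     "created_month": 2, "city": "Denver"},
]
erp = [
    {"id": "C1", "city": "Seattle"},
    {"id": "C3", "city": "Boulder"},
]

for record in crm:
    print(record["id"], find_issues(record))
print("duplicates:", find_duplicates(crm))
print("inconsistent city:", find_inconsistencies(crm, erp, "city"))
```

The duplicate check here only normalizes case and surrounding whitespace; real duplicate detection usually needs fuzzier matching, which is beyond the scope of this sketch.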

Winshuttle for Master Data is attacking dirty data head-on by providing a platform of capabilities for improving data management. All six types of dirty data above can be addressed by implementing active data governance solutions that enforce validation at the point of entry. Click here to watch an example of a Customer Master solution with different types of validation.

About the author

Kristian Kalsing

As Vice President of Product & Solutions, Kristian is responsible for product management, enterprise solutions, and product marketing. He is instrumental in driving the strategic direction of the company and continuously elevating the value that Winshuttle’s software platform and methodology bring to customers. Prior to joining Winshuttle in 2010, Kristian was widely respected as one of the pioneering thought leaders in bridging the gap between SAP and Microsoft technologies. Since starting his career in Denmark, Kristian has gained experience with enterprise software solutions in a broad range of industries in Europe, Australia, and North America. He has held various roles across Engineering, Professional Services, Sales, and Marketing.

Questions or comments about this article?

Tweet @kalsing to continue the conversation!