Data quality: two words that make your eyes glaze over. No one wants to think about it. After all, data is invisible. It is the bits and bytes of the information we use to do business; it is accessed through SQL queries or reports, manipulated to send to vendors and stored for future use. Most people don’t think about data or data quality too often or too hard. Instead we go about our daily lives. We take orders and deliver goods and services to customers. We bill them and collect revenue. We improve existing goods and services and market and sell them. We manage the organization. We try to figure out e-business. We make decisions and we plan. We report on how we are doing and try to do better. We try to gain a competitive edge. Data underpins everything we do, but we still don’t think about it. Besides, who wants to worry about data quality if no one is complaining? We have real work to do: customers to satisfy, production schedules to meet, decisions to make, strategies to map out and a demanding board to answer to.
In our rush to deliver, we forget that customers recognize poor quality more readily than we do. They are sensitive to conversions, billing errors, improperly addressed mail and claims that turn out to be incorrect. In many cases, customers get fed up and take their money elsewhere. How many times have you been frustrated by improperly addressed mail because a company got your name wrong?
These days, the world is focused on carbon footprints and cleaning up the environment, which suggests a useful analogy. A data store, whether it is a SQL database, an Access database or an Excel spreadsheet, can be considered a lake. The lake water represents the data, and the streams represent the flow of information out of the lake. Factories upstream introduce new sources of pollution, in this case poor-quality data, into the lake. That pollution eventually flows downstream and contaminates the streams, creating poor conditions for matching logic. So how do we clean up our lake?
By cleaning up existing conditions and preventing future contaminants! We can spend small amounts of time each week picking up the garbage on the shore. However, correcting existing data by itself will not improve downstream quality, as the factories will continue to pour polluted water into the lake, creating an endless cycle. The root causes of the bad data need to be identified and eliminated. This shifts the focus from detecting and correcting data errors to preventing future errors from being introduced.