Blog: Bad data – and how to avoid it

Bad data costs companies millions of dollars every year, damages their reputation and causes them to miss opportunities. But what is bad data, and how do you avoid it?



1. Bad Data

1.1 What is it?

Data quality is a measure of the condition of data, normally assessed against five criteria: validity, accuracy, completeness, consistency and uniformity. Bad data is defined as an inaccurate set of information, including missing data, wrong information, inappropriate data, non-conforming data, duplicate data and poor entries (misspellings, typos, spelling variations, formatting inconsistencies, etc.). In other words, when data fails to meet the quality demands set by these criteria, it is deemed “bad”.

Reports show that companies find between 20 and 30 percent of their data to be inaccurate, which in turn harms business performance.


1.2 What causes it?

There are mainly three groups of processes responsible for causing bad data or data disruption:

  • Processes that bring data into the database from outside
  • Processes that handle and use data inside the database
  • Data that already exists in the database but becomes obsolete over time, as the real world changes while the record does not

In short, data quality challenges arise from how data is entered into the system, how it is used, updated and maintained while in the system, and how it is decommissioned when it is no longer relevant.


1.3 The consequences

As mentioned previously, research shows that bad data has a strong negative impact on business performance, with cost being the main factor. Gartner, for example, suggests that organizations lose between $10 million and $14 million annually due to poor data. Furthermore, a CIO survey revealed that almost 80% of U.S. companies believe they have lost revenue due to data challenges.

Costs deriving from bad data quality can, on a high level, be broken down into three main categories:

  • The actual costs caused by low-quality data. Examples include shipping products to the wrong address, marketing to the wrong email or postal address, reputational damage, lost revenue and missed opportunities.
  • The cost of assessments and inspections conducted to verify whether the processes in question are performing correctly and whether an incorrect outcome is the result of poor-quality data.
  • The cost of activities aimed at improving existing data quality. As an example, Gartner found that data scientists spend roughly 80% of their time cleaning and organizing data, time that translates directly into staff and management costs.

Data is like fuel: if you put bad-quality fuel into your vehicle, or fail to maintain the right fuel level, the vehicle quickly becomes unreliable and eventually breaks down. The purity and quality of data in any business is just as crucial.


2. Data cleansing explained

Data cleansing refers to the process of replacing, correcting, modifying or deleting corrupt, inaccurate, irrelevant, duplicated, incomplete or incoherent records in a data set. Data cleansing can be performed in different ways, but normally follows a process similar to the one described below.


2.1 Audit

In some cases, the process starts with a data quality audit, which gives a clearer picture of the extent of any deficiencies. This typically applies when, for example, many users have access to a CRM with permission to alter data in the system, putting data quality at risk. In these cases, you may want to see how big the scope of your cleansing project truly is before you start cleaning up.
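An audit of this kind can start as simply as counting missing fields and duplicates. A minimal sketch in Python, where the field names are illustrative assumptions (a real audit would check far more, such as format validity and consistency across systems):

```python
def audit(records, required=("name", "email", "address")):
    """Quick data-quality audit: count missing required fields and duplicate emails."""
    missing = sum(1 for r in records for f in required if not r.get(f))
    seen, dupes = set(), 0
    for r in records:
        key = (r.get("email") or "").lower()   # case-insensitive duplicate check
        if key and key in seen:
            dupes += 1
        seen.add(key)
    return {"records": len(records), "missing_fields": missing, "duplicates": dupes}

report = audit([
    {"name": "Ann", "email": "ann@example.com", "address": "Main St 1"},
    {"name": "Bo",  "email": "ANN@example.com", "address": ""},
    {"name": "",    "email": "cy@example.com",  "address": "Side St 2"},
])
# report now shows 3 records, 2 missing fields, 1 duplicate email
```

A report like this tells you whether the cleansing project is a quick fix or a major undertaking before you commit resources to it.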


2.2 Cleanse

The way cleanses are conducted has changed rapidly in recent years. What used to be Excel files sent to a data service provider, who updated and returned them, is now commonly performed via API solutions. The old approach was usually expensive, as it involved a lot of manual work on both ends.

APIs make it possible to perform multiple calls to endpoints that return up-to-date information about your customers from quality sources. This enables a smoother cleansing process, with correct data entered directly into your databases, systems and tools, and improves cross-system cohesion.
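As a rough illustration, an API-driven cleanse might look like the sketch below. The record fields and the `lookup` function are assumptions standing in for a real provider's endpoint (for example, one GET request per organization number); here a local dictionary plays that role:

```python
def cleanse_records(records, lookup):
    """Refresh each customer record from an authoritative source.

    `lookup` stands in for an API call returning the currently
    registered data for an identifier, or None if not found.
    """
    cleansed, flagged = [], []
    for record in records:
        fresh = lookup(record["org_no"])
        if fresh is None:
            flagged.append(record)                # not found: review or decommission
        else:
            cleansed.append({**record, **fresh})  # authoritative fields win conflicts
    return cleansed, flagged

# Stub standing in for the external data provider
registry = {"556677-8899": {"name": "Acme AB", "address": "Storgatan 1"}}

records = [
    {"org_no": "556677-8899", "name": "ACME", "address": "Old Street 9"},
    {"org_no": "111111-1111", "name": "Ghost Ltd", "address": "?"},
]
clean, review = cleanse_records(records, registry.get)
```

The key design point is that records the source no longer knows about are flagged for review rather than silently kept, so stale entries surface instead of lingering.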


2.3 Monitoring setup

Once a cleanse has been completed, you want to set up a monitoring process to make sure you don't miss the changes that regularly occur in your customer data. Depending on your industry and line of business, regular cleanses are normally conducted at 1-12 month intervals to keep customer data up to date and accurate.

However, modern technical solutions such as webhooks enable continuous monitoring, with push notifications sent as data changes occur. This means you can merge correct, up-to-date data into your systems the moment your customer data changes. Monitoring data this way dramatically reduces administration and assures data quality over time.
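A minimal sketch of the receiving end of such a webhook, assuming an illustrative payload shape (real monitoring providers define their own formats):

```python
import json

customer_db = {"c-42": {"name": "Jane Doe", "address": "Old Road 1"}}

def handle_webhook(raw_body):
    """Apply a change notification pushed by a monitoring service."""
    event = json.loads(raw_body)
    customer = customer_db.get(event["customer_id"])
    if customer is None:
        return "ignored"                   # notification for an unknown record
    customer.update(event["changes"])      # merge only the fields that changed
    return "updated"

status = handle_webhook(json.dumps(
    {"customer_id": "c-42", "changes": {"address": "New Street 7"}}
))
# status is "updated" and the record now holds the new address
```

In production this function would sit behind an HTTP endpoint the provider POSTs to, with authentication of the sender; the core idea is that only the changed fields are merged, leaving the rest of the record untouched.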

"Webhook solution example for customer data monitoring."

3. How to ensure data quality

Setting up a framework for how data is managed from a lifecycle perspective is a good place to start if you want to ensure data quality over time. Simplified, customer data goes through three stages in a company: data entry, data usage and data deletion. We dive a bit deeper into all three below.


3.1 Data entry

Start by reviewing your data needs, using a bare minimum approach. Some questions to ask yourself in that process could be:

  • What data do we need to ensure regulatory compliance?
  • What data do we need from a risk and analysis perspective?
  • What data do we need from a sales and marketing perspective?
  • What data is not business critical? In other words, what data can we do without?

The desired outcome of data entry is to have every source of data, both internal and external, collected, validated, merged, enriched and linked into one complete and accurate view (cohesion). Cross-system consistency, meaning no conflicting information across any source of data within the business, is key to using data successfully in your company.
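One simple way to resolve conflicts when merging sources into that single view is to rank the sources by trust. A hedged sketch, where the source names, priority order and fields are illustrative assumptions:

```python
# Most trusted source first; its values win any conflict.
SOURCE_PRIORITY = ["registry", "crm", "user_input"]

def merge_views(views):
    """Collapse per-source records into one golden record."""
    merged = {}
    # Apply least trusted first, so later (more trusted) sources overwrite.
    for source in reversed(SOURCE_PRIORITY):
        merged.update(views.get(source, {}))
    return merged

views = {
    "user_input": {"name": "acme",    "email": "sales@acme.example"},
    "crm":        {"name": "Acme",    "phone": "+46 8 123 456"},
    "registry":   {"name": "Acme AB", "org_no": "556677-8899"},
}
golden = merge_views(views)
# The registry's "Acme AB" wins the name conflict; unique fields from
# every source (email, phone, org_no) are all kept in the merged view.
```

Each source still contributes the fields only it knows, while conflicts are settled deterministically, which is exactly the cross-system consistency described above.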

Collecting data from external sources, rather than from the user or customer, is usually the best way to ensure data quality as data enters your systems. An address verified by a governmental body or another high-quality source reduces the risk of faulty entries such as misspellings and other mistakes that human error might cause. In a digital customer onboarding process, automated data collection and pre-population also improve the customer experience, as users move through your flow faster, reducing churn.


3.2 Data usage, storage and maintenance

Continuously updating your data to make sure it is accurate is not just useful; in many cases it's downright necessary. Communication sent to the wrong address is costly, being exposed to fraud is expensive, and doing business that leads to involvement in money laundering is illegal.

During the time that your customer is active, a lot of data can change. A person moves, gets married or changes their name, while a company may change offices, replace board members or bring in a new owner. Any of these changes can affect the customer's status with your organization. Some create opportunities for additional sales, while others are so fundamental that acting on them is required simply to keep communication going.

However, some changes may mean that a continued relationship with the customer entails increased risk for your business, or even becomes illegal. Customer data therefore needs to be monitored so that relevant changes can be updated quickly and smoothly directly in the customer database, where they can drive events in the organization. Changes that silently update the database add little value and do not satisfy compliance requirements. Make sure valuable and important information is noticed and acted upon.

Webhook solutions are a great way of ensuring that you receive all relevant changes in your customer data. They simply push notifications your way as changes occur, enabling you to collect the new data and update your systems directly.


3.3 Data decommissioning and deletion

The legal grounds for collecting data should determine how long your customer data is kept. When it is no longer possible to claim a purpose for storing the data, it must be discarded and deleted. However, data can be anonymized and used for analytical purposes after the initial purpose has lapsed, for example when a customer is no longer a customer.

First, make sure you structure and build a framework for compliance (GDPR, the Money Laundering Act, etc.). The framework should be available both internally and externally, and detail what customer data you collect, how it is stored and used, and how and when you delete it. If all your data carries timestamps, it is easy to assess how long you have held it, and to create rules for deleting it automatically.
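Such timestamp-based deletion rules can be sketched like this; the retention periods, purposes and field names are illustrative assumptions, not legal guidance:

```python
from datetime import datetime, timedelta, timezone

# Illustrative retention windows per legal purpose (not legal advice).
RETENTION = {
    "marketing": timedelta(days=365),
    "aml":       timedelta(days=5 * 365),   # anti-money-laundering records
}

def due_for_deletion(records, now):
    """Return the records whose retention window, by purpose, has expired."""
    return [
        r for r in records
        if now - r["collected_at"] > RETENTION[r["purpose"]]
    ]

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
records = [
    {"id": 1, "purpose": "marketing",
     "collected_at": datetime(2022, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "purpose": "aml",
     "collected_at": datetime(2022, 1, 1, tzinfo=timezone.utc)},
]
expired = due_for_deletion(records, now)
# Only the marketing record has outlived its window; the AML record is kept.
```

Running a job like this on a schedule turns your retention policy into an automatic process instead of a manual clean-up task.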

Would you like to know more?