Data Management

Data is the lifeblood of modern businesses. It drives everything from strategic decisions to day-to-day operations. However, many organizations are plagued by poor data quality that dramatically impacts their bottom line. In this post, we'll explore the striking costs of bad data and steps you can take to clean it up.

The Shocking Impacts of Bad Data

Poor data quality comes in many forms: inaccurate, incomplete, outdated, duplicated, improperly formatted, and more. While it may seem harmless on the surface, low-quality data has far-reaching detrimental effects. Here are some of the most significant impacts:

  • Revenue losses - Bad data leads to lost sales opportunities, incorrect pricing, improper forecasts, and compliance issues. According to an IBM study, poor data costs organizations an average of $15 million per year in lost revenue.
  • Increased costs - Errors and inconsistencies force employees to waste time tracking down accurate information. This lost productivity drains budgets quickly. Research from Experian found that most companies estimate poor data costs them over $20,000 per year.
  • Ineffectual decision making - With inaccurate or incomplete data, leaders make ill-advised strategic choices that move the company in the wrong direction. MIT Sloan Management Review found that bad data contributes to poor decisions that cost the US $3 trillion per year.
  • Dissatisfied customers - Nothing is more frustrating for customers than inconsistent or incorrect experiences caused by bad data. It erodes trust in the brand and leads customers to switch to competitors. An Accenture study found that poor data quality causes around 40% of all business dissatisfaction.
  • Regulatory non-compliance - Financial services, healthcare, and other heavily regulated industries can face major fines, lawsuits, and reputational damage from acting on inaccurate data. Maintaining compliant, high-quality data is critical.

The impacts above are just the tip of the iceberg. Bad data causes organizations to "fly blind", unable to harness insights to operate efficiently and profitably.

Shocking Statistics on the Pervasiveness of Bad Data

Given the heavy costs, you would expect companies to put lots of effort into maintaining pristine data. Unfortunately, statistics show poor data quality is astonishingly common:

  • A staggering 70% of collected data goes unused by businesses according to IBM.
  • Only 3% of companies' data meets basic quality standards according to Forbes.
  • 25% of critical data contains errors according to Gartner.
  • Companies waste around 30% of their time hunting down and verifying data per Forbes.
  • Incorrect or outdated data costs enterprises an average of over $14 million per year according to Dataversity.

These shocking statistics illustrate how practically every organization suffers from poor data. Bad data spreads insidiously as outdated sources combine with new inaccurate inputs. Companies accept it as an inevitable drag instead of dedicating resources to fix it.

Common Causes of Bad Data

Understanding what causes data problems is crucial for preventing and remediating them. Here are five of the most common sources of bad data:

1. Manual Data Entry

Despite advances in automation, many business processes still rely on humans manually entering data. Typos, misinterpretation of information, and oversight lead to errors that propagate through systems.

2. Multiple Siloed Sources

Important information for companies often resides in spreadsheets, databases, documents, emails, and more. Each source can have discrepancies that need reconciliation. Outdated legacy systems represent another typical issue.

3. Lack of Input Validation

Applications should validate data formats, ranges, completeness, and integrity during input. Too often this validation is overlooked, allowing bad data to be ingested.
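To make this concrete, here is a minimal sketch of input validation in Python. The record fields and rules are hypothetical; the point is that every entry path should reject malformed data before it lands in a system of record.

```python
import re
from datetime import datetime

def validate_customer_record(record: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the record passed."""
    errors = []

    # Completeness: required fields must be present and non-empty.
    for field in ("customer_id", "email", "signup_date"):
        if not record.get(field):
            errors.append(f"missing required field: {field}")

    # Format: email must match a basic pattern.
    email = record.get("email", "")
    if email and not re.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$", email):
        errors.append(f"invalid email format: {email}")

    # Format: dates must parse against the expected ISO format.
    signup = record.get("signup_date", "")
    if signup:
        try:
            datetime.strptime(signup, "%Y-%m-%d")
        except ValueError:
            errors.append(f"invalid signup_date: {signup}")

    # Range: order totals must be non-negative numbers.
    total = record.get("order_total")
    if total is not None and (not isinstance(total, (int, float)) or total < 0):
        errors.append(f"order_total out of range: {total}")

    return errors

# Reject bad records before they are ingested.
record = {"customer_id": "C-1001", "email": "jane@example", "signup_date": "2023-13-01"}
print(validate_customer_record(record))
# ['invalid email format: jane@example', 'invalid signup_date: 2023-13-01']
```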

4. Integration Failures

When linking applications via APIs and integrations, mismanaged data mappings and transformations frequently corrupt information.

5. Uncontrolled Changes

Updates to applications and data structures, if not planned and tested carefully, commonly introduce data errors that are hard to detect.

Proactively monitoring these root causes is step one toward better data hygiene.

Quantifying Bad Data Costs for Your Organization

By now it is clear bad data exacts a heavy toll on businesses. But what does it cost your organization specifically? Quantifying the impacts can help secure executive support for data quality initiatives. Here are three methods to estimate your costs:

1. Top-Down Statistical Approach

Start by determining your company's revenue. Then multiply by the average percentage costs found in data quality studies:

  • Lost revenue: Average of 2% of revenue (IBM)
  • Increased costs: Around 15% of revenue (Experian)

For example, for a company with $100 million in revenue:

  • Lost revenue: 2% x $100 million = $2 million
  • Increased costs: 15% x $100 million = $15 million
  • Total = $17 million per year

This method provides a quick data quality cost estimate for minimal effort.
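For illustration, here is the same back-of-the-envelope calculation in Python, with the study averages above as default assumptions you can override for your own organization:

```python
def top_down_cost_estimate(annual_revenue: float,
                           lost_revenue_pct: float = 0.02,
                           increased_cost_pct: float = 0.15) -> dict:
    """Rough top-down estimate of annual bad-data costs from study averages."""
    lost_revenue = annual_revenue * lost_revenue_pct
    increased_costs = annual_revenue * increased_cost_pct
    return {
        "lost_revenue": lost_revenue,
        "increased_costs": increased_costs,
        "total": lost_revenue + increased_costs,
    }

# A company with $100 million in revenue, as in the example above.
print(top_down_cost_estimate(100_000_000))
# {'lost_revenue': 2000000.0, 'increased_costs': 15000000.0, 'total': 17000000.0}
```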

2. Internal Business Impact Assessment

A more thorough approach is to work with business leaders to estimate costs within each department. Analyze processes that depend on data and evaluate productivity lost, errors generated, and opportunities missed due to data issues.

While more labor-intensive, this method gives greater visibility into where poor data hits hardest inside your organization, and it builds internal support for remediation initiatives.

3. Data Quality Analysis

For greatest precision, directly analyze and measure data quality across your systems. Techniques include:

  • Data profiling - scan data to quantify completeness, validity, accuracy, etc.
  • Data auditing - manually review samples of data for problems.
  • Business impact modeling - simulate effects of different data error levels on business processes.

This data-driven approach provides detailed insight into the magnitude, origins, and impacts of poor data. However, it requires data quality skills and tools to implement.
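As a flavor of the profiling technique, here is a minimal sketch using pandas. The table and column names are hypothetical; production profiling tools compute similar metrics automatically across entire systems.

```python
import pandas as pd

# Hypothetical extract of a customer table.
df = pd.DataFrame({
    "customer_id": ["C-1", "C-2", "C-2", "C-4", None],
    "email":       ["a@x.com", "b@x", "b@x.com", None, "e@x.com"],
    "country":     ["US", "US", "usa", "DE", "DE"],
})

profile = {
    # Completeness: share of non-null values per column.
    "completeness": df.notna().mean().round(2).to_dict(),
    # Validity: share of emails matching a basic pattern.
    "email_validity": df["email"].str.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$").mean().round(2),
    # Uniqueness: share of distinct customer IDs among non-null IDs.
    "id_uniqueness": round(df["customer_id"].nunique() / df["customer_id"].notna().sum(), 2),
    # Consistency: distinct spellings of what should be coded values.
    "country_codes": sorted(df["country"].str.upper().unique()),
}
print(profile)
```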

Combining the approaches above can rapidly provide a solid case for tackling data quality at your organization. Having quantified bad data costs also aids prioritizing data remediation efforts for maximum benefit.

4 Steps to Improve Data Quality

Armed with knowledge of the immense cost of low-quality data, let's explore key steps to clean it up.

1. Assess Data Health

As discussed above, auditing and profiling data provides crucial visibility into where problems exist. Define Key Performance Indicators (KPIs) such as % of records with inaccuracies, % of fields with invalid values, and % of stale data. Leverage technology solutions to automate analysis at scale.
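For example, a staleness KPI might be computed like this sketch, which assumes a hypothetical last_verified column and a 365-day freshness threshold:

```python
import pandas as pd

# Hypothetical records with the date each one was last verified.
df = pd.DataFrame({
    "record_id":     [1, 2, 3, 4],
    "last_verified": pd.to_datetime(["2024-01-10", "2021-06-01", "2023-11-20", "2020-02-14"]),
})

# KPI: % of records not verified within the freshness window.
FRESHNESS_DAYS = 365
as_of = pd.Timestamp("2024-06-01")
stale = (as_of - df["last_verified"]).dt.days > FRESHNESS_DAYS
print(f"stale data: {stale.mean():.0%} of records")  # stale data: 50% of records
```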

2. Address Root Causes

Don't stop at quantifying issues; keep investigating to understand the systemic root causes that generate them. Look for error-prone processes, unvalidated data inputs, and missing integrity checks. Spotting root causes lets you address problems at the source for sustainable solutions.

3. Standardize & Centralize

Much bad data arises from conflicting formats and definitions across systems. Define enterprise-wide standards for nomenclature, metadata, reference data, data types, schemas, and more. Moving data to a central repository removes siloed discrepancies. Master data management (MDM) solutions can assist greatly here.
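Here is a minimal sketch of standardize-then-centralize, assuming two hypothetical source systems and email/phone as the matching key. Real MDM solutions add survivorship rules, fuzzy matching, and governance on top of this basic idea.

```python
import pandas as pd

# Hypothetical customer records pulled from two siloed systems.
crm   = pd.DataFrame({"email": ["Jane@Example.COM "], "phone": ["(555) 123-4567"], "source": ["crm"]})
store = pd.DataFrame({"email": ["jane@example.com"],  "phone": ["555.123.4567"],   "source": ["store"]})

combined = pd.concat([crm, store], ignore_index=True)

# Standardize: one canonical representation per field.
combined["email"] = combined["email"].str.strip().str.lower()
combined["phone"] = combined["phone"].str.replace(r"\D", "", regex=True)  # digits only

# Centralize: collapse duplicates on the standardized natural key.
master = combined.drop_duplicates(subset=["email", "phone"], keep="first")
print(master)  # one golden record instead of two conflicting ones
```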

4. Embed Data Quality Into Processes

Make quality practices such as input validation, profiling, and auditing core parts of ongoing business processes rather than afterthoughts. For example, add data testing scenarios into application requirements, as sketched below. Training people to work cross-functionally also improves accountability.
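Data quality checks can live alongside ordinary unit tests so they run on every build. Here is a sketch using pytest conventions, with a stand-in loader for whatever your pipeline actually produces:

```python
import pandas as pd

def load_orders() -> pd.DataFrame:
    """Stand-in for the real extract; replace with your pipeline's output."""
    return pd.DataFrame({
        "order_id": [101, 102, 103],
        "amount":   [49.99, 0.00, 120.50],
        "currency": ["USD", "USD", "EUR"],
    })

# Data tests that run alongside the application's unit tests (e.g., via pytest).
def test_order_ids_are_unique():
    orders = load_orders()
    assert orders["order_id"].is_unique

def test_amounts_are_non_negative():
    orders = load_orders()
    assert (orders["amount"] >= 0).all()

def test_currency_codes_are_known():
    orders = load_orders()
    assert orders["currency"].isin({"USD", "EUR", "GBP"}).all()
```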

Achieving pristine data requires making it a priority on par with other business imperatives. But given staggering data quality costs, the effort pays for itself many times over.

Start Cleansing Your Data Today

This post has illustrated eye-opening statistics on how uncontrolled data issues sap company performance and finances. Yet by following proven best practices, you can eliminate this waste and disruption. Regaining control of your data amplifies efficiency, decision making, and innovation velocity for game-changing business upside.

We hope these insights provide motivation and direction to initiate your data quality journey. ThoughtSpot's team of data experts is ready to help assess your situation and recommend solutions tailored to your needs. Contact us today to get started unleashing the full value of your data.

What are the main types of bad data that organizations suffer from?

There are many flavors of bad data that hinder organizations. Some of the most common include:

  • Inaccurate data - Information that is simply incorrect. This can occur due to human error in data entry, lack of validation, broken integrations, and more. Even small inaccuracies degrade trust in data.
  • Incomplete data - Missing information critical for business processes. Often certain applications or databases have incomplete views of customers, products, or other key entities. However, critical decisions require a holistic perspective.
  • Outdated data - Once-valid data that has since gone stale. Customer addresses, product prices, equipment specifications, and more can quickly become outdated, and stale data leads to incorrect outputs.
  • Duplicate data - The same entities or records stored redundantly across multiple systems, leading to conflicts and inconsistencies. Duplicates also waste storage and computing resources.
  • Poorly formatted data - Information stored without consistency or validation. For example, dates entered in different formats, or free-form text instead of coded options. Poor formatting makes aggregation and analysis far more difficult, as the sketch after this list illustrates.
  • Non-compliant data - Information that does not adhere to regulatory, legal, or policy requirements related to storage, access, retention, and use. Non-compliant data exposes organizations to massive risk and penalties.
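To illustrate the formatting problem, here is a minimal sketch that normalizes dates arriving in several common formats into one canonical representation. The accepted formats are assumptions you would tailor to your own inputs.

```python
from datetime import datetime

def normalize_date(raw: str) -> str:
    """Try common input formats and emit a single canonical ISO date."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y"):
        try:
            return datetime.strptime(raw, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw}")

print([normalize_date(d) for d in ["2024-03-05", "03/05/2024", "5 Mar 2024"]])
# ['2024-03-05', '2024-03-05', '2024-03-05']
```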

What are the root causes of bad data in most organizations?

The origins of bad data are often entrenched in standard practices and assumptions:

  • Reliance on manual processes and data entry propagating human errors
  • Siloed systems with no synchronization causing duplicates and inconsistencies
  • Lack of input validation allowing incorrectly formatted or nonsensical data to enter systems
  • Poorly managed integrations where interfaces manipulate data in ways that damage its integrity
  • Uncontrolled changes to schemas, code, or components that impact data without sufficient testing
  • Insufficient training on the importance of data discipline for end users and developers
  • Absence of monitoring via profiling, auditing, and quality KPIs to detect issues early

Proactively identifying and addressing these root causes is key to sustaining high data quality.

How does bad data specifically impact customer-facing operations like sales and marketing?

Bad data wreaks havoc on customer-facing functions within organizations:

  • Inaccurate or duplicate customer data leads to frustrating, inconsistent experiences that erode satisfaction and loyalty.
  • Stale, incomplete product data means marketing collateral, websites, and sales materials are unreliable, resulting in lost credibility, leads, and sales.
  • Targeting errors from faulty data cause marketing campaigns and sales efforts to waste resources on messages poorly matched to audience interests.
  • Pricing errors result in missed revenue opportunities, or even violations of contractual agreements that damage the organization's reputation.
  • Forecasting miscalculations based on flawed data leave organizations with idle capacity or insufficient resources to meet real customer demand.

The common theme is that bad data undermines organizations' ability to understand customers and market to them effectively. This manifests directly on the top line of the income statement.

What techniques can quantify the business impact of bad data?

Organizations can leverage various techniques to quantify bad data's business impacts:

  • Statistical sampling of major databases and systems to profile and extrapolate overall data health
  • Root cause analysis on known data issues and defects to trace their origination and propagation
  • Process analysis of data-dependent initiatives to estimate productivity gaps, waste, and quality slips
  • Business impact analysis using computational modeling to simulate the cascading effects of bad data on operations
  • Anomaly assessment examining performance deviations, alerts, and hot spots to identify where data quality may be a factor
  • Benchmarking against industry research on typical ratios of IT budgets, revenues, and customer satisfaction affected by data problems

A combination of data-driven and experiential approaches provides a comprehensive quantitative view of how data quality pain points map to financial impacts.
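As an illustration of the sampling technique, this sketch extrapolates an audited sample's error rate to a full population using a simple normal-approximation confidence interval; the counts are hypothetical.

```python
import math

# Suppose a random sample of records was manually audited for defects.
sample_size = 500
errors_found = 40
population = 2_000_000

error_rate = errors_found / sample_size
# 95% confidence interval via the normal approximation.
margin = 1.96 * math.sqrt(error_rate * (1 - error_rate) / sample_size)

low, high = (error_rate - margin) * population, (error_rate + margin) * population
print(f"estimated defective records: {error_rate * population:,.0f} "
      f"(95% CI: {low:,.0f} to {high:,.0f})")
```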

What are some quick first steps companies can take to start improving data quality?

Here are some high impact first steps on data quality:

  • Assess user pain points via surveys, interviews, and mining support tickets to spot data problem hot spots.
  • Profile sample datasets with basic rules to estimate overall quality and complexity of cleansing.
  • Run Hadoop-based data discovery tools to automatically surface outliers and inconsistent or duplicated data.
  • Convene focus groups of business and IT leaders to capture pet peeves and ideas for improvement.
  • Identify 3-5 master data types, such as customer, product, and account, to standardize at an enterprise level.
  • Develop data quality KPIs for dimensions like accuracy, completeness, and timeliness, and establish goals for each.
  • Add data validation to applications during new feature development or enhancements.

Rapid small wins demonstrate data quality's importance and lay the foundation for more expansive efforts.

How should organizations budget for data quality improvement initiatives?

A combination of top-down benchmarking and bottom-up analysis ensures sufficient funding:

  • Top-down, allocate 5% of IT budget as a rule of thumb based on industry analyst recommendations
  • Bottom-up, aggregate estimates from business units on costs of bad data and benefits of improvements
  • Blend with internal and external benchmarking on data quality spending for companies of similar size and industry
  • Factor in both hard costs (tools, labor, services) and soft costs (employee time and satisfaction)
  • Socialize the proposed budget with executive sponsors and anchor their commitment

Budgeting should rise above the typical IT cost-efficiency mindset and focus on bad data's enterprise-wide impacts and the ROI of fixing it.
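One simple way to blend the two estimates is a plain average, as in this sketch; the weighting, the 5% rule of thumb, and the figures are all assumptions to adapt to your situation.

```python
def blended_dq_budget(it_budget: float,
                      bottom_up_estimates: list[float],
                      top_down_pct: float = 0.05) -> float:
    """Average a top-down rule of thumb with summed bottom-up unit estimates."""
    top_down = it_budget * top_down_pct     # e.g., 5% of IT budget
    bottom_up = sum(bottom_up_estimates)    # per-business-unit estimates
    return (top_down + bottom_up) / 2

# $20M IT budget; three business units estimate their bad-data costs.
print(blended_dq_budget(20_000_000, [400_000, 250_000, 600_000]))  # 1125000.0
```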

What are some leading technologies and tools to improve data quality?

Key categories of solutions include:

  • Data profiling to automatically scan, analyze, and report on data completeness, correctness, and governance
  • Data mastering to reconcile, persist, and distribute authoritative master data across the organization
  • Metadata management to catalog data contexts, definitions, origins, interdependencies, and uses
  • Data parsing/standardization to normalize disparate formats and representations into consistent structured data
  • Data validation to apply rules and constraints to detect irregularities in both batch and real-time data as it moves through integration pipelines
  • Data governance to manage policies, guidelines, issue tracking, and workflow around data quality

Leveraging solutions tailored for each organization's needs and challenges is critical to scale improvements.

Should companies invest in dedicated data stewards or quality organization functions? Why or why not?

Dedicated internal data quality roles and teams are usually pivotal for larger or highly data-driven organizations. The reasons are:

  • Data quality spans multiple systems, technologies, and business units requiring coordination.
  • Issues fall through the cracks without clear ownership and accountability.
  • Continuous proactive oversight is needed beyond one-off projects.
  • Impartial internal experts can credibly mediate conflicts of interest in data policies.
  • Formalized data quality workflow, reviews, and checks need consistent management.

For smaller companies, embedding responsibilities into existing IT and analytics roles can be sufficient. But all organizations need someone to champion data quality as a priority.

How can organizations change mindsets and culture to value data quality intrinsically?

Culture shifts require continuous reinforcement:

  • Evangelize via all hands meetings, lunch & learns, guest speakers on data quality
  • Incentivize data quality KPIs as part of performance management and compensation
  • Celebrate data heroes and publicize data quality success stories
  • Add data quality into individual development plans and career ladder criteria
  • Gamify data quality monitoring and remediation with contests and leaderboards
  • Educate all employees on the business cost and impact of data quality through training

With sustained messaging, the goal is to make every employee feel personally accountable for proactively contributing to data quality.

What core message should organizations take away about addressing bad data?

The key takeaway is that bad data directly destroys value and performance for modern data-driven organizations. Data quality can no longer be an afterthought, delegated just to IT and analytics teams. It requires executive prioritization, cross-functional coordination, and modern tools to systematically improve. But the business upside from unlocking quality, trustworthy data is immense. Organizations must take responsibility today for the data they will rely on tomorrow.

Rasheed Rabata

A solution- and ROI-driven CTO, consultant, and system integrator with experience deploying data integrations, Data Hubs, Master Data Management, Data Quality, and Data Warehousing solutions, Rasheed has a passion for solving complex data problems. His career showcases his drive to deliver software and timely solutions for business needs.