The Data Quality Dilemma for Business-to-Business Supply Chains

Low-quality data prevents companies from achieving enterprise-wide insight into their global supply chains

Jun 4, 2015

Dean Hahn-Carlson

In discussions of how to improve supply chain performance, poor data quality is often the elephant in the room. Low-quality data prevents companies from achieving enterprise-wide insight into their global supply chains. Yet the problem seldom gets the attention its size deserves.

Each party in a physical supply chain needs accurate data to perform its operations. Without good data, no party can function efficiently or deliver on time. No party intends to inject low-quality data into the information supply chain. So why does so much low-quality data flow out of it?

There are two main causes:

Fidelity loss during execution. Each party in the supply chain takes proper care of the information it needs to perform its duties. But all parties tend to pay less attention to data they don't need for their own use. Their databases may allow no room for such non-essential data. Or their systems may stuff it into an unstructured space reserved for other information from the customer.
Number of execution systems. Many internal and external execution systems feed data into the information supply chain. Big companies with global supply chains and outsourced logistics are likely to receive data from thousands of sources. Each source uses its own terminology to describe the goods or services it provides. Each has its own identifiers for the other parties in a transaction.

Internal and external execution systems are data silos. They capture only the narrow range of high-quality data they need to perform their limited functions.

The Jigsaw Puzzle

For companies to extract more value from their information supply chain data, they must assemble a complex jigsaw puzzle. But this puzzle is made up of pieces that weren't designed to fit together.

The desired data is scattered across various computer systems, each producing many document types in many formats. Even if the data from each source is perfect for that source’s intended use (experience suggests it probably isn’t), the pieces don’t offer a clear picture unless you can make them fit together.

A Simple Example: Incompatible Data

Let’s say you want to know how well your supply chain performs for your best product and your best customer. That’s a reasonable question, since the entire purpose of your supply chain is to get your products to your customers on time and in good condition.

So you frame the question: “What percent of the total sales value of Product Z do we deliver to ACME Company on time?”

Let’s assume your company has only two order-capture systems. Each system serves a single warehouse. Let’s also assume you ship directly to ACME, using only a handful of carriers. Finally, let’s assume your company maintains its customer master records with exceptional discipline: each order-capture system contains only one record for ACME Company in its customer master.

But there’s a problem. Customer Master 1 assigned customer ID 1234 to ACME. The person who created the record entered the customer’s name as ACME. Customer Master 2 assigned a different customer ID, 47398, to ACME. And the person who created this record used the full legal name ACME Industries. How can you know that customer ID 1234 is the same as customer ID 47398?
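One common way to answer that question is a cross-reference (often called a crosswalk) that maps each system's local customer ID to a shared key. The Python sketch below derives that key by normalizing names: lowercasing, stripping punctuation, and dropping noise words. The `NOISE_WORDS` list, the `CM1`/`CM2` system labels, and both function names are hypothetical. Name matching alone is fragile (dropping a word like "industries" could merge two different companies), so a production matcher would also compare addresses or tax IDs and send uncertain matches to a person.

```python
import re

# Hypothetical noise words ignored when comparing customer names.
NOISE_WORDS = {"inc", "incorporated", "co", "company", "corp", "corporation",
               "industries", "llc", "ltd"}

def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and remove noise words."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in NOISE_WORDS)

def build_crosswalk(masters: dict) -> dict:
    """Map (system, local customer ID) -> shared key derived from the name."""
    return {
        (system, customer_id): normalize_name(name)
        for system, records in masters.items()
        for customer_id, name in records.items()
    }

masters = {
    "CM1": {"1234": "ACME"},
    "CM2": {"47398": "ACME Industries"},
}
xwalk = build_crosswalk(masters)
print(xwalk[("CM1", "1234")] == xwalk[("CM2", "47398")])  # True
```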

Let’s set that problem aside for a moment and plow on. How many orders did ACME place for Product Z in the past year? You can easily find that information, right? From order entry, you can get dates, quantities, prices and requested delivery dates. Once you figure out that ACME and ACME Industries are the same entity, you’re home free. Right?

Not so fast. You don’t want to know when ACME ordered Product Z or when they wanted it delivered. You want to know how often Product Z arrived on or before the date ACME wanted it.

Inspiration strikes. Carrier invoices can show when ACME received your shipments containing Product Z. You may get a little dusty rummaging around for paid invoices, but you’re hopeful.

Wait a minute. Which carrier handled which shipments? You have to dig into another set of records. And so it goes.
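Once the pieces are found, the calculation itself is simple; the work is in the joins. The sketch below computes the share of Product Z sales value delivered on or before the requested date. It assumes each order line travels in exactly one shipment, that customer IDs have already been resolved to a shared key, and that shipping records and carrier invoices have been reduced to two lookup tables. The `OrderLine` structure, the function name, and all sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class OrderLine:
    line_id: str
    customer_key: str  # resolved customer key, not a system-local ID
    product: str
    value: float       # sales value of the line
    requested: date    # delivery date the customer asked for

def on_time_value_pct(orders, shipment_of, delivered_on,
                      customer_key, product) -> Optional[float]:
    """Percent of sales value delivered on or before the requested date.

    shipment_of:  order line ID -> shipment ID (from shipping records)
    delivered_on: shipment ID -> delivery date (from carrier invoices)
    """
    total = on_time = 0.0
    for line in orders:
        if line.customer_key != customer_key or line.product != product:
            continue
        total += line.value
        shipment_id = shipment_of.get(line.line_id)
        delivered = delivered_on.get(shipment_id)
        # A line we can't trace to a delivery date is counted as not on time.
        if delivered is not None and delivered <= line.requested:
            on_time += line.value
    return 100.0 * on_time / total if total else None

orders = [
    OrderLine("A-1", "acme", "Z", 600.0, date(2015, 3, 10)),
    OrderLine("B-7", "acme", "Z", 400.0, date(2015, 4, 2)),
]
shipment_of = {"A-1": "S100", "B-7": "S200"}
delivered_on = {"S100": date(2015, 3, 9), "S200": date(2015, 4, 5)}
print(on_time_value_pct(orders, shipment_of, delivered_on, "acme", "Z"))  # 60.0
```

Counting untraceable lines as late is a deliberately conservative choice; a report could instead list them separately so that missing data isn't mistaken for poor service.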

Another Simple Example: Inaccurate Data

Let’s say you receive invoices from a supply chain partner who incorrectly reports your shipment was delivered to the U.S. state of TC. The data in the state/province field is both consistent and complete (exactly two characters). But it’s neither valid nor accurate: TC isn’t a U.S. state code, and the correct value is TX. Throw that into your data warehouse and watch the fireworks begin.

We all experience similar situations. Having the data in electronic form doesn’t help much unless you also resolve the issues of data quality.

For data to be usable, it must be accurate, consistent, complete and valid. If a data element fails even one of these quality criteria, it doesn’t provide usable information.
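To make the four criteria concrete, here is a minimal sketch that checks the state field from the example above. It assumes "complete" means a value is present, "consistent" means it matches the expected two-uppercase-letter format, and "valid" means it appears in a list of U.S. state codes (50 states plus DC; territories and military codes are omitted for brevity). Accuracy can't be judged from the value alone, so the sketch compares it against a hypothetical `reference` value from an independent source, such as the ship-to state on the original order. The function name is also hypothetical.

```python
import re
from typing import Optional

# 50 states plus DC; territories and military codes omitted for brevity.
US_STATE_CODES = set(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY DC".split()
)

def check_state(value: Optional[str], reference: Optional[str] = None) -> list:
    """Return the names of the quality criteria that a state code fails."""
    if value is None or not value.strip():
        return ["complete"]            # nothing else can be judged
    failures = []
    if not re.fullmatch(r"[A-Z]{2}", value):
        failures.append("consistent")  # wrong shape for a state code
    if value not in US_STATE_CODES:
        failures.append("valid")       # not in the list of known codes
    if reference is not None and value != reference:
        failures.append("accurate")    # disagrees with an independent source
    return failures

print(check_state("TC", reference="TX"))  # ['valid', 'accurate']
print(check_state("TX", reference="TX"))  # []
```

Note that TC passes the format checks a naive load process might rely on; only the code lookup and the cross-check against another source catch it.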

Are these problems impossible to solve? No.

Solving the Data Quality Problem with Data Refinement

Developing high-quality, consistent data from diverse sources requires a rigorous process to standardize, normalize, correlate, fix and check each data element. That process is called data refining.

A data-refining process receives raw data from many disparate sources across the supply chain. It converts the raw data to a common, standard structure. It then uses correlation and other big data techniques to correct and enhance the data.
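The two stages described above can be sketched as follows: per-source adapters that map raw records into one common structure, then a correlation step that groups records describing the same shipment, fills gaps from a trusted source, and flags disagreements. The raw field names (`ShipNo`, `pro_ref`, and so on), the `Shipment` structure, and the choice of the order system as the reference source are all assumptions made for illustration; real data refining draws on far more sources and far richer matching than an exact shipment ID.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Shipment:
    """Hypothetical common structure that every source is converted into."""
    shipment_id: Optional[str]
    customer_key: Optional[str]
    dest_state: Optional[str]
    source: str

def clean(value: Optional[str]) -> Optional[str]:
    """Trim and uppercase a code; treat blanks as missing."""
    value = (value or "").strip().upper()
    return value or None

# Stage 1: one adapter per source maps its raw fields onto the common structure.
def from_order_system(raw: dict) -> Shipment:
    return Shipment(clean(raw.get("ShipNo")), clean(raw.get("CustName")),
                    clean(raw.get("ShipToSt")), "orders")

def from_carrier_invoice(raw: dict) -> Shipment:
    # Carrier invoices carry the carrier's reference, not your customer ID.
    return Shipment(clean(raw.get("pro_ref")), None,
                    clean(raw.get("consignee_state")), "carrier")

# Stage 2: correlate records for the same shipment; fill gaps, flag conflicts.
def correlate(records):
    groups = {}
    for rec in records:
        groups.setdefault(rec.shipment_id, []).append(rec)
    refined, conflicts = [], []
    for shipment_id, group in groups.items():
        ref = next((r for r in group if r.source == "orders"), None)
        for rec in group:
            if ref is None or rec is ref:
                refined.append(rec)
                continue
            if rec.dest_state and ref.dest_state and rec.dest_state != ref.dest_state:
                # Don't silently overwrite: disagreements go to review.
                conflicts.append((shipment_id, rec.dest_state, ref.dest_state))
            refined.append(replace(
                rec,
                customer_key=rec.customer_key or ref.customer_key,
                dest_state=rec.dest_state or ref.dest_state,
            ))
    return refined, conflicts

raw_orders = [{"ShipNo": "s100 ", "CustName": "ACME", "ShipToSt": "tx"}]
raw_invoices = [{"pro_ref": "S100", "consignee_state": "TC"}, {"pro_ref": "S200"}]

records = [from_order_system(r) for r in raw_orders]
records += [from_carrier_invoice(r) for r in raw_invoices]
refined, conflicts = correlate(records)
print(conflicts)  # [('S100', 'TC', 'TX')]
```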

The data-refining process must accommodate constant change. New data values begin to appear in your information supply chain, and your systems must determine whether they’re valid. When suppliers update or modify their systems, they may start sending you different data values.

To ensure your data stays clean despite such inevitable changes, you need automated systems that check transactions as they occur. You also need manual processes to resolve or accommodate changes.
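A minimal sketch of that division of labor, for a single coded field: an automated check runs on every transaction and holds any unrecognized value in a review queue, and a manual step either accepts the value as a legitimate new code or records a correction for it. Held transactions are then re-checked. The `ValueGate` class and the carrier service-level codes used here are hypothetical.

```python
from collections import deque

class ValueGate:
    """Hypothetical gate for one coded field, e.g. a carrier's service-level code."""

    def __init__(self, known_values):
        self.known = set(known_values)
        self.corrections = {}        # bad value -> corrected value, learned from reviews
        self.review_queue = deque()  # (transaction ID, value) held for a person

    def check(self, txn_id, value):
        """Automated step, run on each transaction; returns None if the value is held."""
        if value in self.known:
            return value
        if value in self.corrections:
            return self.corrections[value]
        self.review_queue.append((txn_id, value))
        return None

    def resolve(self, value, corrected_to=None):
        """Manual step: accept a new value as valid, or record what it should be."""
        if corrected_to is None:
            self.known.add(value)                   # accommodate a legitimate change
        else:
            self.corrections[value] = corrected_to  # fix a bad value from now on

    def release(self):
        """Re-check held transactions; return those that now pass."""
        held = self.review_queue
        self.review_queue = deque()
        released = []
        for txn_id, value in held:
            result = self.check(txn_id, value)  # re-queues anything still unresolved
            if result is not None:
                released.append((txn_id, result))
        return released

gate = ValueGate({"GND", "2DAY"})
gate.check("t1", "GND")    # passes
gate.check("t2", "3DAY")   # new code: held
gate.check("t3", "GRND")   # bad code: held
gate.resolve("3DAY")                      # a person confirms 3DAY is legitimate
gate.resolve("GRND", corrected_to="GND")  # a person maps GRND to GND
released = gate.release()
print(released)  # [('t2', '3DAY'), ('t3', 'GND')]
```

Because each review decision is remembered, a person only has to look at a given new or bad value once; later transactions carrying it are handled automatically.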

With continuous vigilance, you can identify and correct bad data at its sources. You can stop it before it corrupts your downstream processes. You can maintain the integrity of your audits, expense posting and supply chain performance analysis.

Trusted Decisions from Trusted Data

In principle, the torrents of data that companies receive today empower them to form a clear picture of their global supply chain performance. In practice, it’s hard to assemble a jigsaw puzzle with pieces that don't fit together. Too much of the data companies receive is disjointed, inconsistent and inaccurate.

Companies needn't settle for this frustration. Data-refining processes can enable them to convert those torrents of raw data into standardized, normalized and enhanced logistics and supply chain data. With higher-quality data, global companies can make better decisions to optimize their supply chain performance.