Bad Data – The Problem with Procurement

Making sound procurement decisions depends on accurate, reliable data. Here's how to start addressing the product data quality problem.

By Daniel Teachey

With the precarious economic situation, companies across the board are trying to slash costs wherever they can. Because organizations can spend as much as 60 percent of their revenue to acquire the goods and services necessary to conduct business, procurement professionals are being asked to reduce the organization's overall spend, some by as much as 20 percent in a year. Companies now realize more than ever the effect of procurement strategies on their profitability and viability.

Since all organizations have data on their products, inventory, parts and services – and most organizations have more product data than they have customer data – this information is becoming increasingly important to the overall health of a business. However, poor-quality product data have been causing problems for as long as enterprises have been collecting this kind of information. The unique challenges of managing product data can hinder efforts to optimize the supply chain, improve spend management and create a more unified view of the enterprise.

The problems with product data stem from the actual structure and conventions of this type of information. Unlike customer data, which have a relatively small set of defined and universal attributes (name, address, e-mail address, phone number), product data are much more complex. For example, the definitions and descriptions of a 60-watt light bulb may be completely different within the same company. (See Table 1.)

Table 1 - Light bulb classification confusion


Commodity classification provides a “common language” for product data


Product       Description
Light bulb    60-watt, frosted
Light bulb    60-watt, brass base, 10,000 life hours
Light bulb    Incandescent, A19 bulb shape, 60 watts

If something as simple as a light bulb can cause procurement confusion, just imagine how inconsistent or unreliable product data can be if the information arrives from dozens of different suppliers in your trading network. Each supplier could have a different item number, and the description fields may contain contradictory or non-standard information.

Organizations are also grappling with the fact that enterprise resource planning (ERP), supply chain management (SCM) and other applications have done little to address these issues. These applications encapsulate the processes that drive a business every day, yet they typically have no integrated data quality capabilities to find and eliminate bad data. Furthermore, layering additional ERP or SCM applications on top of existing ones creates redundant silos of product information and only exacerbates an already complex task.

Issues such as duplicate product numbers, obsolete product IDs and inconsistent item descriptions exist across the organization, affecting every level of the operation. An inability to understand the products that are being sold can dramatically hinder the organization's ability to plan for new products in the future. Similarly, a confused, disparate view of direct and indirect spending can foil the most well-intentioned spend management efforts.

The bottom line is that poor-quality product data create difficulties in controlling the costs of production, reduce the productivity of the company and affect the delivery of finished goods. After all, the data within your applications drive every decision you make, from long-range strategic planning to day-to-day operations.

Addressing the Product Data Quality Problem

Despite the difficulties associated with product data management, many progressive organizations are taking steps to directly address these data quality issues. Given the complex nature of product data – and the lack of standards across, and even within, organizations – companies often look to existing data quality technology to standardize, validate and verify the integrity of this information.

Data quality technology grew out of the customer data realm: the software started as a way to cleanse and de-duplicate marketing and customer relations data. Such technology can handle each of the critical phases of the data management process. A standard process would include:

  • Data analysis: Use data profiling or data discovery to uncover strengths and weaknesses in the data.
  • Data improvement: Start to address the known problems in the data through automated standardization, verification, matching, clustering and enrichment practices.
  • Data controls: Since new data are always arriving in an organization, apply some monitoring techniques to find and flag bad, suspicious or non-compliant information.

For example, if there were three distinct records for a Joe Smith, a Joseph Smith and a Joe Smythe, but all of them mapped to exactly the same address, telephone number and e-mail address, data quality tools would be able to reconcile these three entries into a single record. Similarly, if there were five records, all for Joe Smith and all with identical information, the de-duplication process would eliminate the four extra records.
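The Joe Smith reconciliation can be sketched in a few lines of Python. This is only an illustration under simple assumptions: records are plain dictionaries, the field names and the `reconcile` helper are hypothetical, and an exact match on normalized address, phone and e-mail is treated as proof of identity. Commercial data quality tools rely on far richer matching logic.

```python
from collections import defaultdict


def normalize(value):
    """Lowercase the value and keep only letters and digits."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def reconcile(records):
    """Merge records that share the same address, phone and e-mail."""
    clusters = defaultdict(list)
    for record in records:
        key = (
            normalize(record["address"]),
            normalize(record["phone"]),
            normalize(record["email"]),
        )
        clusters[key].append(record)

    merged = []
    for group in clusters.values():
        # Keep the first record as the survivor and note every name variant.
        survivor = dict(group[0])
        survivor["name_variants"] = sorted({r["name"] for r in group})
        merged.append(survivor)
    return merged


# Made-up sample data for illustration only.
records = [
    {"name": "Joe Smith", "address": "12 Oak St.",
     "phone": "555-0100", "email": "jsmith@example.com"},
    {"name": "Joseph Smith", "address": "12 Oak St",
     "phone": "(555) 0100", "email": "JSmith@example.com"},
    {"name": "Joe Smythe", "address": "12 oak st",
     "phone": "5550100", "email": "jsmith@example.com"},
]
merged = reconcile(records)
```

Here the three entries collapse into one surviving record that lists all three name spellings. Exact duplicates fall out of the same grouping step, since identical records always share a key.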

As data quality technology became more sophisticated, IT and business managers began to use it for product, item, inventory and other non-customer data sets. In the product data realm, a data discovery effort can quickly determine if there are potential duplicates in the data set – or if data lack standards across systems. For example, it could discover that the same light bulb is listed three times in the purchasing system, based on product attributes such as manufacturer, part number and cost. However, the most vexing problem for product data quality programs is the second phase, data improvement: giving enterprise data a standard method for organizing, classifying and managing product information.
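A data discovery pass of this kind can be approximated by counting how often the same combination of attributes appears. The sketch below assumes items are dictionaries with hypothetical `manufacturer`, `part_number` and `unit_cost` fields; the sample values are invented for illustration.

```python
from collections import Counter


def find_potential_duplicates(items, keys=("manufacturer", "part_number", "unit_cost")):
    """Return each attribute combination that appears on more than one item."""
    counts = Counter(tuple(item[key] for key in keys) for item in items)
    return {combo: count for combo, count in counts.items() if count > 1}


# Invented purchasing-system rows: three descriptions of one light bulb.
items = [
    {"description": "60-watt, frosted",
     "manufacturer": "Example Lighting Co.", "part_number": "EL-60A19", "unit_cost": "0.89"},
    {"description": "60-watt, brass base, 10,000 life hours",
     "manufacturer": "Example Lighting Co.", "part_number": "EL-60A19", "unit_cost": "0.89"},
    {"description": "Incandescent, A19 bulb shape, 60 watts",
     "manufacturer": "Example Lighting Co.", "part_number": "EL-60A19", "unit_cost": "0.89"},
    {"description": "Extension cord, 10 ft",
     "manufacturer": "Example Electric", "part_number": "EC-10", "unit_cost": "4.25"},
]
duplicates = find_potential_duplicates(items)
```

The three differently worded light bulb entries surface as one attribute combination with a count of three, even though their descriptions share almost no text.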

Recently, companies have embraced industry-standard commodity coding systems like the United Nations Standard Products and Services Code (UNSPSC), eCl@ss and GS1 to provide a vendor-neutral, objective way of classifying data. Many organizations are also developing custom classification codes internally. A standard code, when applied to a product or inventory item, can be used to reference and sort these data across any application.

For example, within UNSPSC, the code 26121520 has the same commodity description and meaning for every organization (in this case "copper steel wire"). With this code appended to the record, every organization supporting the code can more effectively compare prices between various copper steel wire suppliers. Or a company can pull together every product data entry within its applications that carries the 26121520 code and begin to see how much it is spending on that type of product.
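Once a code is attached to each purchase record, totaling spend by commodity becomes a simple grouping exercise. The following sketch assumes purchase lines are dictionaries with hypothetical `supplier`, `unspsc` and `amount` fields; the suppliers and amounts are invented.

```python
from collections import defaultdict
from decimal import Decimal


def spend_by_code(purchase_lines):
    """Total spend for each commodity code."""
    totals = defaultdict(Decimal)
    for line in purchase_lines:
        totals[line["unspsc"]] += Decimal(line["amount"])
    return dict(totals)


def spend_by_supplier(purchase_lines, code):
    """Total spend per supplier for a single commodity code."""
    totals = defaultdict(Decimal)
    for line in purchase_lines:
        if line["unspsc"] == code:
            totals[line["supplier"]] += Decimal(line["amount"])
    return dict(totals)


# Invented purchase lines; only the two codes come from the article.
purchase_lines = [
    {"supplier": "Supplier A", "unspsc": "26121520", "amount": "1200.00"},
    {"supplier": "Supplier B", "unspsc": "26121520", "amount": "850.50"},
    {"supplier": "Supplier A", "unspsc": "43171804", "amount": "399.99"},
    {"supplier": "Supplier A", "unspsc": "26121520", "amount": "300.00"},
]
wire_total = spend_by_code(purchase_lines)["26121520"]
wire_by_supplier = spend_by_supplier(purchase_lines, "26121520")
```

Amounts are kept as `Decimal` values rather than floats so that currency totals do not pick up rounding errors.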

These standards are a way of acknowledging that product data can – and will – have unique representations within the systems. However, by providing a single, universal method for classifying that information, they keep the inconsistencies inherent in product data from causing significant problems within business processes.

Using Commodity Classification to Improve Spend Analysis

Commodity classification lets organizations group related items at a detailed level, validating comparisons between items within the group. One of the primary hierarchies used in every spend analysis implementation is a commodity classification structure. This structure allows a company to analyze its expenditures based on commodity type, then drill into that type to look at specific product groups.

Table 2 provides an example hierarchy of a UNSPSC code for personal digital assistants (PDAs).

Table 2 - Sample commodity coding structure




Product Classification


Communications and Computer Equipment and Peripherals


Hardware and Accessories




Personal Digital Assistants (PDAs)

With the 43171804 code attached to any PDA in the company's data sources, the organization can more readily understand how much is spent on PDAs. From here, a spend analysis tool can show which types of PDAs the company is buying, from whom it is buying them, and how much it is spending with each vendor or on each type of PDA. However, if a company's groups were limited to a more generic category like "Computer Hardware," all computer hardware would be part of the same group. Everything from an inkjet printer to a high-end server might fall into the same category.
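The drill-down works because an eight-digit UNSPSC code is built from four two-digit levels: segment, family, class and commodity. A minimal sketch of that roll-up follows, assuming spend has already been totaled per commodity code; the helper names and spend figures are hypothetical, and the second code in the 431718 class is included only to show two commodities sharing a parent.

```python
from collections import defaultdict


def unspsc_levels(code):
    """Split an 8-digit code into its segment, family, class and commodity codes."""
    if len(code) != 8 or not code.isdigit():
        raise ValueError(f"expected an 8-digit code, got {code!r}")
    return {
        "segment": code[:2] + "000000",
        "family": code[:4] + "0000",
        "class": code[:6] + "00",
        "commodity": code,
    }


def roll_up(spend_by_commodity, level):
    """Aggregate commodity-level spend to a higher level of the hierarchy."""
    totals = defaultdict(int)
    for code, amount in spend_by_commodity.items():
        totals[unspsc_levels(code)[level]] += amount
    return dict(totals)


levels = unspsc_levels("43171804")

# Invented spend figures in whole dollars.
spend = {"43171804": 12000, "43171803": 30000, "26121520": 5000}
by_class = roll_up(spend, "class")
by_segment = roll_up(spend, "segment")
```

For 43171804 this yields segment 43000000, family 43170000 and class 43171800, so the same spend data can be viewed at whichever level of detail an analysis calls for.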

Today, most commodity classification is done as a service, usually through an off-site services engagement. Corporate information is sent in bulk to these services on a regular basis to have an industry-standard code attached to product records. This approach has several risks:

  • Accuracy: The staff at these service firms often does not know the client's business and may not know the correct code for an item.
  • Inconsistency: Two people may code the same item in different ways, or an individual can inadvertently apply different codes to the same item.
  • Expense: These engagements are typically priced on a per-record basis, creating a costly recurring expense.
  • Timeliness: Coding services usually take weeks to complete, which means the organization continues to use questionable data while the data are being analyzed offsite.

By contrast, automated coding by a data quality system can be less risky. Rules are built into the system by product specialists or procurement professionals – the people who know the parts and the business. The resulting output from data quality technology is consistent and offers higher degrees of accuracy. It is easy to modify the rules and make the system more intelligent. The process can be run at any time, on any data and as often as the company needs to run it. Answers are available in minutes, not weeks. And since companies can process these files internally, however many times they need, the per-record coding expense is eliminated.
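As a rough illustration of rule-based coding, the sketch below assigns a code when every keyword in a rule appears in the item description. The rule table, the `classify` helper and the sample descriptions are hypothetical; a real data quality system would let product specialists maintain much richer rules, and only the two codes shown are taken from the examples above.

```python
import re

# Each rule pairs a set of required keywords with the code to assign.
# Rules are checked in order, so more specific rules should come first.
RULES = [
    ({"pda"}, "43171804"),
    ({"personal", "digital", "assistant"}, "43171804"),
    ({"copper", "steel", "wire"}, "26121520"),
]


def tokenize(description):
    """Split a description into a set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", description.lower()))


def classify(description, rules=RULES):
    """Return the code of the first rule whose keywords all appear, else None."""
    tokens = tokenize(description)
    for required, code in rules:
        if required <= tokens:
            return code
    return None


coded = {
    text: classify(text)
    for text in [
        "PDA, 64MB, black",
        "Wire, copper-steel, 12 gauge",
        "Light bulb, 60-watt, frosted",
    ]
}
```

Items that match no rule come back as `None`, which gives the specialists who own the rules a ready-made worklist: adding one rule and rerunning the process codes every matching record the same way.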

In Summary

The current business climate is putting pressure on every facet of the organization. For procurement managers, the pressure is only going to increase as companies look for ways to increase efficiencies and save money. The procurement organization must make tough decisions, and the soundness of those decisions depends entirely on reliable, accurate data.

While many companies have invested in various procurement systems in an effort to improve their spend analysis practices, such systems often fail to address the most pressing need of all – to create and implement the best possible sourcing strategies.

To successfully exercise control over corporate spending, companies need:

  • Accurate information about items purchased, including supplier quality, timeliness, performance, price and technological advancement.
  • A method to rank suppliers based on the criteria most important to them.
  • Flexible, business-focused strategies designed to deliver continual cost savings.
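One simple way to rank suppliers on the criteria that matter most is a weighted score. The sketch below is an assumption about how such a ranking might look, not a prescribed method: the criteria names echo the list above, while the weights, the 0-10 scores and the `rank_suppliers` helper are invented for illustration.

```python
def rank_suppliers(scores, weights):
    """Order supplier names from highest to lowest weighted score."""
    def weighted_score(name):
        return sum(weights[criterion] * scores[name][criterion] for criterion in weights)

    return sorted(scores, key=weighted_score, reverse=True)


# Invented 0-10 scores per supplier and integer weights per criterion.
scores = {
    "Supplier A": {"quality": 8, "timeliness": 6, "price": 9},
    "Supplier B": {"quality": 9, "timeliness": 9, "price": 5},
}
weights = {"quality": 5, "timeliness": 3, "price": 2}
ranking = rank_suppliers(scores, weights)
```

With these weights Supplier B scores 82 against Supplier A's 76, so a company that values quality and timeliness over price would rank B first. Changing the weights changes the ranking, and the result is only as trustworthy as the underlying supplier data.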

About the Author: Daniel Teachey is director of corporate communications with DataFlux Corporation, a provider of end-to-end data quality integration solutions to analyze, improve and control data. Previously Teachey held positions with IBM, MicroMass Communications and Datastream Systems. DataFlux is a wholly owned subsidiary of SAS. More information on DataFlux is available at