Voice in a Multicultural DC/Warehouse

A sound solution for building productivity and performance

Sep 28, 2009

Tom Upshur is vice president of product management and marketing with Vocollect

By Tom Upshur

Today's distribution center (DC) and warehouse labor landscape is markedly different from that of previous generations. Nowhere is change more evident than in the proliferation of different native languages being spoken within an individual distribution center. With the continuing globalization of business, multinational companies face a variety of challenges in managing their supply chain across multiple languages and countries.

According to a March 2009 report from the U.S. Bureau of Labor Statistics, in 2008, 24.1 million persons, or 15.6 percent of the U.S. civilian labor force age 16 and over, were foreign-born. The study found that foreign-born workers are more likely than their native-born counterparts to be employed in production, transportation and material-moving occupations (16.4 versus 11.5 percent). This reality has enormous implications for the world of logistics, validating what most distribution center and logistics leaders have known for some time.

Technology has come a long way in improving productivity and performance in the DC/warehouse. While many different types of technology are applied, one solution in particular has a special role to play in addressing the multi-language aspect of people working together: voice. This article will examine the merits of voice in the DC for supporting multiple native languages.

Voice, Defined

But first, what exactly is voice, and how does it work? In a voice-directed DC/warehouse, employee assignments are sent via Wi-Fi from a warehouse management system (WMS) to a lightweight, battery-powered computing device worn or held by the worker. Once received by the device, the work assignments are converted into a series of discrete verbal commands, which the worker hears through a headset. The instructions direct the employee to an aisle/section and slot location. Once there, the employee confirms he or she is at the proper location and completes the task by speaking into the headset. The worker's words are recognized by the speech recognition software running on the device, which translates the spoken response into data and sends those data back to the WMS. The WMS issues the next assignment and the process repeats itself.
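The cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names Assignment, to_prompts, run_assignments and scripted_worker are hypothetical, the prompt wording is invented, and a scripted function stands in for the headset and the speech recognition software.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Assignment:
    """One unit of work issued by the WMS (hypothetical structure)."""
    aisle: str
    slot: str
    quantity: int


def to_prompts(a: Assignment) -> List[str]:
    """Convert an assignment into a series of discrete verbal commands."""
    return [f"Go to aisle {a.aisle}", f"Slot {a.slot}", f"Pick {a.quantity}"]


def run_assignments(assignments: List[Assignment],
                    recognize: Callable[[str], str]) -> List[Dict]:
    """Play each prompt, collect the recognized reply, and build the data
    that would be sent back to the WMS. `recognize` stands in for the
    headset plus recognizer: it takes a prompt and returns the reply text."""
    confirmations = []
    for a in assignments:
        replies = [recognize(p) for p in to_prompts(a)]
        # The last reply is the quantity the worker says was picked.
        confirmations.append({"slot": a.slot, "picked": int(replies[-1])})
    return confirmations


def scripted_worker(prompt: str) -> str:
    """A stand-in worker: confirms locations, echoes the pick quantity."""
    if prompt.startswith("Pick "):
        return prompt.split()[1]
    return "ready"
```

Calling run_assignments with a list of assignments and scripted_worker returns one confirmation per assignment, mirroring the "issue, speak, confirm, repeat" loop between the WMS and the worker.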

By replacing labor-intensive, error-prone systems such as paper, screens, scanners and keyboards for data entry, voice has been shown to increase productivity, accuracy and job satisfaction — all resulting in improved business performance. Voice helps front-line employees succeed at their jobs by leveraging the most human approach to communication, a two-way dialogue. Literally talking their way through their daily tasks, workers achieve greater productivity, accuracy and safety — the keys to improved job satisfaction.

Voice in Multiple Languages

While it would be technically possible to have as many as 10 or more languages running simultaneously in a given DC/warehouse, those companies with extremely multicultural workforces often tend to limit the number of languages spoken on the voice system to three or four to avoid encouraging cliques based on country of origin.

In most cases, U.S. companies choose to provide the instructions in English and allow their people to answer back in their native tongue. But if management wanted to, instruction could come in an employee's "mother tongue," and the response back could be in the mother tongue, too. Or, every bit of two-way dialogue could be in English. It is all a matter of how the company wishes to conduct its business.

Leveling the Playing Field

Today voice is used daily by hundreds of thousands of workers around the globe to drive business results. Closer to home, as borne out by the U.S. Bureau of Labor Statistics report cited above, many distribution operations are challenged by a multi-language workforce. Those using voice have found a strong benefit for their multicultural workforce: it gives all of their workers a level playing field.

According to Robby Dhesi, director of distribution with Fox Racing Inc., a leading sport apparel manufacturer based in Morgan Hill, Calif.: "Regardless of today's tough economy, it's important to consider quality of work life. For our non-English speaking DC workers, using voice helps to improve their quality of work life, because being able to answer back to the system in the language they are most comfortable with makes them feel more in control of their job; thus, it is easier to perform well." Fox Racing worked with HighJump Software and its authorized business partner Vitech Business Group, Inc. to optimize its warehouse management system (WMS), and with my company, Vocollect, to offer a voice solution.

For organizations with a multicultural workforce, voice provides distinct competitive advantages:

Better productivity, because workers adapt quickly and comfortably to the new system. And in the case of a speaker-dependent system, the system itself adapts to them.
Less training cost and time — DC/warehouse workers, including casual workers, can be trained in less than an hour and be working productively within a day. This is particularly critical for seasonal workers and retail distribution, where high levels of temporary staff are utilized.
Familiar, two-way communication using natural language patterns improves efficiency, because voice helps all workers perform their best, regardless of what language they speak.
Workers feel that their company leadership has positioned them to best succeed in their position. This boosts their personal job satisfaction and helps the organization achieve longer-term employee retention.

Even new immigrants who speak minimal English have every opportunity to reap the same rewards in company incentive programs, because language isn't a barrier. The mobile device is automatically set to recognize individual accents and regional dialects. Workers can "train" the software to understand their chosen vocabulary and preferred colloquialisms, which provides a more natural work environment.

Obviously, it takes quite sophisticated management software to allow a company to manage multiple languages in the DC. For example, Vocollect's fully automated management software holds the correct text-to-speech (TTS) engines for speech-out, the task (e.g., picking, replenishment, put-away) and each individual worker's voice templates. Each worker records (trains) his or her voice templates once. So whether the voice system is in Peoria, Paris or Tokyo, the voice application software supports the process.

Says Dhesi: "Many times you have a supervisor who oversees both English-speaking and non-English-speaking teammates. With the newest versions of management software, supervisors can actually see the dialogue words on the screen so they can follow what the system is telling the employee to do and how he or she is responding for training purposes."

Smith Drug Company, headquartered in Spartanburg, S.C., is a full-line, full-service distributor of pharmaceuticals and over-the-counter merchandise, serving customers in more than 15 states. In the company's main DC are workers of Russian, Asian and Latin American descent, in addition to those who are American-born. Smith Drug allows workers on the voice system to receive the instructions in English, and they can respond back in their preferred native language. Says Director of Information Systems Randy McConnell: "We haven't had any performance challenges because of language skills with our 12 or so foreign-born workers. The voice system helps them be just as effective at picking as anyone else."

Continues McConnell: "In our DC operations, in the hiring process, nationality is no longer an issue; if their background fits the job, they can be a strong performer here, whether or not they can speak good English, as long as they understand basic warehousing terms in English. The voice system enables us to hire people who might otherwise have employment challenges simply because of a lack of proficiency in English."

In fact, says McConnell, there is one American-born worker on his team who is in the process of learning Spanish, so she uses a lot of Spanish words in the voice system to help her learn better Spanish.

Voice, Deconstructed

To support multiple languages within the voice system, two aspects of language must be considered: the user recognizing the system's speech ("speech-out") and the system recognizing the user's speech ("speech-in"). Speech-out can be performed either by recorded speech or by a text-to-speech (TTS) engine. TTS engines are language-specific, so an important consideration when evaluating voice suppliers is to make certain they offer the speech-out languages needed for a particular DC operation. In addition, for an application to perform speech-out in a language, the prompts in the application/task must be translated into that language.
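To make the translation requirement concrete, here is a minimal sketch of a prompt catalog keyed by language. Everything in it is an assumption for illustration: the PROMPTS table, the prompt keys, the Spanish wording and the helper names speech_out and missing_translations are hypothetical and do not come from any vendor's product.

```python
# Hypothetical prompt catalog: one table of task prompts per speech-out language.
PROMPTS = {
    "en": {"go_to_aisle": "Go to aisle {aisle}", "pick": "Pick {qty}"},
    "es": {"go_to_aisle": "Vaya al pasillo {aisle}", "pick": "Recoja {qty}"},
}


def speech_out(key, lang, **values):
    """Return the text that would be handed to the TTS engine for `lang`."""
    return PROMPTS[lang][key].format(**values)


def missing_translations(prompts, base="en"):
    """List prompt keys present in the base language but absent elsewhere.

    Returns {language: [missing keys]}; an empty dict means every
    language covers every prompt in the task."""
    required = set(prompts[base])
    gaps = {}
    for lang, table in prompts.items():
        missing = sorted(required - set(table))
        if missing:
            gaps[lang] = missing
    return gaps
```

A check like missing_translations is the kind of test worth running before a new language goes live, since a single untranslated prompt would leave the worker hearing nothing, or the wrong language, in the middle of a task.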

The speech-in capability of a voice system is typically structured in either speaker-dependent or speaker-independent modes. Both include all the standard vocabulary in a distribution environment, such as "pick," "go to aisle…" and the like.

In highly multicultural environments such as those we see today across the European Union, where three or more native languages are commonly spoken in warehouses via a voice system, many companies have found that a speaker-dependent voice system is best suited to account for the many languages, dialects, accents and colloquialisms that people use in the course of their work.

A typical warehouse application requires a vocabulary of fewer than 100 words and sometimes as few as 40 words. A small-vocabulary recognizer, such as that needed for a DC/warehouse, can utilize a computer's perfect memory to improve performance dramatically for every user, even those with unusual speech patterns or strong accents. It can store each user's unique voice patterns for every word the recognizer will be required to understand. Although Employee #123 will not say "one" in exactly the same way every time, if the recognizer knows that Employee #123 is speaking, and has access to her personal voice patterns, it will be able to transcribe her speech much more accurately than if it tried to compare her speech to all ways of saying "one" or to an average of how most people say it.
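The idea can be shown with a toy example. The sketch below assumes each spoken word has already been reduced to a short list of numbers (real recognizers use far richer acoustic features), and it picks the stored word whose pattern is closest to what was just heard. The employee IDs, the numbers and the function name recognize are all made up for illustration.

```python
import math

# Hypothetical per-user voice patterns: each word maps to a tiny feature vector.
USER_TEMPLATES = {
    "emp123": {"one": (1.0, 0.2), "two": (0.1, 0.9)},
    "emp456": {"one": (0.3, 0.8), "two": (0.9, 0.1)},
}


def recognize(features, templates):
    """Return the word whose stored pattern is nearest to the utterance."""
    return min(templates, key=lambda word: math.dist(features, templates[word]))


# The same sound means different things depending on who is speaking.
utterance = (0.9, 0.3)
word_for_123 = recognize(utterance, USER_TEMPLATES["emp123"])  # closest to her "one"
word_for_456 = recognize(utterance, USER_TEMPLATES["emp456"])  # closest to his "two"
```

Here the identical utterance is matched to "one" for one employee and "two" for the other, which is why knowing who is speaking matters so much to accuracy.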

Thus, the system's accuracy depends on its knowing who is talking to it. A speaker-independent system does not make use of that knowledge and is therefore inherently less accurate and much more susceptible to background noise. Further, it often requires the use of "anchor words" — additional words that serve as cues to the voice system to begin or conclude a dialogue. Every word that is added to the mix takes time, and, of course, time is money.

The process of allowing a speech recognizer to "practice" with a user's speech patterns is called training. Speaker-dependent recognizers are, therefore, sometimes called trained systems, and speaker-independent recognizers are referred to as untrained systems.

A speaker-dependent recognizer can typically be trained in 15 to 45 minutes. Because DC workers collectively say these words thousands of times per day, the increased recognition accuracy of a speaker-dependent system over a speaker-independent system results in higher productivity. This higher productivity rapidly pays back the initial time spent to train the recognizer. And the increase in productivity is highest for non-native speakers because, for them, recognition accuracy on speaker-independent systems is often much worse than it is for native speakers.

A speaker-independent system cannot just accept "uno" in place of the word "one." Users must conform to the system's expectations of the speech patterns of the language they are using. For the workforce on a factory or warehouse floor where a wide variety of accents and languages are the norm, this may not be practical. In a speaker-dependent voice system, when prompted with "one," the worker is free to train the word as "one," "uno," "um," "eins," or any other preferred word for one.
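As a rough illustration of that freedom, the sketch below lets each worker associate a word of his or her own choosing with the value the system expects. It uses plain text where a real system would store recorded voice templates, and the class name WorkerVocabulary and its methods are hypothetical.

```python
class WorkerVocabulary:
    """One worker's trained words, mapped to the values the system expects."""

    def __init__(self):
        self._spoken_to_expected = {}

    def train(self, prompt_word, spoken_word):
        """Record what this worker chose to say when prompted with prompt_word."""
        self._spoken_to_expected[spoken_word.lower()] = prompt_word

    def interpret(self, spoken_word):
        """Return the expected value, or None if the word was never trained."""
        return self._spoken_to_expected.get(spoken_word.lower())


# A worker who prefers Spanish trains "uno" when prompted with "one".
vocabulary = WorkerVocabulary()
vocabulary.train("one", "uno")
```

After training, vocabulary.interpret("uno") yields "one", while a word this worker never trained returns None and would simply prompt a repeat.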

Adaptive recognition further increases recognition accuracy, and thus productivity, by automatically adjusting to the way the user speaks. When a user starts using any new system, he or she may be tentative and unsure, and this may affect speaking patterns. This may be particularly true for non-native speakers. As they become more familiar with the system, their speaking patterns may change, becoming quicker and more natural. Some voice suppliers, such as Vocollect, incorporate an adaptive recognition feature that automatically senses and adjusts to these changes while the user uses the system, further improving recognition accuracy and productivity.
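One simple way to picture this kind of adjustment is a running average that nudges a stored voice pattern toward each new utterance the system has recognized. The sketch below is a generic illustration, not a description of how Vocollect or any other supplier implements adaptation; the function name adapt, the rate parameter and the numbers are assumptions.

```python
def adapt(template, observed, rate=0.1):
    """Move a stored pattern a small step toward a newly observed utterance.

    rate=0 would freeze the pattern; rate=1 would replace it outright."""
    return tuple((1 - rate) * t + rate * o for t, o in zip(template, observed))


# A tentative first-day pattern drifts toward the worker's settled speech.
first_day = (1.0, 0.0)
settled = (0.6, 0.4)
pattern = first_day
for _ in range(50):
    pattern = adapt(pattern, settled, rate=0.2)
```

After repeated use the stored pattern sits very close to the worker's settled way of speaking, without anyone having to retrain the word.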

Having to continually repeat dialogue unnecessarily because of poor recognition accuracy slows down productivity all across the line, and that can spell significant financial cost for supply chain businesses. When speech recognition accuracy is maximized, there are significantly fewer repeats. This helps to ensure the company achieves the strongest level of business performance from the voice system.

Only the Beginning

The full potential of voice across the supply chain and beyond is yet to be realized. Vocollect, for example, has already made strong inroads into utilizing voice in healthcare settings through its Vocollect Healthcare Systems. Its voice-assisted care solution is used by nurse aides for the management and documentation of patient care in skilled nursing facilities (SNFs). Organizations benefit from reduced paper processing, lower operating costs, increased reimbursements and improved quality of care.

Considering the many other types of mobile workers there are in businesses beyond the supply chain, one can picture that voice-directed efficiencies have a bright future in helping unleash higher business performance. The proven capacity of voice to operate successfully in a multi-lingual environment is a critical component of the ability to bring voice into other realms of business operation.

About the Author: Tom Upshur is vice president of product management and marketing with Vocollect, Inc., a provider of voice solutions for mobile workers. More information at www.vocollect.com.