Dial Tone

Every conversation, as Michael Cohen says, is a transaction with a hidden set of mutual expectations. These transactions consist of loosely written rules. They are, however, rules. You can find them. In general, the best way to understand them without a computer assist is to remember they won't be on the X-Y axis zeroed at the origin. They will change.

One of the easiest ones to think about is inflection. Let's say the rule is that you inflect the question. "Would you like a Coke?" is a good example. The word "Coke" inflects up. Can you think of a question where the inflection goes down?

If you've called one of those Automated Speech Recognition systems, you've probably noticed that they're sensitive to background noise, and they fail. An ambitious system was installed by South Pacific Railways, Inc. National Telecommunications company, a company that started off running fiber optic alongside its railroad lines. This company took the daring approach of compiling a statistical language model of every possible customer conversation, recorded for "quality control purposes," and the interface is totally open. The speech recognizer just says, "How May I Help You?"

Dialogue design in these speech recognition systems assiduously avoided this kind of approach, and had some limited success. When you build these things you have to wonder: how is it really going to be used? And if you can just make one person's day better, they're a success. So where did they fail? Was it in the ambition of it all... who wants to keep you in one of these things?

Well. Banks and telecoms want to be able to cut their call center staff. Call center agents usually have a respectable turnover rate, and an agent-handled call is much more expensive than an automated one. The average cost of an automated call is about 24 cents. For the agent-handled call, it's about 75 cents to a dollar. So there's this huge incentive to cut costs by designing these systems so that you can't drop to an operator.
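To put rough numbers on that incentive, here is a minimal sketch of the arithmetic. The per-call costs are the ones quoted above; the call volume is a made-up figure for illustration, not data from any real call center.

```python
# Per-call figures quoted in the post, in cents (integers avoid float rounding).
AUTOMATED_CENTS = 24
AGENT_LOW_CENTS = 75
AGENT_HIGH_CENTS = 100


def savings_per_call(agent_cents, automated_cents=AUTOMATED_CENTS):
    """Cents saved each time a call stays in the automated system."""
    return agent_cents - automated_cents


def total_savings_dollars(calls, agent_cents, automated_cents=AUTOMATED_CENTS):
    """Dollars saved over a given number of calls."""
    return calls * savings_per_call(agent_cents, automated_cents) / 100


if __name__ == "__main__":
    calls = 1_000_000  # hypothetical volume, purely illustrative
    low = total_savings_dollars(calls, AGENT_LOW_CENTS)
    high = total_savings_dollars(calls, AGENT_HIGH_CENTS)
    print(f"{calls:,} calls: ${low:,.0f} to ${high:,.0f} saved")
```

That is 51 to 76 cents for every call that never reaches a person, which is why the "drop to operator" option tends to disappear.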

And that cuts across the grain of expectations and blows the transaction, in a lot of cases. But there's another thing going on.

Conversations, when they start, have a flow to them. People begin to communicate on several levels. I've noticed that when people relax and start talking, the love and happiness (something that can make you do wrong, make you do right, as Al Green says) ride our prosody like a carrier wave. But in a computer-automated call, tone is anti-statistical.

The way it works is that when the system gets a blast of air from you (your speech), it can try to figure out the pieces of it (called phonemes) by looking at frequency (pitch) or energy. Now, if you look at pitch, males and females talk at fundamentally different levels. So a lot of times the frequency-based recognizers didn't work quite right. But if you look at the energy spectrum, you can get a limited shot at getting it right. The best you can hope for is a likelihood score of maybe 30 percent on any one phoneme group.
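A toy illustration of why energy is less sensitive to the speaker's pitch than frequency is: the sketch below computes short-time energy (mean squared amplitude per frame) for two synthetic "voices" at different pitches. This is not a phoneme recognizer, and the sample rate, frame size, and test tones are all assumptions chosen for the example.

```python
import math

SAMPLE_RATE = 8000  # assumed telephone-quality sampling rate, in Hz
FRAME_SIZE = 80     # assumed frame length: 10 ms at 8 kHz


def frame_energies(samples, frame_size=FRAME_SIZE):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(sum(s * s for s in frame) / frame_size)
    return energies


def make_test_signal(tone_hz, n_samples=800, amplitude=0.5):
    """Silence followed by a pure tone, standing in for 'quiet, then speech'."""
    silence = [0.0] * n_samples
    tone = [
        amplitude * math.sin(2 * math.pi * tone_hz * i / SAMPLE_RATE)
        for i in range(n_samples)
    ]
    return silence + tone


if __name__ == "__main__":
    low_voice = frame_energies(make_test_signal(100))   # lower-pitched tone
    high_voice = frame_energies(make_test_signal(200))  # higher-pitched tone
    # The pitch differs by an octave, but the per-frame energy looks the same.
    print(low_voice[-1], high_voice[-1])
```

Real speech is far messier than a sine wave, but the point carries: the two signals differ in pitch while their energy profile over time is nearly identical.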

So you've seen those speech recognition systems that type letters for you, where you talk into them and they keep making mistakes? That's because they're really working on just matching that one set of phonemes in any word. They'll be maybe 90 percent accurate at best.

But the trick is to build in a limited set of responses, like "Coffee, Tea or Milk," so the search space goes way down and the confusion with it. And in those systems you can get great response. The energy-based stuff comes off fine, and even handles background noise. One trick of the trade is to turn off recognition right at the onset of the conversation (1 or 2 seconds) to avoid telephone noise: picking the receiver up, scratch noises from razor stubble, doors slamming in the background. It's a procrustean solution.
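Both tricks can be sketched in a few lines. This is only an illustration under stated assumptions: the recognizer's raw guess is represented as a text string, fuzzy string matching from Python's standard `difflib` stands in for real acoustic scoring, and the function names, cutoff, frame length, and mute window are all hypothetical.

```python
import difflib

GRAMMAR = ["coffee", "tea", "milk"]  # the limited set of allowed responses


def match_response(hypothesis, grammar=GRAMMAR, cutoff=0.5):
    """Snap a noisy guess onto the closest allowed response, or None."""
    matches = difflib.get_close_matches(
        hypothesis.lower(), grammar, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


def drop_onset(frames, frame_ms=10, mute_ms=1500):
    """Discard audio frames from the start of the call (the procrustean part)."""
    skip = mute_ms // frame_ms
    return frames[skip:]


if __name__ == "__main__":
    print(match_response("cofee"))     # a slightly garbled guess
    print(match_response("zzz"))       # nothing in the grammar is close
    frames = list(range(200))          # stand-in for 2 seconds of 10 ms frames
    print(len(drop_onset(frames)))     # frames left after the mute window
```

With only three candidates, a garbled guess still lands on the right word, and anything that resembles none of them is rejected outright. The mute window throws away real speech too if the caller starts talking immediately, which is the price of the trick.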

But it works. At least in a limited way. So naturally, what do American companies do with it when they have some limited success? Doesn't 911 teach you everything you need to know? They go over the top.

Bank systems aren't designed to interact with you, mostly. They're designed to keep you in the call. I have noticed that the latest designs will drop you back to tone-based IVR. Whether times are good or bad, happy or sad. They drop you back into the system and you notice it.

So part of this is really the tone of the conversation. In my view there are so many numbers to crunch as part of modern life that we're always looking for a way to sort of lean on other entities to do it. Companies have vaulted their way up the food chain and there's a lot of deep baseball going on with them and their customers. Joe Six-Pack has figured out by now that corporate lobbyists are in control of an unprecedented number of American institutions. So he's really looking for his branding to be a customer value management system.

That tone of the conversation is impossible at present when you dial up an Automated Speech Recognizer. It is useful and good technology, and it can front-end the call and handle simple sorting. Keep them in dialogue for just 10 seconds or so. I think that's fair. The greatest evil is that done by a man, better done by a Machine (Aristotle). Baby steps. Just like when the operators (like Lily Tomlin! Snort!) would switch the calls manually with patch cables. Replace the simple things, make it easier to use technology. Like the iPhone, or the Helio (a good competitor). Something useful, and designed to be used easily. Simple tasks. Just the introduction or the front end, then switch everything to where it needs to go.

One day in about five or ten years or so, dial tone will go away and be replaced by speech recognition. You will pick up your phone and it will ask you what you want to do.


Anonymous said…
Yes, this is the thing that is on the wall at .. what was that company's name .. big one in Palo Alto .. Tellme?