Systematic Prosody Variance (SPV)

An invented term that means listening for the rhythm of speech. One of the more interesting ways to use it is to watch how silence works; for a speech recognizer, silence can be fatal. The recognizer listens for words in a context, e.g. "Would you like coffee, tea, or milk?", and it blows a fuse if you don't hit the answer in something like a natural voice rhythm, say within 3 seconds after the question. Most recognizers have that kind of quirk: if you use a natural, steady rhythm with them, they work better. A cool trick I used to use, in fact, was to turn recognition off for the first 2 seconds of a prompt. This avoids the noise from picking up the phone, or from things going on in the room while the call connects.
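
Here is a minimal sketch of those two timing rules, the 2-second deaf window and the 3-second answer window. It is not taken from any real recognizer: the `Frame` type, the function name, and the idea of an upstream voice-activity detector that flags each frame are all assumptions made for illustration, and the frames are assumed to arrive in time order.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Frame:
    t: float         # seconds since the prompt started playing
    is_speech: bool  # flag from some upstream voice-activity detector (assumed)


def first_accepted_speech(
    frames: Iterable[Frame],
    prompt_end: float,
    ignore_window: float = 2.0,   # recognition is "off" this long after the prompt starts
    answer_timeout: float = 3.0,  # how long after the prompt ends we keep waiting
) -> Optional[float]:
    """Return the time of the first speech frame worth acting on, or None on timeout."""
    deadline = prompt_end + answer_timeout
    for frame in frames:
        if frame.t < ignore_window:
            continue  # phone pickup, room noise during call connection
        if frame.t > deadline:
            break     # the caller missed the natural rhythm; give up
        if frame.is_speech:
            return frame.t
    return None


# A door slam at 0.5 s is ignored; the answer at 5.0 s is accepted.
call = [Frame(0.5, True), Frame(3.0, False), Frame(5.0, True)]
print(first_accepted_speech(call, prompt_end=4.0))
```

The two numbers are defaults rather than constants because, in my experience, the right values depend on the prompt and on the callers.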

Speech recognizers (both the directed-dialogue kind and the continuous ones) started out life with a person somewhere grinding through stacks of text they had to read aloud, and someone else being very careful about how it was all recorded. A very, very cool one was the "Wall Street Journal database", built by the eminent DARPA scientist Douglas Paul and others - by people simply reading the Wall Street Journal. I have always thought that these databases were built with a certain feel to them, on top of the fact that they are meant to represent the full range of sounds a person could make, a kind of mathematical covering of the sounds we can generate.
Dr. Paul, I am sure, would say that frequency-based recognition has its problems. What I have always wanted to do is understand why people can speak to each other without even talking. It's a human effort. After all, I assume you've noticed that there aren't any formal linguistics training courses out there for two-year-olds?

One of the nicest things that happens around little ones is that somehow, just by the way they talk, they cheer you up. Ralph Waldo Emerson once wrote that "one babe commonly makes four or five out of the adults who prattle and play to it."

One of the reasons we're good at language is that it's innate. I think this relates to our status as prey animals who had to use language for defensive purposes. This is in the era before the BFG, of course. We are too easily found, and our hearing is easily outmatched by animal ears - in the days of the sabertooth tiger, how we survived had largely to do with a complicated but intuitive system of calls and cries for distant early warning. The system of sounds likely had no syntax or grammar, but it did communicate location, speed, distance, and whether the caller was friend or foe. This system is alive and well in little ones and in us at all times, and it tells us whether or not to care. Break it and watch how you become the ever-so-polite "I'm talking to a cripple" person with something better to do in your day. That is also why people who survive a stroke need, more than anyone, to regain their speech faculty first - their brains can and will rewire, but they first have to convince +you+ that they're still the same. Otherwise the prey animal instinct to sacrifice one from the pack for the greater good will kick in against them.

To try to get at this with people, sometimes I would cover my mouth with my hand and then vary my pitch upward while saying almost nothing intelligible, just walking it up the scale: "bla bla bla", then higher and a bit sharper, "bla bla", etc. Vary the number of words so that you don't fix on any one message, but keep the rhythm of the conversation. People always pricked up their ears. When they did, I just pointed it out. This was just random noise, like a jackhammer in the street, or a really loud prairie dog chirping away on the mesa. Why bother? Because the active listening kicks in.

Adding prosody curves to a speech recognizer makes it very easy to blow a fuse. The recognizers that you love to hate (like 1 800 555 1212) already have a tough enough time with energy - like a door slamming while you are talking to someone. For energy, the recognizer can exclude sound that is out of range, but for frequency (you know, pitch, how high or low someone's voice is) it is usually tuned to ignore it. That's why speech reco works about equally well for men and women. It's a nice trick. It works. It's one of the reasons we have these systems that now work semi-well. I have always loved the telephone systems over all the others.
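
To make the energy half of that concrete, here is a rough sketch of an energy gate: frames that are too quiet (silence) or too loud (the slammed door) get dropped before anything else looks at them. The function names and the two thresholds are made up for illustration, the samples are assumed to be floats between -1.0 and 1.0, and a real recognizer front end will be a good deal more involved than this.

```python
import math
from typing import List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def gate_frames(
    frames: Sequence[Sequence[float]],
    low: float = 0.01,  # below this we call it silence (assumed threshold)
    high: float = 0.5,  # above this we call it a door slam (assumed threshold)
) -> List[Sequence[float]]:
    """Keep only the frames whose energy falls inside [low, high]."""
    return [f for f in frames if low <= rms(f) <= high]


silence = [0.0, 0.0, 0.0, 0.0]
speech = [0.1, -0.1, 0.1, -0.1]
door_slam = [0.9, -0.9, 0.9, -0.9]

kept = gate_frames([silence, speech, door_slam])
print(len(kept))  # only the speech-level frame survives
```

Notice that nothing in here looks at pitch at all, which is the point: the gate only cares how loud a frame is, not how high or low the voice is.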

I invented the term systematic prosody variance (SPV) for a phase one proposal I did, because when prosody varies, there are set patterns to it (for example, if someone's tired, they might fall into more or less the same kind of delay in their speech). But I always hated the term. It doesn't really describe what I'm trying to find. It sounds like a gas-guzzling, overgrown station wagon. Not a truck, dammit, it's an SPV. With a sticker on it that says "Soccer Mom". I hate it.

And I know you care so. much. (riight) So that's why I am asking for your help. If you can think of a TLA that refers to prosody as an element of how we talk or communicate, I swear to god, I will do any. thing. for you. I mean. any. thing. help. .. please. help. .
