This is such a brilliant idea. I really can't think of a better way to get a huge group of people to help associate the words they pronounce with their text equivalents at such a low cost (to Google). But I also wondered about the bias they would need to deal with in the training set. When I use the service, I find myself pronouncing words more clearly than usual. Also, the distribution of phonemes in place and business names might be significantly different from that of common words. I suspect that these are minor issues that are mitigated as the data set grows larger.
You may have heard about our [directory assistance] 1-800-GOOG-411 service. Whether or not free-411 is a profitable business unto itself is yet to be seen. I myself am somewhat skeptical. The reason we really did it is because we need to build a great speech-to-text model ... that we can use for all kinds of different things, including video search.
The speech recognition experts that we have say: If you want us to build a really robust speech model, we need a lot of phonemes, which is a syllable as spoken by a particular voice with a particular intonation. So we need a lot of people talking, saying things so that we can ultimately train off of that. ... So 1-800-GOOG-411 is about that: Getting a bunch of different speech samples so that when you call up or we're trying to get the voice out of video, we can do it with high accuracy.
16 September 2008
GOOG-411
I really like the GOOG-411 service and have come to use it instead of looking up numbers online. However, today my daughters unintentionally launched a local denial-of-service attack against me as I used it. I called to place an order at our local zpizza (yes, just like their demo video). As it was giving me my one selection, my oldest was talking to her sister about wanting "to go back to ballet", upon which the service went back to the top menu. I didn't pick up on this at first and was completely confused as to why the system was jumping around without me saying anything, but I eventually stepped out of the car into the quiet outdoors to complete my order.
I remember reading InfoWorld's interview with Marissa Mayer about how the goal of the service is to build a training set for speech models. She states: