I remember reading InfoWorld's interview with Marissa Mayer about how the goal of the service is to build a training set for speech models. She states:
"You may have heard about our [directory assistance] 1-800-GOOG-411 service. Whether or not free-411 is a profitable business unto itself is yet to be seen. I myself am somewhat skeptical. The reason we really did it is because we need to build a great speech-to-text model ... that we can use for all kinds of different things, including video search.

"The speech recognition experts that we have say: If you want us to build a really robust speech model, we need a lot of phonemes, which is a syllable as spoken by a particular voice with a particular intonation. So we need a lot of people talking, saying things so that we can ultimately train off of that. ... So 1-800-GOOG-411 is about that: Getting a bunch of different speech samples so that when you call up or we're trying to get the voice out of video, we can do it with high accuracy."

This is such a brilliant idea. I really can't think of a better way to get a huge group of people to help associate the words they pronounce with their text equivalents at such a low cost (to Google). But I also wondered about the bias they would need to deal with in the training set. When I use the service, I find myself pronouncing words more clearly than I normally would. Also, the distribution of phonemes in place and business names might be significantly different from that in common words. I suspect that these are minor issues that are mitigated as the data set grows larger.