One of my colleagues, Ravish Verma, handed me a link to speech recognition code. I thought that it had a speech processing algorithm of its own. It turned out to be a wrapper for four online engines (Google, Wit.ai, IBM and AT&T) that process the audio and return the deciphered text to the code.

One of the coolest things is that one can use the microphone to capture the audio stream and get it parsed, although that is not strictly necessary, as the code works with recorded wav files as well.

Installing PortAudio: This is the software that takes care of the differences between operating systems. It creates a wrapper around the native audio APIs in order to expose a unified API. In order to link PortAudio, some folders need to be given write permission.

Installing PyAudio: PortAudio exposes native APIs, so a Python wrapper is needed in order to use them. PyAudio provides the Python bindings for PortAudio.

Installing Flac: This is necessary because the Google speech API v2 requires the content to be sent as flac data. So, to get a reply from Google, we have to send an audio file in an HTTP request to the speech API page.
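The idea of sending flac audio to a speech service in an HTTP request can be sketched roughly as follows, using only the Python standard library. This is a minimal illustration, not the wrapper's actual code: the endpoint URL is a placeholder (the real page is not given here), and the function name `build_flac_request`, the query parameter names `lang` and `key`, and the `audio/x-flac; rate=...` content type are assumptions made for the example. The request is only built, not sent.

```python
import urllib.parse
import urllib.request

# Hypothetical placeholder: the real endpoint URL is not given in the text.
ENDPOINT = "https://example.invalid/recognize"


def build_flac_request(flac_bytes, sample_rate, language="en-US", api_key="YOUR_KEY"):
    """Build (but do not send) an HTTP POST request carrying flac audio."""
    # The query parameter names are assumptions for illustration only.
    query = urllib.parse.urlencode({"lang": language, "key": api_key})
    return urllib.request.Request(
        f"{ENDPOINT}?{query}",
        data=flac_bytes,  # the raw flac data travels in the request body
        # Assumed content type: tells the server the format and sample rate.
        headers={"Content-Type": f"audio/x-flac; rate={sample_rate}"},
        method="POST",
    )


if __name__ == "__main__":
    # Stand-in bytes; a real call would pass the contents of a .flac file.
    request = build_flac_request(b"fLaC", sample_rate=16000)
    print(request.get_method(), request.full_url)
    print(request.get_header("Content-type"))
```

With a real endpoint and key, the request object could be passed to `urllib.request.urlopen` to send it and read the reply. The speech recognition wrapper hides this step, so the sketch is only meant to show why the audio has to be available as flac data first.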