I want to implement a simple speech-to-text tool using PyAudio and the IBM Bluemix service. Currently I need to record audio, save it to disk, and then load it again in order to send it to Bluemix.
import numpy as np
import pyaudio
import requests
from scipy.io import wavfile as wav  # assumed: ``wav`` is scipy.io.wavfile

RATE = 44100
RECORD_SECONDS = 10
CHUNKSIZE = 1024

# initialize portaudio
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE, input=True, frames_per_buffer=CHUNKSIZE)

frames = []  # A python-list of chunks (numpy.ndarray)
print("Please speak!")
for _ in range(0, int(RATE / CHUNKSIZE * RECORD_SECONDS)):
    data = stream.read(CHUNKSIZE)
    # np.fromstring is deprecated for binary data; np.frombuffer is the replacement
    frames.append(np.frombuffer(data, dtype=np.int16))

# Convert the list of numpy-arrays into a 1D array (column-wise)
numpydata = np.hstack(frames)

# close stream
stream.stop_stream()
stream.close()
p.terminate()

# save audio to disk
wav.write('out.wav', RATE, numpydata)

# Open the audio file (.wav) in binary mode and send it to the Bluemix service
# (url, username and password are defined elsewhere)
headers = {'Content-Type': 'audio/wav'}
with open('/home/dolorousrtur/Documents/Projects/Capstone/out.wav', 'rb') as audio:
    r = requests.post(url, data=audio, headers=headers, auth=(username, password))
How can I convert pyaudio frames into wav format without writing them to disk?
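One way to do this with only the standard library, sketched under the assumption that you keep the raw bytes returned by stream.read() (rather than converting each chunk to a NumPy array): write the WAV header and samples into an io.BytesIO buffer with the wave module, then send the buffer's contents. The helper name frames_to_wav_bytes is hypothetical.

```python
import io
import wave


def frames_to_wav_bytes(chunks, rate, channels=1, sample_width=2):
    """Pack raw PCM chunks (e.g. from stream.read()) into an in-memory WAV file.

    Hypothetical helper; assumes 16-bit samples (paInt16 -> 2 bytes per sample).
    """
    buf = io.BytesIO()
    # wave.open accepts a file-like object; closing the writer finalizes the
    # header but does not close a buffer it did not open itself.
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(b"".join(chunks))
    return buf.getvalue()
```

With the question's loop you would append data itself instead of np.frombuffer(...), then call requests.post(url, data=frames_to_wav_bytes(frames, RATE), headers=headers, auth=(username, password)). If you already have numpydata, passing [numpydata.tobytes()] as the single chunk should also work on a little-endian machine, since WAV stores samples little-endian.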
PyAudio provides Python bindings for PortAudio, the cross-platform audio I/O library. With PyAudio, you can easily use Python to play and record audio on a variety of platforms.
"RATE" is the number of samples collected per second (per channel). "CHUNK" is the number of frames in the buffer. Each frame has 2 samples when "CHANNELS=2". The size of each sample is 2 bytes, calculated using the function: pyaudio.
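As a worked example of how these parameters combine (assuming paInt16, i.e. 2 bytes per sample, and the CHANNELS=2 case above), the byte size of one buffer and of a whole recording follow directly. The last line also shows that the question's loop, which reads int(RATE / CHUNKSIZE * RECORD_SECONDS) chunks, captures slightly fewer than RATE * 10 frames because int() truncates.

```python
def buffer_bytes(chunk, channels, sample_width):
    """Bytes returned by one stream.read(chunk): frames x samples per frame x bytes per sample."""
    return chunk * channels * sample_width


def recording_bytes(rate, seconds, channels, sample_width):
    """Raw PCM size of a recording, excluding the WAV header."""
    return rate * seconds * channels * sample_width


CHUNK, CHANNELS, SAMPLE_WIDTH, RATE = 1024, 2, 2, 44100
print(buffer_bytes(CHUNK, CHANNELS, SAMPLE_WIDTH))        # 4096
print(recording_bytes(RATE, 10, CHANNELS, SAMPLE_WIDTH))  # 1764000
print(int(RATE / CHUNK * 10) * CHUNK)                     # 440320 frames actually read
```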
Here's an example that worked for me. If you put the recorded audio into a speech_recognition AudioData object, there are methods available for converting it to various audio formats (e.g., get_wav_data(), get_aiff_data(), get_flac_data(), etc.). See the speech_recognition AudioData documentation.
import pyaudio
import speech_recognition
from time import sleep


class Recorder:
    sampling_rate = 44100
    # AudioData represents mono audio, so record a single channel.
    num_channels = 1
    # The width of each sample in bytes (paInt16 -> 2 bytes). Each group of
    # ``sample_width`` bytes represents a single audio sample.
    sample_width = 2

    def pyaudio_stream_callback(self, in_data, frame_count, time_info, status):
        # Called by PyAudio on its own thread with each new block of input.
        self.raw_audio_bytes_array.extend(in_data)
        # Input-only stream: no output data to return, keep recording.
        return (None, pyaudio.paContinue)

    def start_recording(self):
        self.raw_audio_bytes_array = bytearray()
        self.pa = pyaudio.PyAudio()
        self.pyaudio_stream = self.pa.open(format=pyaudio.paInt16,
                                           channels=self.num_channels,
                                           rate=self.sampling_rate,
                                           input=True,
                                           stream_callback=self.pyaudio_stream_callback)
        self.pyaudio_stream.start_stream()

    def stop_recording(self):
        self.pyaudio_stream.stop_stream()
        self.pyaudio_stream.close()
        self.pa.terminate()  # release PortAudio resources
        speech_recognition_audio_data = speech_recognition.AudioData(bytes(self.raw_audio_bytes_array),
                                                                     self.sampling_rate,
                                                                     self.sample_width)
        return speech_recognition_audio_data
if __name__ == '__main__':
    recorder = Recorder()
    # start recording
    recorder.start_recording()
    # say something interesting...
    sleep(3)
    # stop recording
    speech_recognition_audio_data = recorder.stop_recording()
    # convert the audio represented by the ``AudioData`` object to
    # a byte string representing the contents of a WAV file
    wav_data = speech_recognition_audio_data.get_wav_data()
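Before posting wav_data, it can be worth confirming that the header describes what you actually recorded, since a mismatched channel count or sample width produces a file that parses fine but sounds garbled. A minimal sketch using only the standard library's wave module; wav_summary is a hypothetical helper name.

```python
import io
import wave


def wav_summary(wav_bytes):
    """Return (channels, sample_width_bytes, sample_rate, n_frames) parsed from in-memory WAV bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return (wf.getnchannels(), wf.getsampwidth(),
                wf.getframerate(), wf.getnframes())
```

For a mono paInt16 recording, wav_summary(wav_data) would be expected to report 1 channel and 2-byte samples. The bytes can then be sent exactly as in the question, e.g. requests.post(url, data=wav_data, headers={'Content-Type': 'audio/wav'}, auth=(username, password)), with no file on disk.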