How to convert pyaudio frames into wav format without writing to a file?

I want to implement a simple speech-to-text tool using pyaudio and the IBM Bluemix service. Currently I need to record the audio, save it to disk, and then load it again in order to send it to Bluemix.

import numpy as np
import pyaudio
import requests
import scipy.io.wavfile as wav  # assumed: provides wav.write(filename, rate, data)

RATE = 44100
RECORD_SECONDS = 10
CHUNKSIZE = 1024

# initialize portaudio
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=RATE, input=True, frames_per_buffer=CHUNKSIZE)

frames = []  # a Python list of chunks (numpy.ndarray)
print("Please speak!")

for _ in range(0, int(RATE / CHUNKSIZE * RECORD_SECONDS)):
    data = stream.read(CHUNKSIZE)
    # np.fromstring is deprecated for binary data; np.frombuffer replaces it
    frames.append(np.frombuffer(data, dtype=np.int16))

# Concatenate the list of numpy arrays into a single 1D array
numpydata = np.hstack(frames)

# close stream
stream.stop_stream()
stream.close()
p.terminate()

# save audio to disk
wav.write('out.wav',RATE,numpydata)

# Open audio file(.wav) in wave format 
audio = open('/home/dolorousrtur/Documents/Projects/Capstone/out.wav', 'rb') 

# send audio to bluemix service
headers={'Content-Type': 'audio/wav'} 
r = requests.post(url, data=audio, headers=headers, auth=(username, password)) 

How can I convert pyaudio frames into wav format without writing them to disk?

Arthur Grigorev asked Sep 23 '17 13:09

1 Answer

Here's an example that worked for me. If you put the recorded audio into a speech_recognition AudioData object, it offers methods for converting to various audio formats (e.g., get_wav_data(), get_aiff_data(), get_flac_data()). See the speech_recognition documentation for AudioData.

import pyaudio
import speech_recognition
from time import sleep


class Recorder():

    sampling_rate = 44100
    num_channels = 1  # AudioData represents mono audio, so record a single channel
    sample_width = 2  # The width of each sample in bytes; paInt16 samples are 16-bit, i.e. 2 bytes each.

    def pyaudio_stream_callback(self, in_data, frame_count, time_info, status):
        self.raw_audio_bytes_array.extend(in_data)
        return (in_data, pyaudio.paContinue)

    def start_recording(self):

        self.raw_audio_bytes_array = bytearray()

        pa = pyaudio.PyAudio()
        self.pyaudio_stream = pa.open(format=pyaudio.paInt16,
                                      channels=self.num_channels,
                                      rate=self.sampling_rate,
                                      input=True,
                                      stream_callback=self.pyaudio_stream_callback)

        self.pyaudio_stream.start_stream()

    def stop_recording(self):

        self.pyaudio_stream.stop_stream()
        self.pyaudio_stream.close()

        speech_recognition_audio_data = speech_recognition.AudioData(self.raw_audio_bytes_array,
                                                                     self.sampling_rate,
                                                                     self.sample_width)
        return speech_recognition_audio_data


if __name__ == '__main__':

    recorder = Recorder()

    # start recording
    recorder.start_recording()

    # say something interesting...
    sleep(3)

    # stop recording
    speech_recognition_audio_data = recorder.stop_recording()

    # convert the audio represented by the ``AudioData`` object to
    # a byte string representing the contents of a WAV file
    wav_data = speech_recognition_audio_data.get_wav_data()
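
If you would rather avoid the extra dependency, the standard-library wave module can write into an io.BytesIO buffer instead of a file on disk. The sketch below assumes the frames are the raw byte chunks returned by stream.read() (not numpy arrays) and that the recording is mono paInt16, as in the question. The names frames_to_wav_bytes and fake_frames are hypothetical and only there for illustration.

```python
import io
import wave


def frames_to_wav_bytes(frames, rate=44100, channels=1, sample_width=2):
    """Pack raw PCM chunks into a complete WAV file held in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # 2 bytes per sample for paInt16
        wav_file.setframerate(rate)
        wav_file.writeframes(b"".join(frames))
    # wave does not close a file object it did not open, so the buffer is still readable
    return buffer.getvalue()


# Stand-in for the chunks collected from stream.read(CHUNKSIZE): 10 chunks of silence
fake_frames = [b"\x00\x00" * 1024 for _ in range(10)]
wav_bytes = frames_to_wav_bytes(fake_frames)
```

The resulting wav_bytes should be usable directly as the request body, e.g. requests.post(url, data=wav_bytes, headers=headers, auth=(username, password)), with url, username and password being the values from the question. If you want to keep the numpy array, scipy.io.wavfile.write also accepts an open file-like object such as io.BytesIO in place of a filename.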

Adrian Pope answered Oct 01 '22 06:10