I've seen image decoders like tf.image.decode_png in TensorFlow, but how about reading audio files (WAV, Ogg, MP3, etc.)? Is it possible without TFRecord?
E.g. something like this:
filename_queue = tf.train.string_input_producer(['my-audio.ogg'])
reader = tf.WholeFileReader()
key, value = reader.read(filename_queue)
my_audio = tf.audio.decode_ogg(value)
In TensorFlow IO, the class tfio.audio.AudioIOTensor allows you to read an audio file into a lazy-loaded IOTensor. The example that accompanied this answer read the FLAC file brooklyn.flac, a publicly accessible audio clip on Google Cloud.
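A minimal sketch of such a lazy read, assuming the tensorflow-io package is installed. The helper name read_audio_lazily and its path, start, and stop parameters are hypothetical, and the path is whatever location you supply. Only the requested slice is decoded.

```python
def read_audio_lazily(path, start=0, stop=None):
    """Open an audio file with tfio and return (clip, sample_rate).

    Hypothetical helper for illustration; `path` is supplied by the caller.
    """
    # Imported inside the function so this module still loads where the
    # optional tensorflow-io package is not installed.
    import tensorflow_io as tfio

    # Opening the file only reads metadata; no samples are decoded yet.
    audio = tfio.audio.AudioIOTensor(path)

    # Slicing reads just this range of samples. The result has shape
    # [samples, channels].
    clip = audio[start:stop]
    return clip, audio.rate
```

Calling read_audio_lazily(path, start=100) would return everything from sample 100 onward along with the sample rate as a tensor.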
The answer from @sygi is unfortunately not supported in TensorFlow 2.x. An alternative solution would be to use some external library (e.g. pydub or librosa) to implement the mp3 decoding step, and integrate it in the pipeline through the use of tf.py_function.
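The tf.py_function approach could be sketched roughly as follows. To keep the example free of extra dependencies, the standard-library wave module stands in for pydub or librosa and handles 16-bit PCM WAV only. For MP3 you would swap decode_wav_bytes for a call into one of those libraries. Both function names are hypothetical.

```python
import io
import struct
import wave


def decode_wav_bytes(data):
    """Decode 16-bit PCM WAV bytes into (samples, sample_rate).

    Stand-in for an external decoder such as pydub or librosa.
    `samples` is a list of frames, each a list with one float per channel.
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        sample_rate = wav.getframerate()
        n_channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())
    count = len(frames) // 2
    ints = struct.unpack("<%dh" % count, frames)
    # Scale int16 values to [-1.0, 1.0) and regroup the interleaved channels.
    floats = [v / 32768.0 for v in ints]
    samples = [floats[i:i + n_channels] for i in range(0, count, n_channels)]
    return samples, sample_rate


def load_audio(path):
    """Wrap the Python decoder so it can be used in Dataset.map (TF 2.x)."""
    import tensorflow as tf  # imported lazily; only the pipeline needs it

    def _decode(path_tensor):
        # Inside py_function the argument is an eager tensor holding bytes.
        with open(path_tensor.numpy().decode("utf-8"), "rb") as f:
            samples, _ = decode_wav_bytes(f.read())
        return tf.constant(samples, dtype=tf.float32)

    return tf.py_function(_decode, [path], tf.float32)
```

A pipeline would then use something like tf.data.Dataset.from_tensor_slices(paths).map(load_audio). Note that tf.py_function runs ordinary Python, so the decoding step is not serialized into the graph and does not run in parallel across Python threads.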
TensorFlow also has additional support for audio data preparation and augmentation to help with your own audio-based projects. Consider using the librosa library, a Python package for music and audio analysis.
If you are choosing between Ogg and MP3, the decision depends on how you are going to use the audio file. If you want to keep the file size small, both Ogg and MP3 can meet your need. Ogg is better in some areas, such as sound quality, its open-source nature, and variable bit rate.
Yes, there are dedicated decoders in the tensorflow.contrib.ffmpeg package. To use them, you need to install ffmpeg first.
Example:
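import tensorflow as tf  # TensorFlow 1.x only; tf.contrib was removed in 2.x
# Read the raw file bytes, then let ffmpeg decode them into a waveform tensor.
audio_binary = tf.read_file('song.mp3')
waveform = tf.contrib.ffmpeg.decode_audio(audio_binary, file_format='mp3', samples_per_second=44100, channel_count=2)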