Open Source Software For Transcribing Speech in Audio Files

Tags: java, python, speech-recognition, speech-to-text, cmusphinx

Can anyone recommend reliable open source software for transcribing English speech in wav files? The two main programs I've researched are Sphinx and Julius, but I've never been able to get either to work, and the documentation for each on transcribing files is sketchy at best.

I'm developing on 64-bit Ubuntu 10.04, whose repos include sphinx2 and julius, as well as voxforge's julius acoustic model for English. I'm focussing on transcribing files, instead of directly processing sound from a mic, because I've given up on expecting projects like these to work with Ubuntu's sound system. This isn't a knock against Ubuntu, as I can record sound with my mic perfectly using Audacity, but neither system seems able to access my mic, so I'm hoping I can simplify their configuration by just reading from a file.

I first tried Sphinx2, from the Ubuntu package sphinx2-bin. Even though the sample sphinx2-demo seemed to work on transcribing a file, there's virtually no documentation on the configuration, so I'm not sure how I'd customize this to read from an arbitrary wav. The audio file used in the demo is in some undocumented "16k" format, which is indirectly referenced through 2 configuration files. There's a brief blurb describing sphinx2-demo as running sphinx2-batch, but inspecting the script shows it's actually calling sphinx2-continuous. Even worse, the --help docs for each script list about 6 dozen options, and don't mention which are required or optional. Overall, the lack of Sphinx documentation, and the poor quality of the existing documentation, is driving me nuts.

I next tried Julius, again from the Ubuntu package, which was surprisingly recent (4.1), considering the version used in Voxforge's quickstart is 3.5. The package seems to include slightly better documentation, and even an example written in Python (/usr/share/doc/julius-voxforge/examples/controlapp). After reading the example's docs, I tried adapting it to read from a file by creating a file filelist.txt containing the text "hello.wav" referring to a file of the same name, containing a recording of someone saying "hello". Placing these in the same directory, I ran:

julius -input file -filelist filelist.txt -C julian.jconf

getting the response:

### read waveform input
Error: adin_file: sampling rate != 16000 (8000)
Error: adin_file: error in parsing wav header at hello.wav
Error: adin_file: failed to read speech data: "hello.wav"
0 files processed

Retrying with absolute filenames for filelist.txt and hello.wav produces the same error.

I also tried the Julius call used in the example, to record directly from a mic:

julius -input mic -C julian.jconf

I called this several times, and the response varied between the error:

Cannot read /dev/dsp

and:

STAT: AD-in thread created
<<< please speak >>>

In the latter case, no matter what I say into the mic, nothing happens. I can't tell if it's still unable to read the mic, or if it's reading something but is simply unable to transcribe the audio.

I'm not sure what to make of this. The errors I'm getting don't leave me with much to go on. Why can't it read a wav? Why can't it read /dev/dsp? Why does it then appear to be able to read /dev/dsp, but not react in any way?

Has anyone else had any success with open source speech recognizers, especially on Linux?


asked Sep 30 '11 16:09

Cerin

1 Answer

Why can't it read a wav?

It tells you that the file has the wrong sampling rate: 8000 Hz instead of the required 16000 Hz. Sampling rate is very important for speech recognition software, because an acoustic model is trained on audio at a specific rate.
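If you want to check what the decoder will see before running it, you can inspect the wav header yourself. Below is a minimal sketch using Python's standard `wave` module. The function name `describe_wav` is hypothetical, and the default of 16000 is only an assumption taken from the rate Julius asked for in the error above; neither is part of Julius or Sphinx.

```python
import sys
import wave


def describe_wav(path, expected_rate=16000):
    """Return basic header fields of a PCM wav file and whether the rate matches.

    expected_rate is an assumption: 16000 is the rate named in the Julius error.
    """
    with wave.open(path, "rb") as wav:
        info = {
            "rate": wav.getframerate(),      # samples per second
            "channels": wav.getnchannels(),  # 1 = mono, 2 = stereo
            "bits": wav.getsampwidth() * 8,  # bytes per sample converted to bits
        }
    info["rate_ok"] = info["rate"] == expected_rate
    return info


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python describe_wav.py hello.wav
    print(describe_wav(sys.argv[1]))
```

A file reporting a rate other than the one the recognizer expects needs to be resampled before decoding.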

Why can't it read /dev/dsp?

Recent versions of Ubuntu use the PulseAudio framework instead of OSS. The version you are trying uses OSS, so you need to install the oss-compatibility package from your distribution to bring OSS support back.

Alternatively, you can try a newer Julius, which has PulseAudio support.
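A quick way to see whether the OSS device node is there at all is to test the path directly. This is only a rough sketch: the function name `oss_device_status` and its three return values are made up for illustration, and a "readable" result does not guarantee that a recognizer can actually open the device (it may be busy, for instance).

```python
import os


def oss_device_status(path="/dev/dsp"):
    """Classify an OSS device node as 'missing', 'unreadable', or 'readable'."""
    if not os.path.exists(path):
        return "missing"      # no device node at this path
    if not os.access(path, os.R_OK):
        return "unreadable"   # node exists but the current user cannot read it
    return "readable"


if __name__ == "__main__":
    print(oss_device_status())
```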

Why does it then appear to be able to read /dev/dsp, but not react in any way?

Audio input doesn't work properly.

Has anyone else had any success with open source speech recognizers, especially on Linux?

Sure, check this video as an example of what people do with CMUSphinx:

http://www.youtube.com/watch?v=vfaNLIowSyk

I suggest you revisit the CMUSphinx package, which is a leading open source speech recognition engine. There is plenty of documentation on the website; you just need to read it. Remember that speech recognition is a complex area where you can get great results, but you also need to invest time in understanding the technology, just like with any other domain.

In short, to transcribe a file with CMUSphinx, follow these 3 simple steps:

1. Take the wav file and resample it to an 8 kHz, 16-bit mono file with sox:

    sox input.wav -r 8000 -c 1 resampled.wav

2. Install pocketsphinx 0.7:

    apt-get install pocketsphinx

3. Decode the file:

    pocketsphinx_continuous -samprate 8000 -infile resampled.wav

The result will be printed to standard output. To suppress the logger, redirect stderr to /dev/null:

    pocketsphinx_continuous -samprate 8000 -infile resampled.wav 2> /dev/null
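If you need to run this on many files, the resample and decode steps can be wrapped in a small script. The following is a sketch only, assuming `sox` and `pocketsphinx_continuous` are installed and on the PATH; the helper names `build_commands` and `transcribe` are hypothetical, and the flags are just the ones used in the commands above.

```python
import subprocess
import sys


def build_commands(src, dst="resampled.wav", rate=8000):
    """Return the sox and pocketsphinx_continuous argument lists for one file."""
    resample = ["sox", src, "-r", str(rate), "-c", "1", dst]
    decode = ["pocketsphinx_continuous", "-samprate", str(rate), "-infile", dst]
    return resample, decode


def transcribe(src, dst="resampled.wav", rate=8000):
    """Resample src, decode it, and return the decoder's standard output."""
    resample, decode = build_commands(src, dst, rate)
    subprocess.check_call(resample)
    # Discard the decoder log on stderr, like "2> /dev/null" in the shell.
    result = subprocess.run(
        decode,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        check=True,
        universal_newlines=True,
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python transcribe.py input.wav
    print(transcribe(sys.argv[1]))
```

Passing the same rate to both commands keeps the resampled file and the decoder's `-samprate` setting in agreement.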


answered Sep 28 '22 11:09

Nikolay Shmyrev
