How do I search for content within audio files/streams? [closed]

I have always been curious about the many search techniques that exist: for searching text, for searching images and even for searching videos.

However, I have never come across a solution that searches for content within audio files.

For example, let us assume that I have about 200 podcasts downloaded to my PC as mp3, wav and ogg files. They are all named generically, say podcast1.mp3, podcast2.mp3, etc., so it is not possible to know what they contain without actually listening to them. Let's say that I am interested in finding out which of the podcasts talk about 'game programming'. I want the results to be shown as:

  • podcast1.mp3 - 3 result(s) at time index(es) - 0:16:21, 0:43:45, 1:12:31
  • podcast21.ogg - 1 result(s) at time index(es) - 0:12:01

So, my questions are:

  • How could one approach this problem?
  • Are there any suitable algorithms developed for something like this?

One idea that came to mind is to use speech-to-text software to produce a transcript with time indexes for each of the audio files, and then search the transcripts to produce the output.
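The transcript-searching half of that idea can be sketched in a few lines. This is a minimal sketch that assumes a speech-to-text engine has already turned each file into a list of `(start_seconds, text)` segments; that transcript format, and the helper names `format_time` and `search_transcripts`, are hypothetical and not the output format of any particular recognizer. It prints results in the same shape as the desired output above.

```python
from typing import Dict, List, Tuple

# Hypothetical transcript format (an assumption, not a real engine's output):
# one list of (start_time_in_seconds, text) segments per audio file.
Transcript = List[Tuple[float, str]]


def format_time(seconds: float) -> str:
    """Render a number of seconds as H:MM:SS, like the time indexes in the question."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def search_transcripts(transcripts: Dict[str, Transcript], phrase: str) -> List[str]:
    """Return one result line per file whose transcript contains `phrase`."""
    needle = phrase.lower()
    lines = []
    for filename, segments in transcripts.items():
        # Case-insensitive substring match; each matching segment counts once.
        hits = [start for start, text in segments if needle in text.lower()]
        if hits:
            times = ", ".join(format_time(t) for t in hits)
            lines.append(f"{filename} - {len(hits)} result(s) at time index(es) - {times}")
    return lines


if __name__ == "__main__":
    # Made-up sample data standing in for real recognizer output.
    transcripts = {
        "podcast1.mp3": [(981.0, "Today we talk about game programming"),
                         (2625.0, "more on Game Programming tools")],
        "podcast2.wav": [(30.0, "cooking tips")],
    }
    for line in search_transcripts(transcripts, "game programming"):
        print(line)  # podcast1.mp3 - 2 result(s) at time index(es) - 0:16:21, 0:43:45
```

Matching per segment keeps the time index simple, but it will miss a phrase that the recognizer happens to split across two segments, and it counts a segment only once even if the phrase occurs in it twice.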

I was considering this as one of my hobby projects. Thanks!

Pascal asked Aug 22 '08 21:08


1 Answer

If you want to search for text (i.e. what is being said) inside an audio stream, you would have to process it with some kind of speech recognition algorithm and store the resulting text, ideally with a time stamp for each word or phrase, as metadata associated with the files. For video, you could also run text recognition (OCR) on any text that appears inside the video. Evernote already does this for text inside image files, but as far as I know it has no support for audio.
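Once the recognized text is stored as metadata, you will probably want an index rather than rescanning every transcript on each query. The sketch below is one hedged way to do that: an inverted index from each word to its positions, assuming the recognizer can emit word-level `(start_seconds, word)` pairs. The class `TranscriptIndex` and its methods are hypothetical names for illustration, and punctuation handling is deliberately left out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed metadata format: word-level (start_seconds, word) pairs per file,
# as a speech recognizer that reports word timestamps might produce.
WordTimings = List[Tuple[float, str]]


class TranscriptIndex:
    """Hypothetical inverted index from each word to where it occurs in each file."""

    def __init__(self) -> None:
        # word -> list of (filename, position in that file's word list)
        self._postings: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
        self._words: Dict[str, WordTimings] = {}

    def add(self, filename: str, words: WordTimings) -> None:
        """Store one file's word timings and record every word's position."""
        normalized = [(t, w.lower()) for t, w in words]
        self._words[filename] = normalized
        for pos, (_, word) in enumerate(normalized):
            self._postings[word].append((filename, pos))

    def search(self, phrase: str) -> Dict[str, List[float]]:
        """Map filename -> start times where all words of `phrase` occur in order."""
        terms = phrase.lower().split()
        if not terms:
            return {}
        results: Dict[str, List[float]] = defaultdict(list)
        # Only positions where the first word occurs can start a match.
        for filename, pos in self._postings.get(terms[0], []):
            words = self._words[filename]
            candidate = [w for _, w in words[pos:pos + len(terms)]]
            if candidate == terms:
                results[filename].append(words[pos][0])
        return dict(results)
```

A query then only touches the files that contain the phrase's first word, which matters once there are hundreds of hours of audio; the time stamps returned are exactly what the question's "time index" output needs.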

Something similar is possible when you use audio to search for audio. I don't know the details of these algorithms, but I'm guessing they involve some kind of frequency analysis. Shazam uses this kind of technology to identify songs from short audio clips.
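To make the frequency-analysis idea concrete, here is a heavily simplified audio-fingerprinting sketch; it is not Shazam's actual algorithm. It assumes mono float samples, uses a plain radix-2 FFT to find the strongest frequency bin in each fixed-size frame, pairs nearby peaks into `(freq1, freq2, frame_gap)` landmarks, and scores a query by how many of its landmarks also appear in a candidate. All function names and the `frame_size`/`fan_out` defaults are illustrative assumptions.

```python
import cmath
import math
from typing import List, Set, Tuple

Landmark = Tuple[int, int, int]  # (peak bin, later peak bin, gap in frames)


def fft(x: List[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def tone(freq_hz: float, num_samples: int, rate: int = 8000) -> List[float]:
    """Synthesize a pure sine tone (handy for trying the functions out)."""
    return [math.sin(2 * math.pi * freq_hz * n / rate) for n in range(num_samples)]


def peak_bins(samples: List[float], frame_size: int = 256) -> List[int]:
    """Strongest frequency bin in each non-overlapping frame (a crude spectrogram)."""
    peaks = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        spectrum = fft([complex(s) for s in samples[start:start + frame_size]])
        # For real-valued input only the first half of the bins is informative.
        magnitudes = [abs(c) for c in spectrum[: frame_size // 2]]
        peaks.append(max(range(len(magnitudes)), key=magnitudes.__getitem__))
    return peaks


def fingerprint(peaks: List[int], fan_out: int = 3) -> Set[Landmark]:
    """Pair each frame's peak with the peaks of the next `fan_out` frames."""
    hashes = set()
    for i, f1 in enumerate(peaks):
        for gap in range(1, fan_out + 1):
            if i + gap < len(peaks):
                hashes.add((f1, peaks[i + gap], gap))
    return hashes


def similarity(query: Set[Landmark], candidate: Set[Landmark]) -> float:
    """Fraction of the query's landmarks that also occur in the candidate."""
    return len(query & candidate) / len(query) if query else 0.0
```

With an 8000 Hz sample rate and 256-sample frames, a 1000 Hz tone lands in bin 1000 * 256 / 8000 = 32. This toy only matches clips that start on a frame boundary and keeps a single peak per frame; real systems use overlapping windows, several peaks per frame and time-offset alignment to cope with noise and arbitrary clip positions.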

Here are some Wikipedia articles that may be useful:

  • Speech recognition
  • Fast Fourier transform
  • Frequency analysis (frequency spectrum)
  • Optical character recognition (OCR)

Anders Sandvig answered Oct 07 '22 00:10