Split an audio file into pieces of an arbitrary size

I have a large sound file (150 MB) that I would like to split into smaller files of some more easily managed size, say, files with 5 minutes of audio. Clearly, the last segment is going to be <= 5 minutes, and that's OK. Is there a way to do this sort of task easily?

A small sample .mp3 file to be used for this problem can be downloaded using this link: download.linnrecords.com/test/mp3/recit.aspx.

Here is what I have tried so far. I imported the data using readMP3 from tuneR and was going to use the cutw function, but haven't found an efficient way of using it.

library(tuneR)

sample<-readMP3("recit.mp3") 

# the file is only 9.04 seconds long (44.1 kHz, 16-bit, stereo),
# so for this example we can cut it into 0.5-second intervals
subsamp1<-cutw(sample, from=0, to=0.5, output="Wave")

# then I would have to do this for each interval up to:
subsampn<-cutw(sample, from=9, to=9.04, output="Wave") 
# where I have to explicitly state the maximum second (i.e. 9.04), 
# unless there is a way I don't know of to extract this information.

This approach is inefficient when intervals become small in comparison to the total file length. Also, sample was stereo, but subsamp1 is mono, and I'd prefer not to change anything about the data if possible.

In the way of improving efficiency, I tried inputting vectors to the from and to arguments, but I got an error (see below). Even if it had worked, though, it would not be a particularly nice solution. Anyone know of a more elegant way to approach this problem using R?

cutw(subsamp1, from=seq(0,9,0.5), to=c(seq(0.5,9.0,0.5),9.04))
# had to explicitly supply the max second (i.e. 9.04). 
# must be a better way to extract the maximum second

Error in wave[a:b, ] : subscript out of bounds
In addition: Warning messages:
1: In if (from > to) stop("'from' cannot be superior to 'to'") :
  the condition has length > 1 and only the first element will be used
2: In if (from == 0) { :
  the condition has length > 1 and only the first element will be used
3: In a:b : numerical expression has 19 elements: only the first used
Jota asked Dec 20 '13 05:12


1 Answer

Building on the excellent answer by @Jean V. Adams, I found a solution using indexing (i.e. [).

library(seewave)

# your audio file (using example file from seewave package)
data(tico)
audio <- tico
# the sampling frequency of your audio file in Hz
# (also stored in the Wave object as audio@samp.rate)
freq <- 22050
# the length (in samples) and duration (in seconds) of your audio file
totlen <- length(audio)
totsec <- totlen/freq

# the duration that you want to chop the file into
seglen <- 0.5

# defining the break points
breaks <- unique(c(seq(0, totsec, seglen), totsec))
index <- 1:(length(breaks)-1)
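# a list of all the segments; each one starts one sample after the previous
# one ends, and round() guards against fractional sample indices
segments <- lapply(index, function(i)
  audio[(round(breaks[i]*freq) + 1):round(breaks[i+1]*freq)])
# the above final statement is the only difference between this code and the
# code provided by @Jean V. Adams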
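
For what it's worth, the break-point arithmetic above can also be done in whole samples rather than seconds, which sidesteps fractional indices. The sketch below is base R only; segment_ranges and its argument names are made up for illustration and are not part of tuneR or seewave. The numbers in the example are the ones from the question (9.04 seconds at 44.1 kHz, cut into 0.5-second pieces).

```r
# Hypothetical helper (not from tuneR or seewave): return the first and last
# sample index of each segment so that consecutive segments do not overlap.
segment_ranges <- function(n_samples, samp_rate, seg_sec) {
  seg_len <- round(seg_sec * samp_rate)       # samples in one full segment
  from <- seq(1, n_samples, by = seg_len)     # first sample of each segment
  to <- pmin(from + seg_len - 1, n_samples)   # last sample, clipped at the end
  data.frame(from = from, to = to)
}

# Example: 9.04 s of audio sampled at 44100 Hz, split into 0.5 s pieces
ranges <- segment_ranges(n_samples = round(9.04 * 44100),
                         samp_rate = 44100,
                         seg_sec = 0.5)
head(ranges)
tail(ranges, 1)  # the last piece is shorter than the others
```

Each row could then be used to index a Wave object, along the lines of audio[ranges$from[i]:ranges$to[i]], assuming the n_samples and samp_rate values were taken from that same object.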

The advantage here is that if your input audio object is stereo, the returned objects are stereo, as well. cutw changes output objects to mono, from what I can tell.
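
Since the question asks for smaller files rather than a list in memory, here is a rough sketch of how the pieces might be written to disk with writeWave from tuneR. It is only an outline under some assumptions: segment_filenames is a hypothetical helper, the "piece" prefix and the temporary output directory are arbitrary choices, and a short synthetic stereo signal stands in for the real recording. The tuneR part only runs if the package is installed.

```r
# Hypothetical helper: zero-padded file names such as piece_01.wav, piece_02.wav
segment_filenames <- function(prefix, n) {
  width <- nchar(as.character(n))
  paste0(prefix, "_", formatC(seq_len(n), width = width, flag = "0"), ".wav")
}

if (requireNamespace("tuneR", quietly = TRUE)) {
  library(tuneR)

  # Synthetic 2-second stereo signal (assumption: stands in for the real file)
  samp_rate <- 8000
  sig <- round(10000 * sin(2 * pi * 440 * seq_len(2 * samp_rate) / samp_rate))
  audio <- Wave(left = sig, right = sig, samp.rate = samp_rate, bit = 16)

  # Cut into 0.5-second pieces by indexing, as in the answer above
  n_samples <- length(audio@left)
  seg_len <- round(0.5 * audio@samp.rate)
  starts <- seq(1, n_samples, by = seg_len)
  pieces <- lapply(starts, function(s) audio[s:min(s + seg_len - 1, n_samples)])

  # Write every piece to its own .wav file in a temporary directory
  files <- file.path(tempdir(), segment_filenames("piece", length(pieces)))
  for (k in seq_along(pieces)) {
    writeWave(pieces[[k]], files[k])
  }
}
```

For a real recording, the synthetic Wave would be replaced by the object returned from readMP3 or readWave, and the segment duration by whatever length is wanted (for example 5 * 60 seconds).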

Jota answered Oct 13 '22 11:10