tm_map has parallel::mclapply error in R 3.0.1 on Mac

I am using R 3.0.1 on platform x86_64-apple-darwin10.8.0 (64-bit).

I am trying to use tm_map from the tm package, but when I execute this code:

library(tm)
data('crude')
tm_map(crude, stemDocument)

I get this error:

Warning message:
In parallel::mclapply(x, FUN, ...) :
  all scheduled cores encountered errors in user code

Does anyone know a solution for this?

Dominik asked Aug 17 '13 10:08


2 Answers

I suspect you don't have the SnowballC package installed, which stemDocument seems to require. tm_map runs stemDocument on every document using mclapply, which hides the underlying error. Try running stemDocument on a single document so that you can see the actual error:

stemDocument(crude[[1]])

For me, this gave the error:

Error in loadNamespace(name) : there is no package called ‘SnowballC’

So I installed SnowballC, and it worked. Clearly, SnowballC should be a dependency of tm.
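A minimal sketch of the check described above, assuming a current tm release from CRAN (the 2013 tm version may behave differently): install SnowballC only if requireNamespace() cannot find it, then run stemDocument on one document before processing the whole corpus. The CRAN mirror URL is an arbitrary choice, not something from the original answer.

```r
# Install SnowballC only if it is missing; stemDocument() needs it.
if (!requireNamespace("SnowballC", quietly = TRUE)) {
  install.packages("SnowballC", repos = "https://cloud.r-project.org")
}

library(tm)
data("crude")

# Stem a single document first: any error is raised directly here,
# instead of being swallowed by mclapply() inside tm_map().
first_stemmed <- stemDocument(crude[[1]])

# Once the single document works, stem the whole corpus.
crude_stemmed <- tm_map(crude, stemDocument)
```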

nograpes answered Oct 04 '22 00:10


I just ran into this. It took a bit of digging, but I found out what was happening.

  1. I had this line of code: 'rdevel <- tm_map(rdevel, asPlainTextDocument)'

  2. Running it produced this message:


    In parallel::mclapply(x, FUN, ...) :
      all scheduled cores encountered errors in user code

  3. It turns out that 'tm_map' calls 'mclapply' from the 'parallel' package, which decides how many cores to use from the 'mc.cores' option (defaulting to 2 when the option is unset). To see what it's thinking, type:

    > getOption("mc.cores", 2L)
    [1] 2
    >

  4. Aha moment! Tell the 'tm_map' call to use only one core:

    > rdevel <- tm_map(rdevel, asPlainTextDocument, mc.cores=1)
    Error in match.fun(FUN) : object 'asPlainTextDocument' not found
    > rdevel <- tm_map(rdevel, asPlainTextDocument, mc.cores=4)
    Warning message:
    In parallel::mclapply(x, FUN, ...) :
      all scheduled cores encountered errors in user code
    > 

So, with more than one core, rather than showing you the error message, 'parallel' just tells you there was an error in each core. Not helpful, parallel! With one core, the real error shows up: I forgot the dot, and the function name is supposed to be 'as.PlainTextDocument'!
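The error hiding can be reproduced without tm at all. This sketch uses only the base parallel package and assumes macOS or Linux (mc.cores > 1 is not supported on Windows); always_fails is a hypothetical stand-in for a mistyped transformation. With two cores, each worker's failure comes back as a "try-error" object plus a generic warning; with mc.cores = 1, mclapply() falls back to lapply(), so the real error is raised in your session.

```r
library(parallel)

# Hypothetical stand-in for a mistyped tm_map() transformation.
always_fails <- function(x) stop("boom")

# Two workers: each worker catches its own error, so the caller only
# receives "try-error" objects and one generic warning.
warn_msg <- NULL
res_multi <- withCallingHandlers(
  mclapply(1:2, always_fails, mc.cores = 2),
  warning = function(w) {
    warn_msg <<- conditionMessage(w)  # keep the generic warning text
    invokeRestart("muffleWarning")
  }
)
print(warn_msg)
cat(res_multi[[1]])  # the original error text is still stored here

# One worker: mclapply() calls lapply() directly, so the error propagates.
msg_single <- tryCatch(
  mclapply(1:2, always_fails, mc.cores = 1),
  error = conditionMessage
)
print(msg_single)
```

Note that the real message is not lost in the multi-core case; it is just buried in the returned list, which tm_map does not show you.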

So if you get this warning, add 'mc.cores=1' to the 'tm_map' call and run it again to see the underlying error.
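Because mclapply() takes its default from getOption("mc.cores", 2L), an alternative (not part of the original answer) is to set the option once for the session instead of on each call. This sketch assumes your tm version calls mclapply() without passing its own core count, as the warning in the question suggests, and demonstrates the effect with mclapply() directly rather than with tm_map.

```r
library(parallel)

# Switch the session default to one core, remembering the old value.
old_opts <- options(mc.cores = 1)

# mclapply() now runs serially and re-raises the real error.
msg <- tryCatch(
  mclapply(1:3, function(x) stop("boom")),
  error = conditionMessage
)
print(msg)

# Restore the previous setting once debugging is done.
options(old_opts)
```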

znmeb answered Oct 04 '22 00:10