I am trying to process a text file that is part of a corpus I would like to analyze. To create a Corpus object with the tm package (a text mining package in R), I need to turn this paragraph into one gigantic vector so that it is read properly.
I have a paragraph:
Commercial exploitation over the past two hundred years drove
the great Mysticete whales to near extinction. Variation in
the sizes of populations prior to exploitation, minimal
population size during exploitation and current population
sizes permit analyses of the effects of differing levels of
exploitation on species with different biogeographical
distributions and life-history characteristics.
I've used both the scan and readLines functions, and they process the text like this:
[28] " commercial exploitation over the past two hundred years drove "
[29] " the great mysticete whales to near extinction variation in "
[30] " the sizes of populations prior to exploitation minimal "
Is there a way to get rid of the line breaks? Or to read the text file as one gigantic vector?
All of the solutions posted so far have been great, thank you.
This reads the entire file into a character vector of length one (file here is the path to the text file):
x <- readChar(file, file.info(file)$size)
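Note that readChar keeps the line breaks inside the string, so they still need to be replaced if the goal is a single unbroken paragraph. Below is a minimal sketch of two ways to do that. The temporary file, the shortened sample text, and the variable names (path, raw, one_string, joined) are assumptions made for illustration and are not part of the original question.

```r
# Hypothetical sample file so the sketch is self-contained.
path <- tempfile(fileext = ".txt")
writeLines(c("Commercial exploitation over the past two hundred years drove",
             "the great Mysticete whales to near extinction."), path)

# Option 1: read the whole file as one string, then replace
# every run of line-break characters with a single space.
raw <- readChar(path, file.info(path)$size)
one_string <- trimws(gsub("[\r\n]+", " ", raw))

# Option 2: read the file line by line and join the lines with a space.
joined <- paste(readLines(path), collapse = " ")

print(one_string)
```

Both options should give a character vector of length one with no line breaks, which could then be handed to tm (for example through VectorSource) to build the Corpus.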