R text mining documents from CSV file (one row per doc)

Q: How do I extract a column from a CSV file in R?

Read the file with dat <- read.csv("file.csv"), then select the column with dat$column; the result is a plain vector. A CSV file is just a text file whose columns are separated by commas, with the same number of columns on every line, so read.csv() returns it as a data frame.
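A minimal sketch, assuming a CSV with a text column named feedback (a hypothetical name). The text = argument of read.csv() stands in for a real file so the example runs on its own.

```r
# Inline CSV standing in for "file.csv" (hypothetical data)
csv_text <- "id,feedback\n1,Great service\n2,Slow delivery\n3,Would buy again"
dat <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# $ extracts a single column as a plain character vector
feedback <- dat$feedback
# dat[["feedback"]] and dat[, "feedback"] return the same vector
print(feedback)
```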

Q: How do I import a specific column from a CSV file in R?

Use read.table(), or read.csv(), which is a wrapper around it, and pass a colClasses vector with one entry per column. Setting an entry to "NULL" skips that column, so only the remaining columns are imported.
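A small sketch of the colClasses approach, assuming a three-column file where only the middle feedback column is wanted (the data and column names are hypothetical).

```r
# Inline CSV standing in for a real file (hypothetical data)
csv_text <- "id,feedback,rating\n1,Great service,5\n2,Slow delivery,2"

# "NULL" tells read.csv to skip that column; "character" keeps feedback as text
only_feedback <- read.csv(text = csv_text,
                          colClasses = c("NULL", "character", "NULL"))
str(only_feedback)
```

The skipped columns are never stored, which can save memory on wide files compared with reading everything and subsetting afterwards.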

Tags: r, text-mining, corpus, tm, documents

I am trying to work with the tm package in R, and have a CSV file of customer feedback with each line being a different instance of feedback. I want to import all the content of this feedback into a corpus, but I want each line to be a different document within the corpus so that I can compare the feedback in a document-term matrix (DocumentTermMatrix). There are over 10,000 rows in my data set.

Originally I did the following:

fdbk_corpus <-Corpus(VectorSource(fdbk), readerControl = list(language="eng"), sep="\t")

This creates a corpus with a single document containing all 10,000+ rows, but I want 10,000+ documents with one row each.

I imagine I could save each row as a separate CSV or TXT file in a folder and create a corpus from that, but I suspect there is a much simpler way to read each line as a separate document.
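A likely cause, assuming fdbk is a data frame returned by read.csv() (the question does not show how it was created): VectorSource() makes one document per element of its input, and a data frame's length is its number of columns, so a one-column data frame yields one document. The sketch below passes the column itself instead; the column name feedback is hypothetical.

```r
library(tm)

# Hypothetical stand-in for the feedback CSV: one text column
fdbk <- data.frame(feedback = c("Great service", "Slow delivery",
                                "Would buy again"),
                   stringsAsFactors = FALSE)

length(fdbk)           # 1: a data frame's length is its number of columns
length(fdbk$feedback)  # 3: one element per row

# Pass the column (a character vector), not the whole data frame
fdbk_corpus <- Corpus(VectorSource(fdbk$feedback))
length(fdbk_corpus)    # 3 documents, one per row
```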


asked Aug 01 '13 14:08

user2407054

1 Answer

Here's a complete workflow to get what you want:

# change this file location to suit your machine
file_loc <- "C:\\Documents and Settings\\Administrator\\Desktop\\Book1.csv"
# change TRUE to FALSE if you have no column headings in the CSV
x <- read.csv(file_loc, header = TRUE, stringsAsFactors = FALSE)
library(tm)
# each row of the data frame becomes one document
# (tm >= 0.7-3 requires columns named doc_id and text)
corp <- Corpus(DataframeSource(x))
dtm <- DocumentTermMatrix(corp)

In the dtm object, each row is a document (one line of your original CSV file) and each column is a term.
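In tm 0.7-3 and later, DataframeSource() only accepts a data frame whose first column is named doc_id and whose second is named text, so a CSV with other column names needs reshaping first. A sketch under the assumption that the feedback sits in a column called feedback (hypothetical), with row-based IDs chosen purely for illustration:

```r
library(tm)

# Hypothetical stand-in for the CSV read with read.csv()
x <- data.frame(feedback = c("Great service, friendly staff",
                             "Slow delivery",
                             "Friendly staff but slow delivery"),
                stringsAsFactors = FALSE)

# DataframeSource() expects doc_id first and text second
df_src <- data.frame(doc_id = paste0("row", seq_len(nrow(x))),
                     text = x$feedback,
                     stringsAsFactors = FALSE)

corp <- Corpus(DataframeSource(df_src))
dtm <- DocumentTermMatrix(corp)

nrow(dtm)     # one row (document) per CSV line
inspect(dtm)  # term counts per document
```

Any extra columns placed after text are kept as document-level metadata, which can be handy for grouping feedback later.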

answered Oct 17 '22 14:10

Ben
