R text mining documents from CSV file (one row per doc)

I am trying to work with the tm package in R, and have a CSV file of customer feedback in which each line is a separate piece of feedback. I want to import all of this feedback into a corpus, with each line as a separate document, so that I can compare the feedback in a document-term matrix. There are over 10,000 rows in my data set.

Originally I did the following:

fdbk_corpus <-Corpus(VectorSource(fdbk), readerControl = list(language="eng"), sep="\t")

This creates a corpus with 1 document and >10,000 rows, and I want >10,000 docs with 1 row each.

I imagine I could just have 10,000+ separate CSV or TXT documents within a folder and create a corpus from that... but I'm thinking there is a much simpler answer than that, reading each line as a separate document.

user2407054 asked Aug 01 '13 14:08


1 Answer

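Here's a complete workflow to get what you want:

# change this file location to suit your machine
file_loc <- "C:\\Documents and Settings\\Administrator\\Desktop\\Book1.csv"
# change TRUE to FALSE if you have no column headings in the CSV;
# stringsAsFactors = FALSE keeps the text as character rather than factor
x <- read.csv(file_loc, header = TRUE, stringsAsFactors = FALSE)
library(tm)
# each row of the data frame becomes one document
# (note: recent tm versions expect x to have columns named doc_id and text)
corp <- Corpus(DataframeSource(x))
# rows are documents, columns are terms
dtm <- DocumentTermMatrix(corp)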

In the dtm object, each row is a document (one line of your original CSV file) and each column is a word.
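If the feedback sits in a single column, another way to get one document per row is to pass that column to VectorSource as a character vector. The following is only a minimal sketch under assumptions: the column name `feedback` and the three sample rows are hypothetical stand-ins for the data frame that read.csv would return.

```r
library(tm)

# Hypothetical stand-in for the result of read.csv(file_loc):
# one column of free-text feedback, one row per customer.
x <- data.frame(
  feedback = c("Great service",
               "Slow delivery and poor service",
               "Great price"),
  stringsAsFactors = FALSE
)

# VectorSource treats each element of a character vector as one document,
# so passing the column (not the whole data frame) gives one doc per row.
corp <- VCorpus(VectorSource(x$feedback))
dtm <- DocumentTermMatrix(corp)

n_docs <- length(corp)  # number of documents in the corpus
dtm_dim <- dim(dtm)     # documents x terms
```

Here `n_docs` should match `nrow(x)`, which is a quick check that each CSV row ended up as its own document.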

Ben answered Oct 17 '22 14:10