
Error writing to csv

Tags:

r

I'm attempting to write a data frame to a CSV file, but write.table seems to be complaining because some of the columns are lists.

What I want to be able to do is access this data frame and read it into R at a later time. I don't care how this is accomplished (saving it as a text file, etc.). This is a fairly large data set (n = 182305). Any ideas on how to write it to a file that I can read back into R fairly quickly? (I'm not married to a CSV file.)

Data frame and the code I've tried

DF2<-structure(list(word = c("3-D", "4-F", "4-H'er", "4-H", "A battery", 
"a bon march"), pos.code = c("AN", "N", "N", "A", "h", "v"), 
    pos = list(c("A", "N"), "N", "N", "A", "h", "v"), noun = list(
        TRUE, TRUE, TRUE, FALSE, FALSE, FALSE), plural = list(
        FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), noun.phrase = list(
        FALSE, FALSE, FALSE, FALSE, TRUE, FALSE), verb.usually.participle = list(
        FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), transitive.verb = list(
        FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), intransitive.verb = list(
        FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), adjective = list(
        TRUE, FALSE, FALSE, TRUE, FALSE, FALSE), adverb = list(
        FALSE, FALSE, FALSE, FALSE, FALSE, TRUE), conjunction = list(
        FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), preposition = list(
        FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), interjection = list(
        FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), pronoun = list(
        FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), definite.article = list(
        FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), indefinite.article = list(
        FALSE, FALSE, FALSE, FALSE, FALSE, FALSE), nominative = list(
        FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)), .Names = c("word", 
"pos.code", "pos", "noun", "plural", "noun.phrase", "verb.usually.participle", 
"transitive.verb", "intransitive.verb", "adjective", "adverb", 
"conjunction", "preposition", "interjection", "pronoun", "definite.article", 
"indefinite.article", "nominative"), row.names = c(NA, 6L), class = "data.frame")

write.table(DF2, file = "mobyPOS.csv", sep = " ", col.names = TRUE,qmethod = "double")

Error message I got:

> write.table(DF2, file = "mobyPOS.csv", sep = " ", col.names = TRUE,qmethod = "double")
Error in write.table(x, file, nrow(x), p, rnames, sep, eol, na, dec, as.integer(quote),  : 
  unimplemented type 'list' in 'EncodeElement'
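To see which columns trigger this error, it can help to list the columns that are lists and the rows whose list elements hold more than one value. The sketch below uses a small hypothetical data frame `df` shaped like DF2 (the data are made up; `vapply`, `is.list` and `lengths` are base R, with `lengths` available from R 3.2.0 on):

```r
# Hypothetical toy data frame shaped like DF2: an ordinary character
# column plus a list column whose first element has length 2.
df <- data.frame(word = c("3-D", "4-F", "4-H"), stringsAsFactors = FALSE)
df$pos <- list(c("A", "N"), "N", "N")

# Which columns are lists? These are the ones write.table() cannot encode.
list_cols <- names(df)[vapply(df, is.list, logical(1))]
print(list_cols)   # [1] "pos"

# Which rows hold more than one value in that list column?
lens <- lengths(df$pos)
print(which(lens > 1))   # [1] 1
```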
asked Nov 29 '11 at 04:11 by Tyler Rinker


2 Answers

This is just meant to address the issue of lists as columns in data frames mentioned in the comments.

In the specific case of your example data, the only place where a list is actually "required" is the first element of DF2$pos, which is a character vector of length two. That can be removed with the following code:

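# Collapse the one length-2 element, c("A", "N"), into a single string "AN"
DF2$pos[[1]] <- paste(DF2$pos[[1]], collapse = "")
# Every list column now holds only length-1 elements, so unlist() turns each into an ordinary vector
newDF <- as.data.frame(lapply(DF2, unlist))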
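The same idea can be generalized for the case where several elements, or several columns, hold vectors longer than one. This is a minimal sketch, assuming a separator (here `;`) that never occurs in the data; `flatten_lists` and the toy `df` are hypothetical names, not part of any package. Unlike `collapse = ""`, an explicit separator makes it straightforward to recover the original vectors later with `strsplit`:

```r
# Hypothetical toy data frame standing in for DF2 (list columns included).
df <- data.frame(word = c("3-D", "4-F", "A battery"), stringsAsFactors = FALSE)
df$pos  <- list(c("A", "N"), "N", "h")
df$noun <- list(TRUE, TRUE, FALSE)

# Turn every list column into an atomic column. If all elements have
# length 1 they are simply unlisted; otherwise each element is pasted
# into one string using `sep` (assumed never to appear in the data).
flatten_lists <- function(d, sep = ";") {
  d[] <- lapply(d, function(col) {
    if (!is.list(col)) return(col)
    if (all(lengths(col) == 1L)) return(unlist(col))
    vapply(col, paste, character(1), collapse = sep)
  })
  d
}

flat <- flatten_lists(df)

# Round trip through a temporary CSV file.
path <- tempfile(fileext = ".csv")
write.csv(flat, path, row.names = FALSE)
back <- read.csv(path, stringsAsFactors = FALSE)
print(back)
```

Writing to `tempfile()` is only for the demonstration; in practice you would pass your own path, such as "mobyPOS.csv".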

Generally, the metaphor of a data frame is that the rows correspond to cases, or observational units, and the columns correspond to variables. Further, this metaphor holds that a particular observational unit has only one value for each variable. In this sense, it's the same as a matrix, only it can store columns of different classes.

Obviously, R allows you to break that metaphor, as you've discovered. The question of whether it is a good idea to do so will be domain and data specific. Not every data set fits perfectly into the data frame metaphor; sometimes you'll have a variable where the "values" you measure don't easily collapse into a single expression.

You will have a choice to make: in your case, using newDF instead may require string parsing (strsplit, etc.) each time you access that value. That may be awkward at times, and it may not fit perfectly with your mental model of your data.
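For example, assuming the values were collapsed with a `;` separator (hypothetical; with `collapse = ""` you would split on `""` instead, which works only while every code is a single character), `strsplit` recovers one character vector per row, which you can then query:

```r
# A flattened column as it might come back from read.csv(); the ";"
# separator is an assumption about how the data were written.
pos_flat <- c("A;N", "N", "h")

# strsplit() returns a list with one character vector per row.
pos_list <- strsplit(pos_flat, ";", fixed = TRUE)

# Per-row membership test, e.g. "is this word ever tagged as a noun?"
is_noun <- vapply(pos_list, function(p) "N" %in% p, logical(1))
print(is_noun)   # [1]  TRUE  TRUE FALSE
```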

On the other hand, much of R is built around data being stored in data frames in ways that adhere to the data frame metaphor. As you discovered with write.table (which write.csv calls internally), if you don't adhere to those expectations, some pieces (indeed, many pieces) of R won't behave the way you expect. Working around that also requires extra work and awkwardness.

In my experience, it's usually better to sacrifice the purity of your preconceived idea of how your data should be structured and instead do your best to fit it into a data frame somehow. At least, that route has involved fewer workarounds for me. But nothing is ever perfect.
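One common way to fit such data into the one-value-per-cell metaphor, not something this answer itself proposes, is a "long" layout with one row per (word, part-of-speech) pair. A minimal sketch on hypothetical toy data:

```r
# Hypothetical toy data: a list column holding one or more POS codes per word.
df <- data.frame(word = c("3-D", "4-F", "A battery"), stringsAsFactors = FALSE)
df$pos <- list(c("A", "N"), "N", "h")

# Repeat each word once per POS code so every cell holds exactly one value.
long <- data.frame(
  word = rep(df$word, lengths(df$pos)),
  pos  = unlist(df$pos),
  stringsAsFactors = FALSE
)
print(long)
```

The trade-off is that `word` is no longer unique per row, so per-word summaries need a grouping step (for example with `tapply` or `aggregate`).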

But as I said at the beginning, this will be extremely data and domain specific. YMMV.

answered Sep 18 '22 at 02:09 by joran


Try

save(DF2, file = "mobyPOS.Rdata")

Note that you don't have to use the extension "Rdata", but "Rdata" or "RData" seems to be the convention.

You can then load the data back in using

load("mobyPOS.Rdata")
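A quick round-trip check, using a hypothetical two-row stand-in for DF2 and a temporary file, shows that save/load keep list columns exactly as they were, and that `load` (invisibly) returns the names of the objects it restored:

```r
# Hypothetical stand-in for DF2, including a list column.
DF2 <- data.frame(word = c("3-D", "4-F"), stringsAsFactors = FALSE)
DF2$pos <- list(c("A", "N"), "N")
original <- DF2

path <- tempfile(fileext = ".Rdata")
save(DF2, file = path)

rm(DF2)                 # remove it from the workspace...
loaded <- load(path)    # ...and restore it; load() returns the restored names
print(loaded)           # [1] "DF2"
stopifnot(identical(DF2, original))   # the list column survives intact
```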

Note that this is different from reading an external file format where you would normally do something like

your_object <- read.csv(...)

The load command restores the object directly under its original name, so after you execute it your DF2 object will be back in your workspace without any assignment.
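If you would rather choose the name when reading the data back, base R's saveRDS/readRDS pair (an alternative not mentioned in this answer) stores a single object without its name. A minimal sketch with a hypothetical stand-in for DF2:

```r
# Hypothetical stand-in for DF2, including a list column.
DF2 <- data.frame(word = c("3-D", "4-F"), stringsAsFactors = FALSE)
DF2$pos <- list(c("A", "N"), "N")

path <- tempfile(fileext = ".rds")
saveRDS(DF2, path)          # stores the object's value, not its name

mobyPOS <- readRDS(path)    # assign the result to any name you like
stopifnot(identical(mobyPOS, DF2))
```

Both .RData and .rds are R-specific binary formats, so they are convenient for reading back into R but not for sharing with other tools.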

answered Sep 18 '22 at 02:09 by Xu Wang