
R can't write to csv or RData file

Tags: r, csv, save

I'm trying to write the contents of a data frame to an RData file.

> save(collector2, file="collect2.RData")
Error in save(collector2, file = "collect2.RData") : 
  error writing to connection

As a 2nd option I tried using:

> write.csv(collector2, file="collect2.csv", row.names=FALSE)

This executes and creates a file, but it is empty.

Here's the data frame I'm trying to write:

> head(collector2)
          adQuer1     rowid adQueravg
1 2485651|2284211 132000001 0.0000000
2      20888541|7 132000002 0.0152358
3      20888541|7 132987430 0.0152358
4      20888541|7 132595958 0.0152358
5      20888541|7 132621111 0.0152358
6      20888541|7 132464740 0.0152358
> str(collector2)
'data.frame':   17639105 obs. of  3 variables:
 $ adQuer1  : Factor w/ 7241603 levels "1000467|130715",..: 430440 229948 229948 229948 229948 229948 229948 229948 229948 229948 ...
 $ rowid    : num  1.32e+08 1.32e+08 1.33e+08 1.33e+08 1.33e+08 ...
 $ adQueravg: num  0 0.0152 0.0152 0.0152 0.0152 ...

Here is my system info:

> version
               _                            
platform       x86_64-unknown-linux-gnu     
arch           x86_64                       
os             linux-gnu                    
system         x86_64, linux-gnu            
status                                      
major          2                            
minor          15.0                         
year           2012                         
month          03                           
day            30                           
svn rev        58871                        
language       R                            
version.string R version 2.15.0 (2012-03-30)
nickname                                    

Any suggestions?

screechOwl asked May 21 '12 18:05


2 Answers

It turned out to be a hard drive issue: I was out of disk space, and "error writing to connection" was the message R gave for it.
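For anyone hitting the same error, a quick check along these lines may help rule disk space in or out. It is only a sketch: the small data frame is a hypothetical stand-in for `collector2`, the output path under `tempdir()` is an assumption, and the `df` call assumes a Unix-like system such as the Linux box in the question.

```r
# Hypothetical stand-in for the much larger `collector2` data frame.
collector2 <- data.frame(
  adQuer1   = factor(c("2485651|2284211", "20888541|7", "20888541|7")),
  rowid     = c(132000001, 132000002, 132987430),
  adQueravg = c(0, 0.0152358, 0.0152358)
)

# In-memory size of the object; the file on disk is usually smaller once compressed.
obj_bytes <- as.numeric(object.size(collector2))
print(format(object.size(collector2), units = "auto"))

# Free space on the filesystem that holds the working directory (Unix `df`).
if (.Platform$OS.type == "unix") {
  system2("df", c("-h", shQuote(getwd())))
}

# Write the file, then confirm it exists and is not empty.
out_file <- file.path(tempdir(), "collect2.RData")  # assumed output location
save(collector2, file = out_file)
written_bytes <- file.info(out_file)$size
if (is.na(written_bytes) || written_bytes == 0) {
  warning("Output file is missing or empty; check free disk space.")
}
```

An empty output file, like the `write.csv` result in the question, is worth treating as a failed write even when no error is raised.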

screechOwl answered Sep 28 '22 14:09


Well, the object you are trying to persist is not small.

In any event, I was not able to reproduce the error, but object size is the only likely cause that I can see.

The middle column in your data frame, rowid, is type double, i.e. 64-bit floating-point values at 8 bytes each, so with 17.6 million rows that column alone takes about 141 MB in memory (as does adQueravg). One option is to persist this column separately from the other two, i.e., in two separate RData objects, each saved to its own file.
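A rough sketch of that idea follows. The six-row data frame is a hypothetical stand-in for `collector2`, and the file names and `tempdir()` location are made up for illustration. The integer conversion at the end is an extra suggestion of mine, and it only applies if every rowid is a whole number within the 32-bit integer range.

```r
# Hypothetical small stand-in for `collector2` (same column types as in the question).
collector2 <- data.frame(
  adQuer1   = factor(c("2485651|2284211", rep("20888541|7", 5))),
  rowid     = c(132000001, 132000002, 132987430, 132595958, 132621111, 132464740),
  adQueravg = c(0, rep(0.0152358, 5))
)

# In-memory size of each column, in bytes.
col_bytes <- sapply(collector2, function(col) as.numeric(object.size(col)))
print(col_bytes)

# Raw payload of one double column at the full row count reported by str():
# 8 bytes per value.
full_rows <- 17639105
double_col_mb <- full_rows * 8 / 1e6
print(double_col_mb)

# Persist rowid separately from the other two columns (file names are assumptions).
rowid  <- collector2$rowid
others <- collector2[, c("adQuer1", "adQueravg")]
rowid_file  <- file.path(tempdir(), "collect2_rowid.RData")
others_file <- file.path(tempdir(), "collect2_others.RData")
save(rowid, file = rowid_file)
save(others, file = others_file)

# Optional: store rowid as integer (4 bytes per value instead of 8).
# Assumption: all values are whole numbers that fit in a 32-bit integer.
fits_integer <- all(collector2$rowid == floor(collector2$rowid)) &&
  max(abs(collector2$rowid)) <= .Machine$integer.max
if (fits_integer) {
  collector2$rowid <- as.integer(collector2$rowid)
}
```

Splitting does not reduce the total bytes written, so it would not help with a full disk, but it keeps each individual file smaller.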

Second, perhaps try compression by passing the appropriate arguments to save.

With a data frame of approx. 300K rows and 9 columns, I reduced the size of the RData file by about half using gzip compression at the highest level:

> dim(FG1)
[1] 282816      9

> dfile = "fg1.RData"


Creates an RData file 131 KB in size:

save(FG1, file=dfile)


Creates an RData file 66 KB in size:

save(FG1, file=dfile, compress=TRUE, compression_level=9)
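To compare settings on your own data, something like the following sketch may be useful. The data frame is synthetic (FG1 is not available here) and `saved_size` is a hypothetical helper, so the sizes it prints say nothing about your object. As far as I know, save already applies gzip compression by default for non-ASCII saves, so the uncompressed case has to be requested explicitly with `compress = FALSE`.

```r
# Synthetic stand-in data; column names and contents are made up for illustration.
set.seed(1)
n <- 1e5
df <- data.frame(
  id    = seq_len(n),
  group = factor(sample(letters, n, replace = TRUE)),
  value = round(runif(n), 4)
)

# Hypothetical helper: save `data` with the given settings and
# return the resulting file size in bytes.
saved_size <- function(data, ...) {
  f <- tempfile(fileext = ".RData")
  on.exit(unlink(f))          # remove the temporary file afterwards
  save(data, file = f, ...)
  file.info(f)$size
}

sizes <- c(
  none  = saved_size(df, compress = FALSE),
  gzip  = saved_size(df, compress = "gzip"),
  gzip9 = saved_size(df, compress = "gzip", compression_level = 9),
  bzip2 = saved_size(df, compress = "bzip2"),
  xz    = saved_size(df, compress = "xz")
)
print(sizes)
```

Higher compression levels and the bzip2/xz options generally trade longer save times for smaller files, so it is worth timing them on an object as large as the one in the question.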

doug answered Sep 28 '22 16:09