The size of my .Rdata file is 92 MB.
However, the original csv file is around 3 GB. I imported it with the usual read.csv().
How can that be?
A CSV file will often be larger than the XLSX it was created from, because an XLSX file is actually a compressed (zipped) archive - you can unzip one with a standard compression tool and check it out for yourself. The same idea explains your numbers: save() writes .RData files with gzip compression by default, so a 3 GB csv full of repetitive values can easily shrink to 92 MB. The more repetition in the data, the smaller the compressed file.
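You can verify that a default .RData file really is compressed by inspecting its first two bytes - gzip streams always begin with the magic bytes 0x1f 0x8b. A minimal sketch (the temp-file name is illustrative):

```r
# A default save() uses gzip, so the file begins with the
# gzip magic bytes 0x1f 0x8b.
f <- tempfile(fileext = ".RData")  # illustrative temp file
x <- 1:10
save(x, file = f)                  # compress = TRUE (gzip) is the default
readBin(f, "raw", n = 2)           # first two bytes: 1f 8b
```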
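The effect of that compression is easy to measure directly: save the same highly repetitive vector with and without compression and compare the file sizes. A sketch, with illustrative file names:

```r
X <- rep(1, 1e5)                         # 100,000 identical doubles, 800,000 bytes raw
f_gz  <- tempfile(fileext = ".RData")    # illustrative file names
f_raw <- tempfile(fileext = ".RData")
save(X, file = f_gz)                     # default: gzip-compressed
save(X, file = f_raw, compress = FALSE)  # plain serialization, no compression
file.size(f_raw) / file.size(f_gz)       # ratio well into the hundreds here
```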
The comments already hinted at what is going on, but as this is so straightforward, let us work through an example:
R> X <- 1:1e5 # data, no repeats
R> save(X, file="/tmp/foo.RData")
R> write.csv(X, file="/tmp/foo.csv")
R> system("ls -l /tmp/foo*")
-rw-r--r-- 1 x y 1377797 Jun 4 09:11 /tmp/foo.csv
-rw-r--r-- 1 x y 212397 Jun 4 09:11 /tmp/foo.RData
Now with data that repeats:
R> X <- rep(1,1e5) # data, lots of repeats
R> write.csv(X, file="/tmp/bar.csv")
R> save(X, file="/tmp/bar.RData")
R> system("ls -lh /tmp/bar*")
-rw-r--r-- 1 x y 966K Jun 4 09:12 /tmp/bar.csv
-rw-r--r-- 1 x y 1.3K Jun 4 09:12 /tmp/bar.RData
R>
So depending on how well the data compresses, we get ratios anywhere from 6.5 to 743 - and that is before we make the csv even more "expensive" by forcing several decimal places to be printed. (save() uses gzip compression by default; see help(save) for the bzip2 and xz alternatives.)
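That last point is also easy to demonstrate: a double that needs many decimal digits costs roughly 17-20 characters as csv text but always 8 bytes in the binary serialization (before gzip even runs). A sketch, with illustrative file names:

```r
set.seed(42)
X <- runif(1e5)                      # doubles printed with many decimals
f_csv <- tempfile(fileext = ".csv")  # illustrative file names
f_rda <- tempfile(fileext = ".RData")
write.csv(X, file = f_csv)           # each value written out as text
save(X, file = f_rda)                # 8 bytes per double, then gzipped
file.size(f_csv) / file.size(f_rda)  # csv remains a few times larger
```

Note that random doubles barely compress, so this is close to a worst case for .RData - and the csv still loses.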