
Size of Rdata file compared to csv

Tags: r, csv, rdata

The size of my .Rdata file is 92 MB.

However, the original csv file is around 3 GB. I read it in with the usual read.csv().

How can that be?

Rico asked Jun 04 '13 14:06




1 Answer

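The comments already hinted at what is going on: save() writes a binary file that is compressed by default, whereas a csv file is uncompressed text. This is straightforward to show, so let us do an example: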

R> X <- 1:1e5   # data, no repeats
R> save(X, file="/tmp/foo.RData")
R> write.csv(X, file="/tmp/foo.csv")
R> system("ls -l /tmp/foo*")
-rw-r--r-- 1 x y 1377797 Jun  4 09:11 /tmp/foo.csv
-rw-r--r-- 1 x y  212397 Jun  4 09:11 /tmp/foo.RData

Now with data that repeats:

R> X <- rep(1,1e5)   # data, lots of repeats
R> write.csv(X, file="/tmp/bar.csv")
R> save(X, file="/tmp/bar.RData")
R> system("ls -lh /tmp/bar*")
-rw-r--r-- 1 x y 966K Jun  4 09:12 /tmp/bar.csv
-rw-r--r-- 1 x y 1.3K Jun  4 09:12 /tmp/bar.RData
R> 

So the csv file is between roughly 6.5 and 743 times larger than the RData file, depending on how well the data compresses. And that is before we make the csv more "expensive" by forcing several decimals to be printed...
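To illustrate the point about decimals, here is a minimal sketch (not part of the original session) that writes the same vector of random doubles three ways: as csv, as a default RData file, and as an RData file with compress = FALSE. The tempfile() paths, the seed, and the variable names are assumptions chosen for illustration; exact sizes will vary by platform and R version, so none are shown.

```r
# Hypothetical comparison; file names come from tempfile(), not from the session above.
set.seed(1)                     # arbitrary seed so reruns produce the same data
X <- rnorm(1e5)                 # doubles that print with many decimals

csv_file <- tempfile(fileext = ".csv")
gz_file  <- tempfile(fileext = ".RData")
raw_file <- tempfile(fileext = ".RData")

write.csv(X, file = csv_file)                # plain text, one row per value
save(X, file = gz_file)                      # binary, compressed by default
save(X, file = raw_file, compress = FALSE)   # binary, no compression

# Collect the on-disk sizes in bytes.
sizes <- c(csv       = file.size(csv_file),
           rdata     = file.size(gz_file),
           rdata_raw = file.size(raw_file))
print(sizes)

# Ratio of the csv size to each RData variant.
print(round(sizes[["csv"]] / sizes[c("rdata", "rdata_raw")], 1))
```

Random doubles compress poorly, so in this case most of the gap should come from storing each number as text rather than as 8 binary bytes. With repetitive data, as in the rep(1,1e5) example, compression dominates instead.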

Dirk Eddelbuettel answered Oct 12 '22 22:10