The size of my .Rdata file is 92 MB.
However, the original csv file is around 3 GB. I imported it with the usual read.csv().
How can that be?
A CSV file will often be larger than the XLSX it was created from, because an XLSX file is actually a compressed (zipped) archive - you can unzip one with a standard compression tool and check it out for yourself. The same idea explains your numbers: save() writes .RData files with gzip compression by default, so a 3 GB csv full of repetitive values can easily shrink to 92 MB. The more repetition in the data, the smaller the compressed file.
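You can verify that a default .RData file really is compressed by inspecting its first two bytes - gzip streams always begin with the magic bytes 0x1f 0x8b. A minimal sketch (the temp-file name is illustrative):

```r
# A default save() uses gzip, so the file begins with the
# gzip magic bytes 0x1f 0x8b.
f <- tempfile(fileext = ".RData")  # illustrative temp file
x <- 1:10
save(x, file = f)                  # compress = TRUE (gzip) is the default
readBin(f, "raw", n = 2)           # first two bytes: 1f 8b
```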
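The effect of that compression is easy to measure directly: save the same highly repetitive vector with and without compression and compare the file sizes. A sketch, with illustrative file names:

```r
X <- rep(1, 1e5)                         # 100,000 identical doubles, 800,000 bytes raw
f_gz  <- tempfile(fileext = ".RData")    # illustrative file names
f_raw <- tempfile(fileext = ".RData")
save(X, file = f_gz)                     # default: gzip-compressed
save(X, file = f_raw, compress = FALSE)  # plain serialization, no compression
file.size(f_raw) / file.size(f_gz)       # ratio well into the hundreds here
```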
The comments already hinted at what is going on, but as this is so straightforward, let us work through an example:
R> X <- 1:1e5 # data, no repeats
R> save(X, file="/tmp/foo.RData")
R> write.csv(X, file="/tmp/foo.csv")
R> system("ls -l /tmp/foo*")
-rw-r--r-- 1 x y 1377797 Jun 4 09:11 /tmp/foo.csv
-rw-r--r-- 1 x y 212397 Jun 4 09:11 /tmp/foo.RData
Now with data that repeats:
R> X <- rep(1,1e5) # data, lots of repeats
R> write.csv(X, file="/tmp/bar.csv")
R> save(X, file="/tmp/bar.RData")
R> system("ls -lh /tmp/bar*")
-rw-r--r-- 1 x y 966K Jun 4 09:12 /tmp/bar.csv
-rw-r--r-- 1 x y 1.3K Jun 4 09:12 /tmp/bar.RData
R>
So depending on how well the data compresses, we get ratios anywhere from 6.5 to 743 - and that is before we make the csv even more "expensive" by forcing several decimal places to be printed. (save() uses gzip compression by default; see help(save) for the bzip2 and xz alternatives.)
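That last point is also easy to demonstrate: a double that needs many decimal digits costs roughly 17-20 characters as csv text but always 8 bytes in the binary serialization (before gzip even runs). A sketch, with illustrative file names:

```r
set.seed(42)
X <- runif(1e5)                      # doubles printed with many decimals
f_csv <- tempfile(fileext = ".csv")  # illustrative file names
f_rda <- tempfile(fileext = ".RData")
write.csv(X, file = f_csv)           # each value written out as text
save(X, file = f_rda)                # 8 bytes per double, then gzipped
file.size(f_csv) / file.size(f_rda)  # csv remains a few times larger
```

Note that random doubles barely compress, so this is close to a worst case for .RData - and the csv still loses.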