
Saving and loading data.frames [duplicate]

Tags: dataframe, r, save

I have made a data frame from a set of tweets in the following way:

 rdmTweets <- userTimeline("rdatamining", n=200)
 df <- do.call("rbind", lapply(rdmTweets, as.data.frame))

Now I am saving the data frame with save in this way:

 save(df, file="data")

How can I load that saved data frame for future use? When I use:

  df2 <- load("data")

and then call dim(df2), I expect it to return the number of tweets in the data frame, but it only shows 1.

Layla asked Nov 03 '12 07:11


3 Answers

As @mrdwab points out, save saves the names as well as the data/structure (and in fact can save a number of different R objects in a single file). There is another pair of storage functions that behave more as you expect. Try this:

saveRDS(df, file="mytweets.rds")
df2 <- readRDS("mytweets.rds")

These functions can only handle a single object at a time.
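
If you need several objects in one file while still using saveRDS, one option is to wrap them in a list. This is only a minimal sketch: the temporary file and the objects `tweets` and `meta` are made up for illustration.

```r
# Hypothetical objects standing in for real data
tweets <- data.frame(id = 1:3, text = c("a", "b", "c"), stringsAsFactors = FALSE)
meta   <- list(user = "rdatamining", n = 3)

path <- tempfile(fileext = ".rds")

# saveRDS stores exactly one object, so bundle several into a single list
saveRDS(list(tweets = tweets, meta = meta), file = path)

# readRDS returns the object itself, so you pick the name when you assign it
bundle <- readRDS(path)
dim(bundle$tweets)
```

Because readRDS returns the object rather than restoring it under its old name, it cannot silently overwrite something already in your workspace.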

seancarmody answered Oct 30 '22 04:10


Another option is to save your data frame as a csv file. The benefit of this option is that it provides long-term storage, i.e. you will (likely) be able to open your csv file on any platform in ten years' time. With an RData file, you can only open it with R, and I wouldn't like to bet money on opening it between versions.

To save the data frame as a csv and read it back, use write.csv and read.csv:

write.csv(df, file="out.csv", row.names=FALSE)
df <- read.csv("out.csv", header=TRUE)

Gavin's comment below raised a couple of points:

The CSV route only works for tabular-style data.

Completely correct. But if you are saving a data frame (as the OP is), then your data is in tabular form.

With R you'll always have the ability to fire up an old version to read the data and export if for some reason they change save format and don't allow the old format to be loaded by another function.

To play devil's advocate, you could use this argument with Excel and save your data as an xls. However, saving your data in a csv format means we never need to worry about this.

R's file format is documented so one could reasonably easily read the binary data in another system using that open info.

I completely agree - although "easily" is a bit strong. This is why saving as an RData file isn't such a big deal. But if you are saving tabular data, why not use a csv file?

For the record, there are some reasons for saving tabular data as an RData file. For example, the speed in reading/writing the file or file size.
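
The trade-off between the two formats can be checked on your own data. The sketch below uses a made-up data frame and temporary files; it compares what comes back from each round trip and prints the file sizes without claiming which will be smaller in general.

```r
# Hypothetical data frame with a Date column
df <- data.frame(
  id      = 1:3,
  created = as.Date(c("2012-11-01", "2012-11-02", "2012-11-03")),
  stringsAsFactors = FALSE
)

csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

write.csv(df, file = csv_path, row.names = FALSE)
saveRDS(df, file = rds_path)

from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)
from_rds <- readRDS(rds_path)

class(from_csv$created)  # csv stores text only; the Date class is not restored
class(from_rds$created)  # the R format keeps the column class

# Sizes depend heavily on the data, so inspect them rather than assume
file.size(csv_path)
file.size(rds_path)
```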

csgillespie answered Oct 30 '22 03:10


save saves the name of the dataset as well as the data. Thus, you should not assign the result of load("data") to a name. In other words, simply use:

load("data")

and it will load an object named df (or whatever is contained in the file "data") into your current workspace.

I would suggest a more distinctive name for your file though, and consider adding an extension to help you tell your script files from your data files, and so on.


Work your way through this simple example:

rm(list = ls())              # Remove everything from your current workspace
ls()                         # Anything there? Nope.
# character(0)
a <- 1:10                    # Create an object "a"
save(a, file="myData.Rdata") # Save object "a"
ls()                         # Anything there? Yep.
# [1] "a"
rm(a)                        # Remove "a" from your workspace
ls()                         # Anything there? Nope.
# character(0)
load("myData.Rdata")         # Load your "myData.Rdata" file
ls()                         # Anything there? Yep. Object "a".
# [1] "a"
str(a)                       # Is "a" what we expect it to be? Yep.
#  int [1:10] 1 2 3 4 5 6 7 8 9 10
a2 <- load("myData.Rdata")   # What about your approach?
ls()                         # Now we have 2 objects
# [1] "a"  "a2"
str(a2)                      # "a2" stores the object names from your data file.
#  chr "a"

As you can see, save allows you to save and load multiple objects at once, which can be convenient when working on projects with multiple sets of data that you want to keep together.

On the other hand, saveRDS (from the accepted answer) only lets you save single objects. In some ways, this is more "transparent" since load() doesn't let you preview the contents of the file without first loading it.
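
If you want to see what a file made by save contains without touching your workspace, one approach is to load it into a separate environment. A minimal sketch, where `scratch` and the temporary file are illustrative names only:

```r
path <- tempfile(fileext = ".RData")
a <- 1:10
b <- letters[1:3]
save(a, b, file = path)
rm(a, b)

# Load into a scratch environment instead of the current workspace
scratch <- new.env()
loaded <- load(path, envir = scratch)

loaded                 # names of the objects stored in the file
a2 <- scratch[["a"]]   # pull one out under whatever name you like
str(a2)
```

This also answers the original question directly: the value returned by load() is the vector of object names, and the objects themselves live in the environment you loaded them into.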

A5C1D2H2I1M1N2O1R2T1 answered Oct 30 '22 03:10