
How to use zoo or xts with large data?

How can I use the R packages zoo or xts with very large data sets (100 GB)? I know there are packages such as bigrf, ff, and bigmemory that can deal with data of this size, but they only offer a limited set of commands: they don't have the functions of zoo or xts, and I don't know how to make zoo or xts use them. How can this be done?

I've also seen other, database-related options, such as sqldf and hadoopstreaming, RHadoop, and some tools used by Revolution R. What do you advise? Is there anything else?

I just want to aggregate series, cleanse them, and run some cointegration tests and plots. I would rather not have to code and implement new functions for every command I need, processing small pieces of data each time.

Added: I'm on Windows

asked Mar 27 '13 by skan

People also ask

What is an xts dataset?

eXtensible Time Series (xts) is a powerful package that provides an extensible time series class, enabling uniform handling of many R time series classes by extending zoo. Load the package with library(xts).

What does zoo do in R?

zoo is an R package providing an S3 class with methods for indexed totally ordered observations, such as discrete irregular time series. Its key design goals are independence of a particular index/time/date class and consistency with base R and the "ts" class for regular time series.
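
For instance, a minimal irregular daily series in zoo (the values and dates below are illustrative only):

    library(zoo)

    # An irregular series: zoo pairs each value with its Date index
    idx <- as.Date(c("2013-01-01", "2013-01-02", "2013-01-05"))
    z   <- zoo(c(1.2, 1.5, 1.1), order.by = idx)

    coredata(z)   # the observation values
    index(z)      # the Date index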

How do I convert data to time series in R?

To convert a data frame with a date column to a time series object, first import and load the xts package, then call the xts() function with the required parameters (the data and a time-based index) to create the time series object in R; at the end, is.xts() can be used to verify the result.
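
A minimal sketch of that conversion (the data frame and its column names are hypothetical):

    library(xts)

    # A toy data frame with a character date column
    df <- data.frame(date  = c("2013-01-01", "2013-01-02", "2013-01-03"),
                     price = c(100.5, 101.2, 99.8))

    # xts() needs a time-based index, so convert the column with as.Date()
    x <- xts(df$price, order.by = as.Date(df$date))

    is.xts(x)   # TRUE confirms the conversion worked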


1 Answer

I have had a similar problem (albeit I was only playing with 9-10 GB). My experience is that R cannot handle that much data on its own, especially since your dataset appears to contain time series data.

If your dataset contains a lot of zeros, you may be able to handle it using sparse matrices - see the Matrix package ( http://cran.r-project.org/web/packages/Matrix/index.html ); this manual may also come in handy ( http://www.johnmyleswhite.com/notebook/2011/10/31/using-sparse-matrices-in-r/ ).
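
As a rough sketch of the sparse approach (dimensions and values are illustrative), sparseMatrix() from the Matrix package stores only the non-zero cells:

    library(Matrix)

    # Only the non-zero entries are stored, not the full 1000 x 1000 grid
    m <- sparseMatrix(i    = c(1, 3, 5),          # row positions of non-zeros
                      j    = c(2, 4, 1),          # column positions
                      x    = c(1.5, -2.0, 0.7),   # the non-zero values
                      dims = c(1000, 1000))

    object.size(m)             # a few KB
    object.size(as.matrix(m))  # ~8 MB for the dense equivalent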

I used PostgreSQL - the relevant R package is RPostgreSQL ( http://cran.r-project.org/web/packages/RPostgreSQL/index.html ). It allows you to query your PostgreSQL database using SQL syntax; the result is downloaded into R as a data frame. It may be slow (depending on the complexity of your query), but it is robust and can be handy for data aggregation.
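
A minimal sketch of that workflow; the connection details and the trades table below are hypothetical, but the DBI-style calls (dbConnect, dbGetQuery, dbDisconnect) are the standard RPostgreSQL interface:

    library(RPostgreSQL)

    drv <- dbDriver("PostgreSQL")
    con <- dbConnect(drv, dbname = "tsdb", host = "localhost",
                     user = "ruser", password = "secret")

    # Let the database do the heavy aggregation; R only sees the small result
    daily <- dbGetQuery(con, "
        SELECT date_trunc('day', ts) AS day, avg(price) AS avg_price
        FROM   trades
        GROUP  BY 1
        ORDER  BY 1")

    dbDisconnect(con)

    # daily is an ordinary data frame, now small enough for xts
    library(xts)
    x <- xts(daily$avg_price, order.by = as.Date(daily$day))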

Drawback: you would need to upload the data into the database first, and your raw data needs to be clean and saved in some readable format (txt/csv). This is likely to be the biggest issue if your data is not already in a sensible format. Uploading "well-behaved" data into the DB, however, is easy (see http://www.postgresql.org/docs/8.2/static/sql-copy.html and How to import CSV file data into a PostgreSQL table?).
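
From R itself, dbWriteTable() can push a clean data frame into a table chunk by chunk, which avoids hand-writing COPY statements; the file and table names below are hypothetical:

    library(RPostgreSQL)

    drv <- dbDriver("PostgreSQL")
    con <- dbConnect(drv, dbname = "tsdb", host = "localhost",
                     user = "ruser", password = "secret")

    # Append one well-behaved CSV chunk at a time so R never holds 100 GB
    chunk <- read.csv("trades_part01.csv")
    dbWriteTable(con, "trades", chunk, append = TRUE, row.names = FALSE)

    dbDisconnect(con)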

I would recommend using PostgreSQL or another relational database for your task. I have not tried Hadoop, but using CouchDB nearly drove me round the bend. Stick with good old SQL.

answered Oct 11 '22 by Skif