G'day, I am working with a large dataset of ~125,000 lon/lat locations, each with a date, for species presence/absence records. For each location I want to work out what the weather was like on the date of the record and during the 3 months prior. To do this I have downloaded daily weather data for a given weather variable (e.g., maximum temperature) covering the 5-year period over which the records were collected. I have a total of 1,826 raster files, each 2-3 MB.
I had planned to stack all the raster files, then extract a value from every raster (1,826) for each point. This would produce a massive file I could use to look up the dates I need. This is, however, not possible because I can't stack that many rasters. I tried splitting the rasters into stacks of 500; this works, but the files it produces are about 1 GB each and very slow to work with (125,000 rows x 500 columns). Also, when I try to read all of these files back into R to create one big data frame, it fails.
I would like to know if there is a way to work with this amount of data in R, or if there is a package that could help. Could I use a package like ff? Does anyone have any suggestions for a less memory-intensive way to do what I want? I have thought about something like an lapply loop over the files, but have never used one before and am not really sure where to begin.
Any help would be really great, thanks in advance for your time. The code I am currently using without success is below.
Kind regards, Adam
library(raster)
library(rgdal)
library(maptools)
library(shapefiles)
# To create weather data files, first set the working directory to the appropriate location (i.e., maxt)
# list of raster weather files
files <- list.files(getwd(), pattern = 'asc')
length(files)
# raise the Windows memory ceiling (in MB); memory.size() only reports usage, it does not set anything
memory.limit(size = 4000)
# read in lon/lat data
X <- read.table(file.choose(), header = TRUE, sep = ',')
SP <- SpatialPoints(cbind(X$lon, X$lat))
# separate the rasters into stacks of manageable size
s1 <- stack(files[1:500])
i1 <- extract(s1, SP, cellnumbers = TRUE, layer = 1, nl = 500)
write.table(i1, file = "maxt_vals_all_points_all_dates_1.csv", sep = ",", row.names = FALSE, col.names = TRUE)
rm(s1, i1)
s2 <- stack(files[501:1000])
i2 <- extract(s2, SP, cellnumbers = TRUE, layer = 1, nl = 500)
write.table(i2, file = "maxt_vals_all_points_all_dates_2.csv", sep = ",", row.names = FALSE, col.names = TRUE)
rm(s2, i2)
s3 <- stack(files[1001:1500])
i3 <- extract(s3, SP, cellnumbers = TRUE, layer = 1, nl = 500)
write.table(i3, file = "maxt_vals_all_points_all_dates_3.csv", sep = ",", row.names = FALSE, col.names = TRUE)
rm(s3, i3)
s4 <- stack(files[1501:1826])
i4 <- extract(s4, SP, cellnumbers = TRUE, layer = 1, nl = 326)  # 1501:1826 is 326 layers, not 325
write.table(i4, file = "maxt_vals_all_points_all_dates_4.csv", sep = ",", row.names = FALSE, col.names = TRUE)
rm(s4, i4)
# read files back in to bind into the final file !!! NOT WORKING, FILES ARE TOO BIG !!!
i1 <- read.table(file.choose(), header = TRUE, sep = ',')
i2 <- read.table(file.choose(), header = TRUE, sep = ',')
i3 <- read.table(file.choose(), header = TRUE, sep = ',')
i4 <- read.table(file.choose(), header = TRUE, sep = ',')
vals <- data.frame(X, i1, i2, i3, i4)
write.table(vals, file = "maxt_master_lookup.csv", sep = ",", row.names = FALSE, col.names = TRUE)
I would do the extract one raster file at a time, and append the results to file as you go.
I cheat by making a list of matrices, but since raster() can take a filename or a matrix (amongst other things), and you can index a character vector with "[[", it should work pretty much the same in your case.
## stand-ins for raster files: raster() accepts a matrix as well as a filename
files <- list(volcano, volcano * 2, volcano * 3)
library(sp)
## four example points; raster(volcano) gets the default extent of [0, 1] x [0, 1]
SP <- SpatialPoints(structure(c(0.455921585146703, 0.237608166502031, 0.397704673508124, 0.678393354622703, 0.342820219769366, 0.554888036966903, 0.777351335399613, 0.654684656824567), .Dim = c(4L, 2L)))
library(raster)
for (i in seq_along(files)) {
  r <- raster(files[[i]])
  e <- extract(r, SP)
  ## print(e) ## print for debugging
  ## write the header on the first pass only, then append one row per point per raster
  write.table(data.frame(file = i, extract = e),
              "cellSummary.csv",
              col.names = i == 1, append = i > 1,
              sep = ",", row.names = FALSE)
}
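Applied to your data the loop keeps the same shape; a minimal sketch, assuming files comes from list.files() as in your question (the output name maxt_long.csv and the point id column are made up for illustration):
library(raster)
library(sp)
files <- list.files(getwd(), pattern = 'asc', full.names = TRUE)
X <- read.table(file.choose(), header = TRUE, sep = ',')
SP <- SpatialPoints(cbind(X$lon, X$lat))
for (i in seq_along(files)) {
  r <- raster(files[i])
  e <- extract(r, SP)
  ## long format: one row per point per day, and only one raster in memory at a time
  out <- data.frame(point = seq_along(e), file = basename(files[i]), maxt = e)
  write.table(out, "maxt_long.csv",
              col.names = i == 1, append = i > 1,
              sep = ",", row.names = FALSE)
}
If you also parse each raster's date out of its filename into a column, the 3-months-prior lookup becomes a simple filter: for each record, keep the rows for that point whose raster date falls between the record date minus ~90 days and the record date itself.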
I am using parallel processing and a form of cropping based on cell number. This function will take any spatial points or polygons and return the values from a large raster stack. It is a variant on a good code example for extracting over large polygons.
For my data this takes about 350 seconds using a plain extract(), or 32 seconds with this function on a 16-core Linux server. Hope it helps someone!
# Define Functions
extract_value_point_polygon = function(point_or_polygon, raster_stack, num_workers){
  # Returns a list containing the values from the locations of spatial points or polygons
  lapply(c('raster', 'foreach', 'doParallel'), require, character.only = TRUE)
  registerDoParallel(num_workers)
  ply_result = foreach(j = 1:length(point_or_polygon), .inorder = TRUE) %do% {
    print(paste('Working on feature: ', j, ' out of ', length(point_or_polygon)))
    get_class = class(point_or_polygon)[1]
    # find the cell numbers covered by this feature
    if(get_class == 'SpatialPolygons' | get_class == 'SpatialPolygonsDataFrame'){
      cell = as.numeric(na.omit(cellFromPolygon(raster_stack, point_or_polygon[j], weights = FALSE)[[1]]))}
    if(get_class == 'SpatialPointsDataFrame' | get_class == 'SpatialPoints'){
      cell = as.numeric(na.omit(cellFromXY(raster_stack, point_or_polygon[j, ])))}
    if(length(cell) == 0) return(NA)
    # build a minimal raster covering just those cells, then crop each layer to it in parallel
    r = rasterFromCells(raster_stack, cell, values = FALSE)
    result = foreach(i = 1:nlayers(raster_stack), .packages = 'raster', .inorder = TRUE) %dopar% {
      crop(raster_stack[[i]], r)
    }
    result = as.data.frame(getValues(stack(result)))
    return(result)
  }
  # registerDoParallel() creates an implicit cluster, so stop it that way
  # (endCluster() pairs with raster::beginCluster(), which is never called here)
  stopImplicitCluster()
  return(ply_result)
}
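For completeness, a hypothetical call, assuming you can build a stack of the daily files (raster::stack() with filenames keeps the values on disk rather than in memory); the object names and worker count are made up for illustration, and with ~125,000 points you would likely want to batch the points rather than loop over every feature individually:
library(raster)
library(sp)
# pointer-based stack of the .asc files; values are read on demand
maxt_stack <- stack(list.files(getwd(), pattern = 'asc', full.names = TRUE))
X <- read.table(file.choose(), header = TRUE, sep = ',')
SP <- SpatialPoints(cbind(X$lon, X$lat))
# one data frame of layer values per feature, cropped and read on 8 parallel workers
vals <- extract_value_point_polygon(SP, maxt_stack, num_workers = 8)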