I'm trying to run some R code and it is crashing with a memory-related error. The error that I get is:
Error in sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE)) :
long vectors not supported yet: memory.c:3100
The function that causes the trouble is the following:
StationUserX <- function(userNDX) {
  # Coordinates (in radians) and ID of the user at index userNDX
  lat1 = deg2rad(geolocation$latitude[userNDX])
  long1 = deg2rad(geolocation$longitude[userNDX])
  session_user_id = as.character(geolocation$session_user_id[userNDX])
  # Distance from this user to every station
  Distance2Stations <- unlist(lapply(stationNDXs, Distance2StationX, lat1, long1))
  # One row per station: user ID, station ID and distance to that station
  stations_userX = data.frame(session_user_id = session_user_id,
                              station = ghcndstations$ID[stationNDXs],
                              Distance2Station = Distance2Stations)
  # Sort by distance and keep only the 100 closest stations
  stations_userX = stations_userX[with(stations_userX, order(Distance2Station)), ]
  stations_userX = stations_userX[1:100, ]
  row.names(stations_userX) <- NULL
  return(stations_userX)
}
I run this function 50k times using mclapply. Each call to StationUserX in turn calls Distance2StationX 90k times.
Is there an obvious way to optimize the function StationUserX?
mclapply is having trouble sending all the data from the worker processes back to the main process. That's because of prescheduling: the inputs are split into one batch per core, each worker runs its whole batch, and only then sends all of its results back in one go. That's nice and fast, but here it results in more than 2 GB of data being sent back at once, which it can't do.
Run mclapply with mc.preschedule=FALSE to turn off prescheduling. Now each iteration is forked as its own process and returns its own data. It won't go quite as fast, but it gets around the problem.
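As a rough illustration of that call, here is a minimal sketch. toy_worker is a hypothetical stand-in for StationUserX, and the input range and core count are assumptions chosen only to keep the example small; the real call would pass the user indices and whatever mc.cores value suits the machine.

```r
library(parallel)

# Hypothetical stand-in for StationUserX: returns a small data frame per index.
toy_worker <- function(i) {
  data.frame(id = i, value = i^2)
}

# Forking is not available on Windows, where mclapply requires mc.cores = 1.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# mc.preschedule = FALSE forks one child process per element of X (with at
# most mc.cores running at a time), so each result is sent back on its own
# instead of as part of one large per-core batch.
results <- mclapply(1:20, toy_worker,
                    mc.cores = n_cores,
                    mc.preschedule = FALSE)

# Combine the per-iteration data frames into a single one.
combined <- do.call(rbind, results)
```

The extra forking adds overhead per iteration, so this trades some speed for smaller transfers between processes.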