How to extract the data in a netcdf file based on a shapefile in R?

Tags:

I have a netcdf file (global domain) which I do not know its projection information, and I want to extract the data (with lon and lat) based on the shapefile.

The objective is to find data for the domain represented by the shapefile ( the original netcdf contains global data). Besides, the final data format should be a data frame, which contains lon, lat, and data. I assume that the extract and mask function will be useful.

The netcdf and shapefile can be found at https://www.dropbox.com/s/ju96eitzzd0xxg8/precip.2000.nc?dl=0 and https://www.dropbox.com/s/8wfgf8207dbh79r/gpr_000b11a_e.zip?dl=0. Thanks for any help.

library(rgdal)
library(ncdf4)
library(raster)

shp = readOGR("gpr_000b11a_e.shp")
pre1 = nc_open("precip.2000.nc")
pre1

File precip.2000.nc (NC_FORMAT_NETCDF4_CLASSIC):
 1 variables (excluding dimension variables):
    float precip[lon,lat,time]   
        missing_value: -9.96920996838687e+36
        var_desc: Precipitation
        level_desc: Surface
        statistic: Total
        parent_stat: Other
        long_name: Daily total of precipitation
        cell_methods: time: sum
        avg_period: 0000-00-01 00:00:00
        actual_range: 0
         actual_range: 482.555358886719
        units: mm
        valid_range: 0
         valid_range: 1000
        dataset: CPC Global Precipitation

 3 dimensions:
    lat  Size:360
        actual_range: 89.75
         actual_range: -89.75
        long_name: Latitude
        units: degrees_north
        axis: Y
        standard_name: latitude
        coordinate_defines: center
    lon  Size:720
        long_name: Longitude
        units: degrees_east
        axis: X
        standard_name: longitude
        actual_range: 0.25
         actual_range: 359.75
        coordinate_defines: center
    time  Size:366   *** is unlimited ***
        long_name: Time
        axis: T
        standard_name: time
        coordinate_defines: start
        actual_range: 876576
         actual_range: 885336
        delta_t: 0000-00-01 00:00:00
        avg_period: 0000-00-01 00:00:00
        units: hours since 1900-01-01 00:00:00

7 global attributes:
    Conventions: CF-1.0
    version: V1.0
    history: created 9/2016 by CAS NOAA/ESRL PSD
    title: CPC GLOBAL PRCP V1.0
    References: https://www.esrl.noaa.gov/psd/data/gridded/data.cpc.globalprecip.html
    dataset_title: CPC GLOBAL PRCP V1.0
    Source: ftp://ftp.cpc.ncep.noaa.gov/precip/CPC_UNI_PRCP/

As we can see, there is no information about the projection or crs for the netcdf file.

809

asked May 06 '18 21:05

Yang Yang

1 Answers

You need to open the NetCDF file as a raster* object to process as a raster it in R. Use brick or stack for this, rather than nc_open

pre1.brick = brick("precip.2000.nc")

You will notice that this file uses the normal convention in climate science of giving longitudes in values ranging from 0 to 360 degrees:

extent(pre1.brick)
# class       : Extent 
# xmin        : 0 
# xmax        : 360 
# ymin        : -90 
# ymax        : 90

We can convert to conventional -180 to 180 longitudes usint rotate

pre1.brick = rotate(pre1.brick)

Now lets see what the projections of our raster and polygon files are:

crs(shp)
# +proj=longlat +datum=NAD83 +no_defs +ellps=GRS80 +towgs84=0,0,0 
crs(pre1.brick)
# +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0

You can see that both use lat-long coordinates, but differ in terms of datum and ellipse. It is usually recommended for accuracy and speed, when possible, to project the vector data rather than the raster to get them both in the same coordinate system:

shp = spTransform(shp, crs(pre1.brick))

Having projected both to the same coord system, we can apply the shapefile as a mask to the raster brick:

pre1.mask = mask(pre1.brick, shp)

And convert to a data.frame. Here I just demonstrate for the first layer. You can do all layers at once if you prefer, by removing [[1]] in the following line

pre1.df = as.data.frame(pre1.mask[[1]], xy=TRUE)

If you want, you can remove lines that have NA, to only leave cells inside the mask containing data:

pre1.df = pre1.df[complete.cases(pre1.df),]
head(pre1.df)
#            x     y X2000.01.01.00.00.00
# 10278 -81.25 82.75            0.2647110
# 10279 -80.75 82.75            0.2721323
# 10280 -80.25 82.75            0.2797630
# 10281 -79.75 82.75            0.2875668
# 10282 -79.25 82.75            0.2925712
# 10283 -78.75 82.75            0.2995636

105

answered Sep 28 '22 16:09

dww

Related questions
                            
                                is R creating too many threads on startup
                            
                                ggpubr: change font size of stat_compare_means Kruskal-Wallis p-values
                            
                                Skipping last column in r with read.csv
                            
                                generating random x and y coordinates with a minimum distance
                            
                                Text repel with a position argument in ggplot/R
                            
                                How to work with formula objects in R
                            
                                Plot feature importance computed by Ranger function
                            
                                How to make ggplot automatically extract the number of groups for the palette
                            
                                How to set X axis range on plotly graphs?
                            
                                How to replace words between two punctuations
                            
                                Coord_flip() changes ordering of bars within groups in grouped bar plot
                            
                                xts split by week function specify first day of week as Sunday instead of default of Monday
                            
                                Heat map when the color is scaled rowwise
                            
                                Producing a moving time sequence in raster
                            
                                Ordinary differential equations (ODEs) - Is there any way to prevent negative values?
                            
                                Capitalizing text in R with an exception list
                            
                                Access R Dataframe Values Rather than Tibble
                            
                                multiplicative distance between graph nodes
                            
                                Spacing vector by regular pattern
                            
                                Keeping raster variable names when converting to NetCDF using R

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to extract the data in a netcdf file based on a shapefile in R?

Tags:

r

extract

mask

spatial

Yang Yang

People also ask

1 Answers

dww

Recent Activity

Donate For Us