How to calculate total precipitation per day using hourly data for whole year?

Question

I have hourly data from ERA5 for each day in a specific year. I want to convert that data from hourly to daily. I know the long and hard way to do it, but I need something which does that easily.

Copernicus has a code for this here https://confluence.ecmwf.int/display/CKB/ERA5%3A+How+to+calculate+daily+total+precipitation, which works fine if the data set is only converted for one day, but when converting for the whole year, i am having problems with that.

Link to download ERA5 dataset which is available at https://cds.climate.copernicus.eu/cdsapp#!/home

Follow the steps to use copernicus server here

https://confluence.ecmwf.int/display/CKB/How+to+download+ERA5

This script downloads the houly data for only 2 days (1st and 2nd of January 2017):

#!/usr/bin/env python
"""
Save as get-tp.py, then run "python get-tp.py".
  
Input file : None
Output file: tp_20170101-20170102.nc
"""
import cdsapi
 
c = cdsapi.Client()
r = c.retrieve(
    'reanalysis-era5-single-levels', {
            'variable'    : 'total_precipitation',
            'product_type': 'reanalysis',
            'year'        : '2017',
            'month'       : '01',
            'day'         : ['01', '02'],
            'time'        : [
                '00:00','01:00','02:00',
                '03:00','04:00','05:00',
                '06:00','07:00','08:00',
                '09:00','10:00','11:00',
                '12:00','13:00','14:00',
                '15:00','16:00','17:00',
                '18:00','19:00','20:00',
                '21:00','22:00','23:00'
            ],
            'format'      : 'netcdf'
    })
r.download('tp_20170101-20170102.nc')
## Add multiple days and multiple months to donload more data

Below script will create a netCDF file for only one day

#!/usr/bin/env python
"""
Save as file calculate-daily-tp.py and run "python calculate-daily-tp.py".
  
Input file : tp_20170101-20170102.nc
Output file: daily-tp_20170101.nc
"""
import time, sys
from datetime import datetime, timedelta
 
from netCDF4 import Dataset, date2num, num2date
import numpy as np
 
day = 20170101
d = datetime.strptime(str(day), '%Y%m%d')
f_in = 'tp_%d-%s.nc' % (day, (d + timedelta(days = 1)).strftime('%Y%m%d'))
f_out = 'daily-tp_%d.nc' % day
 
time_needed = []
for i in range(1, 25):
    time_needed.append(d + timedelta(hours = i))
 
with Dataset(f_in) as ds_src:
    var_time = ds_src.variables['time']
    time_avail = num2date(var_time[:], var_time.units,
            calendar = var_time.calendar)
 
    indices = []
    for tm in time_needed:
        a = np.where(time_avail == tm)[0]
        if len(a) == 0:
            sys.stderr.write('Error: precipitation data is missing/incomplete - %s!
'
                    % tm.strftime('%Y%m%d %H:%M:%S'))
            sys.exit(200)
        else:
            print('Found %s' % tm.strftime('%Y%m%d %H:%M:%S'))
            indices.append(a[0])
 
    var_tp = ds_src.variables['tp']
    tp_values_set = False
    for idx in indices:
        if not tp_values_set:
            data = var_tp[idx, :, :]
            tp_values_set = True
        else:
            data += var_tp[idx, :, :]
         
    with Dataset(f_out, mode = 'w', format = 'NETCDF3_64BIT_OFFSET') as ds_dest:
        # Dimensions
        for name in ['latitude', 'longitude']:
            dim_src = ds_src.dimensions[name]
            ds_dest.createDimension(name, dim_src.size)
            var_src = ds_src.variables[name]
            var_dest = ds_dest.createVariable(name, var_src.datatype, (name,))
            var_dest[:] = var_src[:]
            var_dest.setncattr('units', var_src.units)
            var_dest.setncattr('long_name', var_src.long_name)
 
        ds_dest.createDimension('time', None)
        var = ds_dest.createVariable('time', np.int32, ('time',))
        time_units = 'hours since 1900-01-01 00:00:00'
        time_cal = 'gregorian'
        var[:] = date2num([d], units = time_units, calendar = time_cal)
        var.setncattr('units', time_units)
        var.setncattr('long_name', 'time')
        var.setncattr('calendar', time_cal)
 
        # Variables
        var = ds_dest.createVariable(var_tp.name, np.double, var_tp.dimensions)
        var[0, :, :] = data
        var.setncattr('units', var_tp.units)
        var.setncattr('long_name', var_tp.long_name)
 
        # Attributes
        ds_dest.setncattr('Conventions', 'CF-1.6')
        ds_dest.setncattr('history', '%s %s'
                % (datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
                ' '.join(time.tzname)))
 
        print('Done! Daily total precipitation saved in %s' % f_out)

What I want is a code which will follows the same step as above data but assuming that I have an input file with one year houly data and convert that to one year daily data.

The result should be daily values for the calculate variable (such as precipitation, etc) for the whole year.

Example: Let's say I have a precipitation data for the whole year as 1mm/hr every day, I would have 2928 values for the whole year.

What I want is 24mm/day for the whole year with only 365 values for a non-leap year.

Example input dataset: Subset of the data can be downloaded from here (for 1st and 2nd January 2017) https://www.dropbox.com/sh/0vdfn20p355st3i/AABKYO4do_raGHC34VnsXGPqa?dl=0. Just use the 2nd script after this to check the code. {the code for the whole year is >10 GB thus can't be uploaded

Thanks in advance

N1B4 · Accepted Answer

xarray resample is just the tool for you. It converts netCDF data from one temporal resolution (e.g. hourly) to another (e.g. daily) in one line. Using your sample data file, we can create daily-means using the following code:

import xarray as xr

ds = xr.open_dataset('./tp_20170101-20170102.nc')
tp = ds['tp'] # dimensions [time: 48, latitude: 721, longitude: 1440]
tp_daily = tp.resample(time='D').mean(dim='time') # dimensions (time: 2, latitude: 721, longitude: 1440)

You'll see that the resample command takes in a temporal code, in this case 'D' which means daily and then we specify that we want to compute the mean for each day using the hourly data of that day with .mean(dim='time').

If instead, for example, you wanted to compute the daily max rather than the daily mean, you'd replace .mean(dim='time') with .max(dim='time'). You can also go from hourly to monthly (MS or month-start), annual (AS or annual-start), and many more. The temporal frequency codes can be found in the Pandas docs.

How to calculate total precipitation per day using hourly data for whole year?

Tags:

python-3.x

dataset

analysis

cdo-climate

era5

Follow the steps to use copernicus server here

What I want is a code which will follows the same step as above data but assuming that I have an input file with one year houly data and convert that to one year daily data.

Ep1c1aN

1 Answers

N1B4

Recent Activity

Donate For Us

How to calculate total precipitation per day using hourly data for whole year?

Tags:

python-3.x

dataset

analysis

cdo-climate

era5

Follow the steps to use copernicus server here

What I want is a code which will follows the same step as above data but assuming that I have an input file with one year houly data and convert that to one year daily data.

Ep1c1aN

1 Answers

N1B4

Related questions

Recent Activity

Donate For Us