
Transforming yearwise data using pandas

Tags: python, pandas

I have a dataframe that looks like this:

            Temp
Date    
1981-01-01  20.7
1981-01-02  17.9
1981-01-03  18.8
1981-01-04  14.6
1981-01-05  15.8
...     ...
1981-12-27  15.5
1981-12-28  13.3
1981-12-29  15.6
1981-12-30  15.2
1981-12-31  17.4

365 rows × 1 columns

I want to transform it so that it looks like this:

        1981    1982    1983    1984    1985    1986    1987    1988    1989    1990
0       20.7    17.0    18.4    19.5    13.3    12.9    12.3    15.3    14.3    14.8
1       17.9    15.0    15.0    17.1    15.2    13.8    13.8    14.3    17.4    13.3
2       18.8    13.5    10.9    17.1    13.1    10.6    15.3    13.5    18.5    15.6
3       14.6    15.2    11.4    12.0    12.7    12.6    15.6    15.0    16.8    14.5
4       15.8    13.0    14.8    11.0    14.6    13.7    16.2    13.6    11.5    14.3
...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...
360     15.5    15.3    13.9    12.2    11.5    14.6    16.2    9.5     13.3    14.0
361     13.3    16.3    11.1    12.0    10.8    14.2    14.2    12.9    11.7    13.6
362     15.6    15.8    16.1    12.6    12.0    13.2    14.3    12.9    10.4    13.5
363     15.2    17.7    20.4    16.0    16.3    11.7    13.3    14.8    14.4    15.7
364     17.4    16.3    18.0    16.4    14.4    17.2    16.7    14.1    12.7    13.0

My attempt:

groups=df.groupby(df.index.year)
keys=groups.groups.keys()
years=pd.DataFrame()
for key in keys:
    years[key]=groups.get_group(key)['Temp'].values

Question: The code above gives me the desired output, but is there a more efficient way to do this transformation?

I can't post the whole dataset because the dataframe has 3650 rows, so you can download the CSV file (60.6 KB) for testing from here

asked Jun 11 '26 06:06 by Super Ultra Noob


1 Answer

Try grabbing the year and dayofyear from the index then pivoting:

import pandas as pd
import numpy as np

# Create Random Data
dr = pd.date_range(pd.to_datetime("1981-01-01"), pd.to_datetime("1982-12-31"))
df = pd.DataFrame(np.random.randint(1, 100, size=dr.shape),
                  index=dr,
                  columns=['Temp'])

# Get Year and Day of Year
df['year'] = df.index.year
df['day'] = df.index.dayofyear

# Pivot
p = df.pivot(index='day', columns='year', values='Temp')

print(p)

p:

year  1981  1982
day             
1       38    85
2       51    70
3       76    61
4       71    47
5       44    76
..     ...   ...
361     23    22
362     42    64
363     84    22
364     26    56
365     67    73
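
Note that the pivot above labels rows 1 to 365, while the desired output in the question uses 0 to 364. A minimal sketch of one way to get 0-based labels without adding helper columns to df is shown below. It is not part of the original answer: the names s and wide are made up for illustration, and the random data is only a stand-in for the question's CSV.

```python
import numpy as np
import pandas as pd

# Random stand-in data for two non-leap years (assumption: same layout as the question)
dr = pd.date_range("1981-01-01", "1982-12-31")
df = pd.DataFrame(np.random.randint(1, 100, size=len(dr)), index=dr, columns=["Temp"])

# Re-index the Temp series by (0-based day of year, year) ...
s = df["Temp"].copy()
s.index = pd.MultiIndex.from_arrays(
    [df.index.dayofyear - 1, df.index.year], names=["day", "year"]
)

# ... then move the year level into the columns
wide = s.unstack("year")

print(wide.head())
```

This leaves df itself untouched, which may matter if the frame is reused afterwards.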

Run-Time via Timeit

import timeit

setup = '''
import pandas as pd
import numpy as np

# Create Random Data
dr = pd.date_range(pd.to_datetime("1981-01-01"), pd.to_datetime("1983-12-31"))
df = pd.DataFrame(np.random.randint(1, 100, size=dr.shape),
                  index=dr,
                  columns=['Temp'])'''

pivot = '''
df['year'] = df.index.year
df['day'] = df.index.dayofyear
p = df.pivot(index='day', columns='year', values='Temp')'''

groupby_for = '''
groups=df.groupby(df.index.year)
keys=groups.groups.keys()
years=pd.DataFrame()
for key in keys:
    years[key]=groups.get_group(key)['Temp'].values'''

if __name__ == '__main__':
    print("Pivot")
    print(timeit.timeit(setup=setup, stmt=pivot, number=1000))
    print("Groupby For")
    print(timeit.timeit(setup=setup, stmt=groupby_for, number=1000))

Output (total seconds for 1000 runs each):

Pivot
1.598973
Groupby For
2.3967995999999996

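*Additional note: the groupby/for option will not work when the data includes a leap year, because 1984 has 366 days instead of 365 and the longer array cannot be assigned as a column of the 365-row frame. Pivot works regardless, leaving NaN on day 366 for non-leap years.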
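
The leap-year difference can be checked with a small sketch like the one below. It assumes synthetic data covering 1983 (365 days) and 1984 (366 days) rather than the question's CSV, and the variable loop_error is a name made up for this illustration.

```python
import numpy as np
import pandas as pd

# 1983 is a common year (365 days); 1984 is a leap year (366 days)
dr = pd.date_range("1983-01-01", "1984-12-31")
df = pd.DataFrame({"Temp": np.arange(len(dr), dtype=float)}, index=dr)

# Loop approach from the question: the second column has a different length
years = pd.DataFrame()
loop_error = None
try:
    for key, grp in df.groupby(df.index.year):
        years[key] = grp["Temp"].values
except ValueError as exc:
    loop_error = exc
print("loop failed with:", loop_error)

# Pivot approach: rows are aligned on day of year, so lengths may differ
p = (
    df.assign(year=df.index.year, day=df.index.dayofyear)
      .pivot(index="day", columns="year", values="Temp")
)
print(p.tail(2))
```

One thing to keep in mind with this layout: in a leap year every date after February 28 has a day-of-year one higher than in a common year (March 1 is day 60 in 1983 but day 61 in 1984), so a given row does not always hold the same calendar date across columns.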

answered Jun 16 '26 13:06 by Henry Ecker


