Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to convert a dataframe from long to wide, with values grouped by year in the index?

The code below worked with the previous csv that I used, both csv's have the same amount of columns, and the columns have the same name.

Data for the csv that worked here

Data for csv that didnt here

What does this error mean? Why am I getting this error?

from pandas import read_csv
from pandas import DataFrame
from pandas import Grouper
from matplotlib import pyplot

series = read_csv('carringtonairtemp.csv', header=0, index_col=0, parse_dates=True, squeeze=True)

groups = series.groupby(Grouper(freq='A'))
years = DataFrame()

for name, group in groups:
    years[name.year] = group.values

years = years.T

pyplot.matshow(years, interpolation=None, aspect='auto')
pyplot.show()

Error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-7173fcbe8c08> in <module>
      6 #     display(group.head())
      7 #     print(group.values[:10])
----> 8     years[name.year] = group.values

e:\Anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
   3038         else:
   3039             # set column
-> 3040             self._set_item(key, value)
   3041 
   3042     def _setitem_slice(self, key: slice, value):

e:\Anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
   3114         """
   3115         self._ensure_valid_index(value)
-> 3116         value = self._sanitize_column(key, value)
   3117         NDFrame._set_item(self, key, value)
   3118 

e:\Anaconda3\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value, broadcast)
   3759 
   3760             # turn me into an ndarray
-> 3761             value = sanitize_index(value, self.index)
   3762             if not isinstance(value, (np.ndarray, Index)):
   3763                 if isinstance(value, list) and len(value) > 0:

e:\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in sanitize_index(data, index)
    745     """
    746     if len(data) != len(index):
--> 747         raise ValueError(
    748             "Length of values "
    749             f"({len(data)}) "

ValueError: Length of values (365) does not match length of index (252)
like image 716
Xavier Conzet Avatar asked Sep 20 '20 05:09

Xavier Conzet


People also ask

How do you make a long data frame wider?

To reshape a dataframe from wide to long, we can use Pandas' pd. melt() method. pd. melt(df, id_vars=, value_vars=, var_name=, value_name=, ignore_index=)

How do you convert a column of a DataFrame into its index?

To set a column as index for a DataFrame, use DataFrame. set_index() function, with the column name passed as argument. You can also setup MultiIndex with multiple columns in the index. In this case, pass the array of column names required for index, to set_index() method.

How do you convert a data frame to a series?

To convert the last or specific column of the Pandas dataframe to series, use the integer-location-based index in the df. iloc[:,0] . For example, we want to convert the third or last column of the given data from Pandas dataframe to series. In this case, the following code example will help us.


1 Answers

  • The issue with iteratively creating the dataframe in the manner shown, is it requires the new column to match the length of the existing dataframe, year, index.
  • In the smaller dataset, all the years are 365 days without missing days.
  • The larger dataset has mixed length years of 365 and 366 days and there is missing data from 1990 and 2020, which is causing ValueError: Length of values (365) does not match length of index (252).
  • Following is a more succinct script, which achieves the desired dataframe shape, and plot.
    • This implementation doesn't have issues with the unequal data lengths.
import pandas as pd
import matplotlib.pyplot as plt

# links to data
url1 = 'https://raw.githubusercontent.com/trenton3983/stack_overflow/master/data/so_data/2020-09-19%20%2063975678/daily-min-temperatures.csv'
url2 = 'https://raw.githubusercontent.com/trenton3983/stack_overflow/master/data/so_data/2020-09-19%20%2063975678/carringtonairtemp.csv'

# load the data into a DataFrame, not a Series
# parse the dates, and set them as the index
df1 = pd.read_csv(url1, parse_dates=['Date'], index_col=['Date'])
df2 = pd.read_csv(url2, parse_dates=['Date'], index_col=['Date'])

# groupby year and aggregate Temp into a list
dfg1 = df1.groupby(df1.index.year).agg({'Temp': list})
dfg2 = df2.groupby(df2.index.year).agg({'Temp': list})

# create a wide format dataframe with all the temp data expanded
df1_wide = pd.DataFrame(dfg1.Temp.tolist(), index=dfg1.index)
df2_wide = pd.DataFrame(dfg2.Temp.tolist(), index=dfg2.index)

# plot
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(10, 10))

ax1.matshow(df1_wide, interpolation=None, aspect='auto')
ax2.matshow(df2_wide, interpolation=None, aspect='auto')

enter image description here

like image 168
Trenton McKinney Avatar answered Nov 06 '22 03:11

Trenton McKinney