How to convert a dataframe from long to wide, with values grouped by year in the index?

The code below worked with the previous CSV that I used. Both CSVs have the same number of columns, and the columns have the same names.

Data for the CSV that worked: here

Data for the CSV that didn't work: here

What does this error mean? Why am I getting this error?

from pandas import read_csv
from pandas import DataFrame
from pandas import Grouper
from matplotlib import pyplot

series = read_csv('carringtonairtemp.csv', header=0, index_col=0, parse_dates=True, squeeze=True)

groups = series.groupby(Grouper(freq='A'))
years = DataFrame()

for name, group in groups:
    years[name.year] = group.values

years = years.T

pyplot.matshow(years, interpolation=None, aspect='auto')
pyplot.show()

Error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-7173fcbe8c08> in <module>
      6 #     display(group.head())
      7 #     print(group.values[:10])
----> 8     years[name.year] = group.values

e:\Anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
   3038         else:
   3039             # set column
-> 3040             self._set_item(key, value)
   3041 
   3042     def _setitem_slice(self, key: slice, value):

e:\Anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
   3114         """
   3115         self._ensure_valid_index(value)
-> 3116         value = self._sanitize_column(key, value)
   3117         NDFrame._set_item(self, key, value)
   3118 

e:\Anaconda3\lib\site-packages\pandas\core\frame.py in _sanitize_column(self, key, value, broadcast)
   3759 
   3760             # turn me into an ndarray
-> 3761             value = sanitize_index(value, self.index)
   3762             if not isinstance(value, (np.ndarray, Index)):
   3763                 if isinstance(value, list) and len(value) > 0:

e:\Anaconda3\lib\site-packages\pandas\core\internals\construction.py in sanitize_index(data, index)
    745     """
    746     if len(data) != len(index):
--> 747         raise ValueError(
    748             "Length of values "
    749             f"({len(data)}) "

ValueError: Length of values (365) does not match length of index (252)

asked Sep 20 '20 05:09

Xavier Conzet

1 Answer

Building the dataframe column by column, as shown, requires every new column to match the length of the index of the existing dataframe, years.
In the smaller dataset, every year has 365 days with no missing days, so each new column fits.
The larger dataset has a mix of 365- and 366-day years, and data is missing from 1990 and 2020. The first year assigned sets the index length (252 in the traceback), so a later year with 365 values raises ValueError: Length of values (365) does not match length of index (252).
The following script is more succinct and produces the desired dataframe shape and plot.
- This implementation has no problem with years of unequal length.

import pandas as pd
import matplotlib.pyplot as plt

# links to data
url1 = 'https://raw.githubusercontent.com/trenton3983/stack_overflow/master/data/so_data/2020-09-19%20%2063975678/daily-min-temperatures.csv'
url2 = 'https://raw.githubusercontent.com/trenton3983/stack_overflow/master/data/so_data/2020-09-19%20%2063975678/carringtonairtemp.csv'

# load the data into a DataFrame, not a Series
# parse the dates, and set them as the index
df1 = pd.read_csv(url1, parse_dates=['Date'], index_col=['Date'])
df2 = pd.read_csv(url2, parse_dates=['Date'], index_col=['Date'])

# groupby year and aggregate Temp into a list
dfg1 = df1.groupby(df1.index.year).agg({'Temp': list})
dfg2 = df2.groupby(df2.index.year).agg({'Temp': list})

# create a wide format dataframe with all the temp data expanded
df1_wide = pd.DataFrame(dfg1.Temp.tolist(), index=dfg1.index)
df2_wide = pd.DataFrame(dfg2.Temp.tolist(), index=dfg2.index)

# plot
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(10, 10))

ax1.matshow(df1_wide, interpolation=None, aspect='auto')
ax2.matshow(df2_wide, interpolation=None, aspect='auto')

# display the figure when running as a script
plt.show()
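
The length mismatch can be reproduced without either CSV. The sketch below is an illustration on made-up data, not the asker's files: the series `temps`, its date range (chosen so that 1990 has 252 rows and 1991 has 365), and its values are all assumptions. It first repeats the column-by-column assignment to trigger the error, then builds the wide frame from one list per year, in which case pandas pads the shorter rows with NaN.

```python
import numpy as np
import pandas as pd

# Made-up daily data (assumption): 1990 starts late and has 252 rows,
# 1991 is a full 365-day year
idx = pd.date_range("1990-04-24", "1991-12-31", freq="D")
temps = pd.Series(np.arange(len(idx), dtype=float), index=idx, name="Temp")

# Column-by-column assignment: the first column fixes the index length,
# so a later column of a different length is rejected
years = pd.DataFrame()
error_message = None
try:
    for year, group in temps.groupby(temps.index.year):
        years[year] = group.values
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# One list per year, then one row per list: shorter rows are padded with NaN
rows = temps.groupby(temps.index.year).apply(list)
wide = pd.DataFrame(rows.tolist(), index=rows.index)
print(wide.shape)
```

Because the padding is added on the right, the 252 values for 1990 occupy the first 252 columns and the rest of that row is NaN.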

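
One thing to keep in mind with the list-based approach is that each year's values are packed from the left, so a year that starts late or has gaps no longer lines up with the calendar. If calendar alignment matters, a possible alternative, which is not what the script above does, is to pivot on day of year. This is a minimal sketch on made-up data; the date range and the helper column names `year` and `dayofyear` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Made-up daily data (assumption): 1990 starts on April 24, 1991 is complete
idx = pd.date_range("1990-04-24", "1991-12-31", freq="D")
df = pd.DataFrame({"Temp": np.arange(len(idx), dtype=float)}, index=idx)

# Add hypothetical helper columns taken from the DatetimeIndex
long = df.assign(year=df.index.year, dayofyear=df.index.dayofyear)

# One row per year, one column per day of year; missing days become NaN
wide = long.pivot(index="year", columns="dayofyear", values="Temp")
print(wide.shape)
```

Here the first value for 1990 lands in the column for day 114 (April 24) instead of column 0. Note that in leap years every date after February 29 gets a day-of-year one higher than in other years, so the columns are aligned by ordinal day rather than by calendar date.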
answered Nov 06 '22 03:11

Trenton McKinney

Tags: python, arrays, pandas, dataframe, matplotlib
