I am trying to reshape a pandas DataFrame by turning the values in one of its columns into columns of their own (by pivoting or unstacking).
I am new to this, so it is likely that I'm missing something obvious. I've searched extensively, but have not been able to successfully apply any of the solutions I've come across.
df
    Location    Month     Metric       Value
0   Texas       January   Temperature  10
1   New York    January   Temperature  20
2   California  January   Temperature  30
3   Alaska      January   Temperature  40
4   Texas       January   Color        Red
5   New York    January   Color        Blue
6   California  January   Color        Green
7   Alaska      January   Color        Yellow
8   Texas       February  Temperature  15
9   New York    February  Temperature  25
10  California  February  Temperature  35
11  Alaska      February  Temperature  NaN
12  Texas       February  Color        NaN
13  New York    February  Color        Purple
14  California  February  Color        Orange
15  Alaska      February  Color        Brown
I am trying to "pivot" the Metric values into columns. End goal is a result like this:
Location    Month     Temperature  Color
Texas       January   10           Red
New York    January   20           Blue
California  January   30           Green
Alaska      January   40           Yellow
Texas       February  15
New York    February  25           Purple
California  February  35           Orange
Alaska      February               Brown
I have tried the pivot, pivot_table, and unstack methods, but I'm sure I'm missing something. Many of the complications seem to arise because I am mixing strings with numbers, and because the data has some missing values as well.
This is the closest I have been able to get so far, but I don't want extra rows for the month column, resulting in more blank values:
df.set_index(['Location','Month','Metric'], append=True, inplace=True)
df.unstack()
                         Value
Metric                   Color   Temperature
   Location    Month
0  Texas       January   None    10
1  New York    January   None    20
2  California  January   None    30
3  Alaska      January   None    40
4  Texas       January   Red     None
5  New York    January   Blue    None
6  California  January   Green   None
7  Alaska      January   Yellow  None
Any help here would be greatly appreciated. This seems like something that most likely has a simple solution available.
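A note on the attempt above: append=True keeps the original 0..N-1 row number as an extra index level, so every row stays unique and nothing can be merged. A minimal sketch, assuming a cut-down copy of the data (the small df and the name wide are illustrative only), of the same set_index/unstack idea without append=True:

```python
import pandas as pd

# Cut-down, illustrative copy of the question's data.
df = pd.DataFrame({
    "Location": ["Texas", "New York", "Texas", "New York"],
    "Month": ["January"] * 4,
    "Metric": ["Temperature", "Temperature", "Color", "Color"],
    "Value": [10, 20, "Red", "Blue"],
})

# Without append=True the old row number is dropped from the index,
# so rows sharing (Location, Month) collapse into a single output row.
wide = df.set_index(["Location", "Month", "Metric"])["Value"].unstack("Metric")
print(wide)
```

This only works when each (Location, Month, Metric) combination appears at most once; duplicates would make unstack raise an error.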
DataFrame - pivot() function: The pivot() function is used to reshape a given DataFrame, organized by the given index / column values. This function does not support data aggregation; multiple values will result in a MultiIndex in the columns. Its index parameter names the column to use to make the new frame's index; if None, the existing index is used.
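As a rough illustration of that description, here is a small sketch of pivot() on made-up data shaped like the question's (the frame and variable names are hypothetical). It assumes pandas 1.1 or later, where index accepts a list of columns, and it also shows that pivot() refuses duplicate index/column pairs instead of aggregating them:

```python
import pandas as pd

# Illustrative data only, shaped like the frame in the question.
df = pd.DataFrame({
    "Location": ["Texas", "New York", "Texas", "New York"],
    "Month": ["January"] * 4,
    "Metric": ["Temperature", "Temperature", "Color", "Color"],
    "Value": [10, 20, "Red", "Blue"],
})

# One output row per (Location, Month), one output column per Metric.
wide = df.pivot(index=["Location", "Month"], columns="Metric", values="Value")
print(wide)

# pivot() does no aggregation: a repeated (Location, Month, Metric) row fails.
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)
duplicate_error = None
try:
    dup.pivot(index=["Location", "Month"], columns="Metric", values="Value")
except ValueError as err:
    duplicate_error = str(err)
print(duplicate_error)
```

When duplicates are possible, pivot_table with an explicit aggfunc is the usual alternative.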
Here is a pivot solution for what you need. The output is semantically equivalent to what you want -
Metric                Color   Temperature
Location    Month
Alaska      February  Brown   NaN
            January   Yellow  40
California  February  Orange  35
            January   Green   30
New York    February  Purple  25
            January   Blue    20
Texas       February  NaN     15
            January   Red     10
Code -
import numpy as np
df_p = df.pivot_table(index=['Location', 'Month'], columns=['Metric'], values='Value', aggfunc=np.sum)
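For anyone who wants to run this end to end, below is a self-contained sketch that rebuilds the question's data and flattens the result into the layout asked for. It is a variation, not the exact code above: it swaps in aggfunc="first" (an assumption on my part, chosen because summing a column that mixes strings and numbers may behave differently across pandas versions), and the names locations and wide are illustrative.

```python
import numpy as np
import pandas as pd

locations = ["Texas", "New York", "California", "Alaska"]
df = pd.DataFrame({
    "Location": locations * 4,
    "Month": ["January"] * 8 + ["February"] * 8,
    "Metric": (["Temperature"] * 4 + ["Color"] * 4) * 2,
    "Value": [10, 20, 30, 40, "Red", "Blue", "Green", "Yellow",
              15, 25, 35, np.nan, np.nan, "Purple", "Orange", "Brown"],
})

wide = (
    df.pivot_table(index=["Location", "Month"], columns="Metric",
                   values="Value", aggfunc="first")  # keep the single value per cell
      .reset_index()                 # turn Location/Month back into columns
      .rename_axis(columns=None)     # drop the leftover "Metric" axis label
)

# Put the columns in the order shown in the question.
wide = wide[["Location", "Month", "Temperature", "Color"]]
print(wide)
```

The rows come out sorted by Location and then Month alphabetically, so February precedes January; the missing Temperature and Color cells show up as NaN rather than blanks.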