How to pivot one column containing strings in a dataframe? [duplicate]

I am trying to reshape a pandas DataFrame by turning the values of one column into separate columns (by pivoting or unstacking).

I am new to this, so it's likely that I'm missing something obvious. I've searched extensively, but have not been able to successfully apply any of the solutions that I've come across.

df
    Location    Month       Metric       Value
0   Texas       January     Temperature  10
1   New York    January     Temperature  20
2   California  January     Temperature  30
3   Alaska      January     Temperature  40
4   Texas       January     Color        Red
5   New York    January     Color        Blue
6   California  January     Color        Green
7   Alaska      January     Color        Yellow
8   Texas       February    Temperature  15
9   New York    February    Temperature  25
10  California  February    Temperature  35
11  Alaska      February    Temperature  NaN
12  Texas       February    Color        NaN
13  New York    February    Color        Purple
14  California  February    Color        Orange
15  Alaska      February    Color        Brown

I am trying to "pivot" the Metric values into columns. End goal is a result like this:

Location    Month     Temperature   Color
Texas       January   10            Red
New York    January   20            Blue
California  January   30            Green
Alaska      January   40            Yellow
Texas       February  15    
New York    February  25            Purple
California  February  35            Orange
Alaska      February                Brown

I have tried the pivot, pivot_table, and unstack methods, but I'm sure I'm missing something. Most of the complications seem to arise because I am mixing strings with numbers and have some missing values in the data as well.

This is the closest I have been able to get so far, but I don't want extra rows for the month column, resulting in more blank values:

df.set_index(['Location','Month','Metric'], append=True, inplace=True)
df.unstack()

    Value
    Metric              Color   Temperature
    Location    Month       
0   Texas       January None    10
1   New York    January None    20
2   California  January None    30
3   Alaska      January None    40
4   Texas       January Red     None
5   New York    January Blue    None
6   California  January Green   None
7   Alaska      January Yellow  None
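
One likely reading of the output above: `append=True` keeps the original row number as an extra index level, so every row stays unique and `unstack()` cannot merge the Temperature and Color rows that share a Location and Month. A minimal sketch of that difference, assuming a four-row subset of the data above (the names `kept` and `wide` are only illustrative):

```python
import pandas as pd

# A four-row subset of the question's data (values copied from the table above).
df = pd.DataFrame({
    "Location": ["Texas", "New York", "Texas", "New York"],
    "Month": ["January"] * 4,
    "Metric": ["Temperature", "Temperature", "Color", "Color"],
    "Value": [10, 20, "Red", "Blue"],
})

# With append=True the old row number stays in the index, so each original row
# remains its own row after unstack() and the other metric is left empty.
kept = df.set_index(["Location", "Month", "Metric"], append=True).unstack()

# Without append=True the index is just (Location, Month, Metric), so
# unstacking Metric yields one row per (Location, Month).
wide = df.set_index(["Location", "Month", "Metric"])["Value"].unstack("Metric")

print(kept.shape)  # (4, 2)
print(wide.shape)  # (2, 2)
```

This only works when each (Location, Month, Metric) combination appears once; with repeated combinations `unstack()` raises an error and an aggregating method such as `pivot_table` would be needed instead.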

Any help here would be greatly appreciated. This seems like something that most likely has a simple solution available.

asked Feb 28 '18 11:02 by brendxn

1 Answer

Here is a pivot_table solution to what you need. The output is semantically equivalent to what you want -

Metric                Color Temperature
Location   Month                       
Alaska     February   Brown         NaN
           January   Yellow          40
California February  Orange          35
           January    Green          30
New York   February  Purple          25
           January     Blue          20
Texas      February     NaN          15
           January      Red          10

Code -

import numpy as np  # needed for aggfunc=np.sum
df_p = df.pivot_table(index=['Location', 'Month'], columns=['Metric'], values='Value', aggfunc=np.sum)
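
For anyone who wants to run this end to end, here is a self-contained sketch that rebuilds the question's data and flattens the result back into ordinary columns. Two things here are assumptions rather than part of the answer above: `aggfunc="first"` is swapped in for `np.sum` (each Location/Month/Metric combination has a single value, so taking the first non-missing one should be enough), and the `flat` name and the `reset_index()` step are only illustrative.

```python
import numpy as np
import pandas as pd

locations = ["Texas", "New York", "California", "Alaska"]

# Rebuild the 16-row frame from the question (values copied from its table).
df = pd.DataFrame({
    "Location": locations * 4,
    "Month": ["January"] * 8 + ["February"] * 8,
    "Metric": (["Temperature"] * 4 + ["Color"] * 4) * 2,
    "Value": [10, 20, 30, 40, "Red", "Blue", "Green", "Yellow",
              15, 25, 35, np.nan, np.nan, "Purple", "Orange", "Brown"],
})

# Assumption: aggfunc="first" replaces np.sum from the answer. It picks the
# first non-missing Value in each (Location, Month, Metric) group.
df_p = df.pivot_table(index=["Location", "Month"], columns="Metric",
                      values="Value", aggfunc="first")

# Turn the (Location, Month) index back into regular columns and drop the
# leftover "Metric" label on the column axis.
flat = df_p.reset_index()
flat.columns.name = None

print(flat)
```

The rows come out sorted by Location and Month rather than in the original order, and the missing Temperature and Color entries stay missing instead of being filled.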
answered Oct 30 '22 23:10 by Vivek Kalyanarangan