
Pandas - unstack column values into new columns

I have a large dataframe and I am storing a lot of redundant values that are making it hard to handle my data. I have a dataframe of the form:

import pandas as pd

df = pd.DataFrame([["a","g","n1","y1"], ["a","g","n2","y2"], ["b","h","n1","y3"], ["b","h","n2","y4"]], columns=["meta1", "meta2", "name", "data"])

>>> df

  meta1 meta2 name data
    a     g   n1   y1
    a     g   n2   y2
    b     h   n1   y3
    b     h   n2   y4

where the name column holds the names of the new columns I would like, and the data column holds the respective values.

I would like to produce a dataframe of the form:

df = pd.DataFrame([["a","g","y1","y2"], ["b","h","y3","y4"]], columns=["meta1", "meta2", "n1", "n2"])

>>> df

  meta1 meta2  n1  n2
    a     g    y1  y2
    b     h    y3  y4

The columns called meta stand in for 15+ other columns that contain most of the data, and I don't think they are particularly well suited to indexing. At the moment I have a lot of repeated/redundant data stored in the meta columns, and I would like to produce the more compact dataframe shown above.

I have found some similar Qs but can't pinpoint what sort of operations I need to do: pivot, re-index, stack or unstack, etc.?

PS - the original index values are unimportant for my purposes.

Any help would be much appreciated.

I think the following Q is related to what I am trying to do, but I can't see how to apply it, as I don't want to produce more indexes:

  • Python Pandas- how to unstack a pivot table with two values with each value becoming a new column?
oliversm asked Jun 15 '16 15:06




1 Answer

If you group your meta columns into a list then you can do this:

metas = ['meta1', 'meta2']

new_df = df.set_index(['name'] + metas).unstack('name')
print(new_df)

            data    
name          n1  n2
meta1 meta2         
a     g       y1  y2
b     h       y3  y4

Which gets you most of the way there. Additional tailoring can get you the rest of the way.

print(new_df.data.rename_axis([None], axis=1).reset_index())

  meta1 meta2  n1  n2
0     a     g  y1  y2
1     b     h  y3  y4
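
For reference, here is a minimal end-to-end sketch of the same set_index/unstack approach in Python 3, chained into one expression. The variable name wide is just for illustration, and building metas by excluding name and data is an assumption that every other column is a meta column (which may help with the 15+ meta columns mentioned in the question, but check it against your real data):

```python
import pandas as pd

df = pd.DataFrame(
    [["a", "g", "n1", "y1"], ["a", "g", "n2", "y2"],
     ["b", "h", "n1", "y3"], ["b", "h", "n2", "y4"]],
    columns=["meta1", "meta2", "name", "data"],
)

# Assumption: every column other than 'name' and 'data' is a meta column.
metas = [c for c in df.columns if c not in ("name", "data")]

wide = (
    df.set_index(metas + ["name"])["data"]  # Series indexed by the metas plus 'name'
      .unstack("name")                       # values of 'name' become the columns
      .rename_axis(None, axis=1)             # drop the leftover 'name' label on the columns
      .reset_index()                         # turn the meta index levels back into columns
)
print(wide)
```

Selecting the data column before unstacking avoids the extra data level in the column header, so there is nothing to strip afterwards. Note that unstack raises a ValueError if any combination of meta values and name occurs more than once, so duplicates would need to be dropped or aggregated first.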
piRSquared answered Oct 17 '22 23:10