Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Split/Expand Dataframe based on column values

Tags:

python

pandas

I have a DataFrame like the below one, with identifiers as a column on top of an existing dateindex.

pd.DataFrame(index = [pd.to_datetime('2021-01-01'), pd.to_datetime('2021-01-01'),pd.to_datetime('2021-01-02'),pd.to_datetime('2021-01-02'), pd.to_datetime('2021-01-03'),pd.to_datetime('2021-01-03')], columns=['id','A', 'B'], data=[['foo',1,5],['bar',8,12],['foo',7,1], ['bar',5,1], ['foo',4,3],['bar',7,1]])

Out[6]: 
             id  A   B
2021-01-01  foo  1   5
2021-01-01  bar  8  12
2021-01-02  foo  7   1
2021-01-02  bar  5   1
2021-01-03  foo  4   3
2021-01-03  bar  7   1

My goal is to create a new sub-dataframes for each of the columns (A and B) except id, with dateIndex as single Index, and id (foo, bar) as column names. The expected output is shown below:

A
Out[9]: 
            foo  bar
2021-01-01    1    8
2021-01-02    7    5
2021-01-03    4    7

B
Out[11]: 
            foo  bar
2021-01-01    5   12
2021-01-02    1    1
2021-01-03    3    1
like image 598
ylnor Avatar asked Mar 17 '21 15:03

ylnor


People also ask

How do you split a Dataframe based on column values in Python?

In the above example, the data frame 'df' is split into 2 parts 'df1' and 'df2' on the basis of values of column 'Weight'. Method 2: Using Dataframe. groupby(). This method is used to split the data into groups based on some criteria.

How do I split a text column into two separate columns?

Select the cell or column that contains the text you want to split. Select Data > Text to Columns. In the Convert Text to Columns Wizard, select Delimited > Next. Select the Delimiters for your data.

How do you split data in one column into multiple columns in Python?

split() function is used to break up single column values into multiple columns based on a specified separator or delimiter. The Series. str. split() function is similar to the Python string split() method, but split() method works on the all Dataframe columns, whereas the Series.

How do you split a column by another column in a Dataframe?

The second method to divide two columns is using the div() method. It divides the columns elementwise. It accepts a scalar value, series, or dataframe as an argument for dividing with the axis. If the axis is 0 the division is done row-wise and if the axis is 1 then division is done column-wise.


2 Answers

A, B = map(df.set_index('id', append=True).unstack().get, ['A', 'B'])

print(A)

id          bar  foo
2021-01-01    8    1
2021-01-02    5    7
2021-01-03    7    4

print(B)

id          bar  foo
2021-01-01   12    5
2021-01-02    1    1
2021-01-03    1    3
like image 119
piRSquared Avatar answered Oct 11 '22 22:10

piRSquared


This just simply:

out = df.set_index('id',append=True).unstack('id')
# if you have columns other than `A`,`B`:
# out = df.set_index('id',append=True)[['A','B']].unstack('id')

then you can do

out['A']

which gives:

id          bar  foo
2021-01-01    8    1
2021-01-02    5    7
2021-01-03    7    4

and similarly for out['B']. I found this is much easier and less error prone than hard-coding the variables to A,B.

like image 45
Quang Hoang Avatar answered Oct 11 '22 21:10

Quang Hoang