I have a DataFrame like the below one, with identifiers as a column on top of an existing dateindex.
pd.DataFrame(index = [pd.to_datetime('2021-01-01'), pd.to_datetime('2021-01-01'),pd.to_datetime('2021-01-02'),pd.to_datetime('2021-01-02'), pd.to_datetime('2021-01-03'),pd.to_datetime('2021-01-03')], columns=['id','A', 'B'], data=[['foo',1,5],['bar',8,12],['foo',7,1], ['bar',5,1], ['foo',4,3],['bar',7,1]])
Out[6]:
id A B
2021-01-01 foo 1 5
2021-01-01 bar 8 12
2021-01-02 foo 7 1
2021-01-02 bar 5 1
2021-01-03 foo 4 3
2021-01-03 bar 7 1
My goal is to create a new sub-dataframes for each of the columns (A and B) except id, with dateIndex as single Index, and id (foo, bar) as column names. The expected output is shown below:
A
Out[9]:
foo bar
2021-01-01 1 8
2021-01-02 7 5
2021-01-03 4 7
B
Out[11]:
foo bar
2021-01-01 5 12
2021-01-02 1 1
2021-01-03 3 1
In the above example, the data frame 'df' is split into 2 parts 'df1' and 'df2' on the basis of values of column 'Weight'. Method 2: Using Dataframe. groupby(). This method is used to split the data into groups based on some criteria.
Select the cell or column that contains the text you want to split. Select Data > Text to Columns. In the Convert Text to Columns Wizard, select Delimited > Next. Select the Delimiters for your data.
split() function is used to break up single column values into multiple columns based on a specified separator or delimiter. The Series. str. split() function is similar to the Python string split() method, but split() method works on the all Dataframe columns, whereas the Series.
The second method to divide two columns is using the div() method. It divides the columns elementwise. It accepts a scalar value, series, or dataframe as an argument for dividing with the axis. If the axis is 0 the division is done row-wise and if the axis is 1 then division is done column-wise.
A, B = map(df.set_index('id', append=True).unstack().get, ['A', 'B'])
print(A)
id bar foo
2021-01-01 8 1
2021-01-02 5 7
2021-01-03 7 4
print(B)
id bar foo
2021-01-01 12 5
2021-01-02 1 1
2021-01-03 1 3
This just simply:
out = df.set_index('id',append=True).unstack('id')
# if you have columns other than `A`,`B`:
# out = df.set_index('id',append=True)[['A','B']].unstack('id')
then you can do
out['A']
which gives:
id bar foo
2021-01-01 8 1
2021-01-02 5 7
2021-01-03 7 4
and similarly for out['B']
. I found this is much easier and less error prone than hard-coding the variables to A,B
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With