I have a DataFrame and want to get divisions of pairs of columns like below:
df = pd.DataFrame({
'a1': np.random.randint(1, 1000, 1000),
'a2': np.random.randint(1, 1000, 1000),
'b1': np.random.randint(1, 1000, 1000),
'b2': np.random.randint(1, 1000, 1000),
'c1': np.random.randint(1, 1000, 1000),
'c2': np.random.randint(1, 1000, 1000),
})
df['a'] = df['a2'] / df['a1']
df['b'] = df['b2'] / df['b1']
df['c'] = df['c2'] / df['c1']
I want to combine the last three lines into one like:
df[['a', 'b', 'c']] = df[['a2', 'b2', 'c2']] / df[['a1', 'b1', 'c1']]
but I only get an error of ValueError: Columns must be same length as key
. If I just simply print(df[['a2', 'b2', 'c2']] / df[['a1', 'b1', 'c1']])
, I will only get a DataFrame with NaN
s of shape (1000, 6).
==== Edit
Now I know why my original one-line code doesn't work. Actually, the arithmetic operations of two DataFrames will be conducted between the columns with same labels, while those columns without same label in another DataFrame will generate NaNs. The result DataFrame will have the union() of the columns of the two operating DataFrames. That's why my original solution will give an ValueError
and the div will generate NaN
s.
Following example will be helpful to explain:
df1 = pd.DataFrame(data={'A':[1,2], 'B':[3,4], 'C':[5,6], 'D':[8,9]})
df2 = pd.DataFrame(data={'A':[11,12], 'B':[13,14], 'C':[15,16], 'D':[18,19]})
df1[['A', 'B']] / df2[['A', 'B']]
Out[130]:
A B
0 0.090909 0.230769
1 0.166667 0.285714
df1[['A', 'B']] / df2[['C', 'D']]
Out[131]:
A B C D
0 NaN NaN NaN NaN
1 NaN NaN NaN NaN
df1[['A', 'B']] + df2[['A', 'C']]
Out[132]:
A B C
0 12 NaN NaN
1 14 NaN NaN
Method 2: Pandas divide two columns using div() function It divides the columns elementwise. It accepts a scalar value, series, or dataframe as an argument for dividing with the axis. If the axis is 0 the division is done row-wise and if the axis is 1 then division is done column-wise.
DataFrame elements can be divided by a pandas series or by a Python sequence as well. Calling div() on a DataFrame instance is equivalent to invoking the division operator (/). The div() method provides the fill_value parameter which is used for replacing the np.
Example 1: The simple division (/) operator is the first way to divide two columns. You will split the First Column with the other columns here. This is the simplest method of dividing two columns in Pandas. We will import Pandas and take at least two columns while declaring the variables.
A Dataframe is a two-dimensional data structure, like data is aligned in a tabular fashion in rows and columns. DataFrame.sample () Method can be used to divide the Dataframe. Syntax: DataFrame.sample (n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)
First,import the pandas. Next, take a dictionary and convert into dataframe and store in df. Then, write the command df.Actor.str.split (expand=True). This means that the column ‘Actor‘ is split into 2 columns on the basis of space and then print. Here, the you can see in the output that ‘ Actor ‘ column has been splitted and printed separately.
In the above example, the data frame ‘df’ is split into 2 parts ‘df1’ and ‘df2’ on the basis of values of column ‘ Age ‘. In the above example, the data frame ‘df’ is split into 2 parts ‘df1’ and ‘df2’ on the basis of values of column ‘ Weight ‘.
Moreover, Dataframe is a mutable and heterogenous Pandas object that has three key elements: rows, columns, and data. Dataframe’s job is to present the raw dataset in a more clean and structured form to apply Pandas operations.
You can use:
df[['a', 'b', 'c']] = df[['a2', 'b2', 'c2']].values / df[['a1', 'b1', 'c1']].values
OUTPUT
a1 a2 b1 b2 c1 c2 a b c
0 864 214 551 761 174 111 0.247685 1.381125 0.637931
1 820 971 379 79 190 587 1.184146 0.208443 3.089474
2 305 154 519 378 567 186 0.504918 0.728324 0.328042
3 51 505 303 417 959 326 9.901961 1.376238 0.339937
4 84 531 625 899 248 905 6.321429 1.438400 3.649194
.. ... ... ... ... ... ... ... ... ...
995 302 695 790 777 896 975 2.301325 0.983544 1.088170
996 24 308 462 316 388 784 12.833333 0.683983 2.020619
997 135 286 359 752 282 283 2.118519 2.094708 1.003546
998 695 45 832 936 811 404 0.064748 1.125000 0.498150
999 809 454 971 335 366 884 0.561187 0.345005 2.415301
You could do the following as simple workaround:
df['a'], df['b'], df['c'] = (df['a2'] / df['a1'], df['b2'] / df['b1'], df['c2'] / df['c1'])
Although I think that using the assign
method would make your code much more readable:
df.assign(a=lambda x: x['a2'] / x['a1'],
b=lambda x: x['b2'] / x['b1'],
c=lambda x: x['c2'] / x['c1'])
MultiIndex comes in handy here, since Pandas is always aligned first on the index(columns is an index as well) before any computations.
Using sample data below:
df = pd.DataFrame({'a_1':range(2,10,2),
'a_2': range(4, 20, 4),
'b_1': range(3, 15,3),
'b_2': range(6,30,6),
'c_1': range(5, 25, 5),
'c_2': range(10, 50, 10)})
df
a_1 a_2 b_1 b_2 c_1 c_2
0 2 4 3 6 5 10
1 4 8 6 12 10 20
2 6 12 9 18 15 30
3 8 16 12 24 20 40
split the columns into a MultiIndex:
temp = df.copy()
temp.columns = temp.columns.str.split('_', expand = True).swaplevel()
temp
1 2 1 2 1 2
a a b b c c
0 2 4 3 6 5 10
1 4 8 6 12 10 20
2 6 12 9 18 15 30
3 8 16 12 24 20 40
In this form, you can simply select 2
divided by 1
:
df['a','b','c']] = temp['2'] / temp['1']
This gives the same values as :
df[['a_2', 'b_2', 'c_2']].values / df[['a_1', 'b_1', 'c_1']].values
Imagine however, that you have lots of columns, you do not need to worry about the pairing, as the MultiIndex form takes care of that, with Pandas doing the alignment before computation.
Numpy is going to be faster - @MuhammadHassan's answer fits nicely; this is just to show how MultiIndex has its place and its uses in the right circumstances.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With