Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

DataFrame pairs of columns division

I have a DataFrame and want to get divisions of pairs of columns like below:

df = pd.DataFrame({
    'a1': np.random.randint(1, 1000, 1000),
    'a2': np.random.randint(1, 1000, 1000),
    'b1': np.random.randint(1, 1000, 1000),
    'b2': np.random.randint(1, 1000, 1000),
    'c1': np.random.randint(1, 1000, 1000),
    'c2': np.random.randint(1, 1000, 1000),
})
df['a'] = df['a2'] / df['a1']
df['b'] = df['b2'] / df['b1']
df['c'] = df['c2'] / df['c1']

I want to combine the last three lines into one like:

df[['a', 'b', 'c']] = df[['a2', 'b2', 'c2']] / df[['a1', 'b1', 'c1']]

but I only get an error of ValueError: Columns must be same length as key. If I just simply print(df[['a2', 'b2', 'c2']] / df[['a1', 'b1', 'c1']]), I will only get a DataFrame with NaNs of shape (1000, 6).

==== Edit

Now I know why my original one-line code doesn't work. Actually, the arithmetic operations of two DataFrames will be conducted between the columns with same labels, while those columns without same label in another DataFrame will generate NaNs. The result DataFrame will have the union() of the columns of the two operating DataFrames. That's why my original solution will give an ValueError and the div will generate NaNs.

Following example will be helpful to explain:

df1 = pd.DataFrame(data={'A':[1,2], 'B':[3,4], 'C':[5,6], 'D':[8,9]})
df2 = pd.DataFrame(data={'A':[11,12], 'B':[13,14], 'C':[15,16], 'D':[18,19]})
df1[['A', 'B']] / df2[['A', 'B']]
Out[130]: 
          A         B
0  0.090909  0.230769
1  0.166667  0.285714

df1[['A', 'B']] / df2[['C', 'D']]
Out[131]: 
    A   B   C   D
0 NaN NaN NaN NaN
1 NaN NaN NaN NaN

df1[['A', 'B']] + df2[['A', 'C']]
Out[132]: 
    A   B   C
0  12 NaN NaN
1  14 NaN NaN
like image 374
Cuteufo Avatar asked Dec 20 '21 10:12

Cuteufo


People also ask

How do I divide multiple columns in pandas?

Method 2: Pandas divide two columns using div() function It divides the columns elementwise. It accepts a scalar value, series, or dataframe as an argument for dividing with the axis. If the axis is 0 the division is done row-wise and if the axis is 1 then division is done column-wise.

How do you divide data frames?

DataFrame elements can be divided by a pandas series or by a Python sequence as well. Calling div() on a DataFrame instance is equivalent to invoking the division operator (/). The div() method provides the fill_value parameter which is used for replacing the np.

How do you create a new column by dividing two columns in pandas?

Example 1: The simple division (/) operator is the first way to divide two columns. You will split the First Column with the other columns here. This is the simplest method of dividing two columns in Pandas. We will import Pandas and take at least two columns while declaring the variables.

How to divide a Dataframe in Python?

A Dataframe is a two-dimensional data structure, like data is aligned in a tabular fashion in rows and columns. DataFrame.sample () Method can be used to divide the Dataframe. Syntax: DataFrame.sample (n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)

How to split actor column in pandas Dataframe?

First,import the pandas. Next, take a dictionary and convert into dataframe and store in df. Then, write the command df.Actor.str.split (expand=True). This means that the column ‘Actor‘ is split into 2 columns on the basis of space and then print. Here, the you can see in the output that ‘ Actor ‘ column has been splitted and printed separately.

What are the two parts of the data frame ‘DF’?

In the above example, the data frame ‘df’ is split into 2 parts ‘df1’ and ‘df2’ on the basis of values of column ‘ Age ‘. In the above example, the data frame ‘df’ is split into 2 parts ‘df1’ and ‘df2’ on the basis of values of column ‘ Weight ‘.

What is Dataframe in pandas?

Moreover, Dataframe is a mutable and heterogenous Pandas object that has three key elements: rows, columns, and data. Dataframe’s job is to present the raw dataset in a more clean and structured form to apply Pandas operations.


Video Answer


3 Answers

You can use:

df[['a', 'b', 'c']] = df[['a2', 'b2', 'c2']].values / df[['a1', 'b1', 'c1']].values

OUTPUT

      a1   a2   b1   b2   c1   c2          a         b         c
0    864  214  551  761  174  111   0.247685  1.381125  0.637931
1    820  971  379   79  190  587   1.184146  0.208443  3.089474
2    305  154  519  378  567  186   0.504918  0.728324  0.328042
3     51  505  303  417  959  326   9.901961  1.376238  0.339937
4     84  531  625  899  248  905   6.321429  1.438400  3.649194
..   ...  ...  ...  ...  ...  ...        ...       ...       ...
995  302  695  790  777  896  975   2.301325  0.983544  1.088170
996   24  308  462  316  388  784  12.833333  0.683983  2.020619
997  135  286  359  752  282  283   2.118519  2.094708  1.003546
998  695   45  832  936  811  404   0.064748  1.125000  0.498150
999  809  454  971  335  366  884   0.561187  0.345005  2.415301
like image 142
Muhammad Hassan Avatar answered Oct 16 '22 10:10

Muhammad Hassan


You could do the following as simple workaround:

df['a'], df['b'], df['c'] = (df['a2'] / df['a1'], df['b2'] / df['b1'], df['c2'] / df['c1'])

Although I think that using the assign method would make your code much more readable:

df.assign(a=lambda x: x['a2'] / x['a1'], 
          b=lambda x: x['b2'] / x['b1'], 
          c=lambda x: x['c2'] / x['c1'])
like image 1
jacob Avatar answered Oct 16 '22 10:10

jacob


MultiIndex comes in handy here, since Pandas is always aligned first on the index(columns is an index as well) before any computations.

Using sample data below:

df = pd.DataFrame({'a_1':range(2,10,2),
            'a_2': range(4, 20, 4),
           'b_1': range(3, 15,3),
           'b_2': range(6,30,6),
           'c_1': range(5, 25, 5),
           'c_2': range(10, 50, 10)})

df
   a_1  a_2  b_1  b_2  c_1  c_2
0    2    4    3    6    5   10
1    4    8    6   12   10   20
2    6   12    9   18   15   30
3    8   16   12   24   20   40

split the columns into a MultiIndex:

temp = df.copy()
temp.columns = temp.columns.str.split('_', expand = True).swaplevel()

temp

 1   2   1   2   1   2
   a   a   b   b   c   c
0  2   4   3   6   5  10
1  4   8   6  12  10  20
2  6  12   9  18  15  30
3  8  16  12  24  20  40

In this form, you can simply select 2 divided by 1:

df['a','b','c']] = temp['2'] / temp['1']

This gives the same values as :

df[['a_2', 'b_2', 'c_2']].values / df[['a_1', 'b_1', 'c_1']].values

Imagine however, that you have lots of columns, you do not need to worry about the pairing, as the MultiIndex form takes care of that, with Pandas doing the alignment before computation.

Numpy is going to be faster - @MuhammadHassan's answer fits nicely; this is just to show how MultiIndex has its place and its uses in the right circumstances.

like image 1
sammywemmy Avatar answered Oct 16 '22 10:10

sammywemmy