DataFrame pairs of columns division

Tags:

I have a DataFrame and want to get divisions of pairs of columns like below:

df = pd.DataFrame({
    'a1': np.random.randint(1, 1000, 1000),
    'a2': np.random.randint(1, 1000, 1000),
    'b1': np.random.randint(1, 1000, 1000),
    'b2': np.random.randint(1, 1000, 1000),
    'c1': np.random.randint(1, 1000, 1000),
    'c2': np.random.randint(1, 1000, 1000),
})
df['a'] = df['a2'] / df['a1']
df['b'] = df['b2'] / df['b1']
df['c'] = df['c2'] / df['c1']

I want to combine the last three lines into one like:

df[['a', 'b', 'c']] = df[['a2', 'b2', 'c2']] / df[['a1', 'b1', 'c1']]

but I only get an error of ValueError: Columns must be same length as key. If I just simply print(df[['a2', 'b2', 'c2']] / df[['a1', 'b1', 'c1']]), I will only get a DataFrame with NaNs of shape (1000, 6).

==== Edit

Now I know why my original one-line code doesn't work. Actually, the arithmetic operations of two DataFrames will be conducted between the columns with same labels, while those columns without same label in another DataFrame will generate NaNs. The result DataFrame will have the union() of the columns of the two operating DataFrames. That's why my original solution will give an ValueError and the div will generate NaNs.

Following example will be helpful to explain:

df1 = pd.DataFrame(data={'A':[1,2], 'B':[3,4], 'C':[5,6], 'D':[8,9]})
df2 = pd.DataFrame(data={'A':[11,12], 'B':[13,14], 'C':[15,16], 'D':[18,19]})
df1[['A', 'B']] / df2[['A', 'B']]
Out[130]: 
          A         B
0  0.090909  0.230769
1  0.166667  0.285714

df1[['A', 'B']] / df2[['C', 'D']]
Out[131]: 
    A   B   C   D
0 NaN NaN NaN NaN
1 NaN NaN NaN NaN

df1[['A', 'B']] + df2[['A', 'C']]
Out[132]: 
    A   B   C
0  12 NaN NaN
1  14 NaN NaN

374

asked Dec 20 '21 10:12

Cuteufo

Video Answer

3 Answers

You can use:

df[['a', 'b', 'c']] = df[['a2', 'b2', 'c2']].values / df[['a1', 'b1', 'c1']].values

OUTPUT

      a1   a2   b1   b2   c1   c2          a         b         c
0    864  214  551  761  174  111   0.247685  1.381125  0.637931
1    820  971  379   79  190  587   1.184146  0.208443  3.089474
2    305  154  519  378  567  186   0.504918  0.728324  0.328042
3     51  505  303  417  959  326   9.901961  1.376238  0.339937
4     84  531  625  899  248  905   6.321429  1.438400  3.649194
..   ...  ...  ...  ...  ...  ...        ...       ...       ...
995  302  695  790  777  896  975   2.301325  0.983544  1.088170
996   24  308  462  316  388  784  12.833333  0.683983  2.020619
997  135  286  359  752  282  283   2.118519  2.094708  1.003546
998  695   45  832  936  811  404   0.064748  1.125000  0.498150
999  809  454  971  335  366  884   0.561187  0.345005  2.415301

142

answered Oct 16 '22 10:10

Muhammad Hassan

You could do the following as simple workaround:

df['a'], df['b'], df['c'] = (df['a2'] / df['a1'], df['b2'] / df['b1'], df['c2'] / df['c1'])

Although I think that using the assign method would make your code much more readable:

df.assign(a=lambda x: x['a2'] / x['a1'], 
          b=lambda x: x['b2'] / x['b1'], 
          c=lambda x: x['c2'] / x['c1'])

answered Oct 16 '22 10:10

jacob

MultiIndex comes in handy here, since Pandas is always aligned first on the index(columns is an index as well) before any computations.

Using sample data below:

df = pd.DataFrame({'a_1':range(2,10,2),
            'a_2': range(4, 20, 4),
           'b_1': range(3, 15,3),
           'b_2': range(6,30,6),
           'c_1': range(5, 25, 5),
           'c_2': range(10, 50, 10)})

df
   a_1  a_2  b_1  b_2  c_1  c_2
0    2    4    3    6    5   10
1    4    8    6   12   10   20
2    6   12    9   18   15   30
3    8   16   12   24   20   40

split the columns into a MultiIndex:

temp = df.copy()
temp.columns = temp.columns.str.split('_', expand = True).swaplevel()

temp

 1   2   1   2   1   2
   a   a   b   b   c   c
0  2   4   3   6   5  10
1  4   8   6  12  10  20
2  6  12   9  18  15  30
3  8  16  12  24  20  40

In this form, you can simply select 2 divided by 1:

df['a','b','c']] = temp['2'] / temp['1']

This gives the same values as :

df[['a_2', 'b_2', 'c_2']].values / df[['a_1', 'b_1', 'c_1']].values

Imagine however, that you have lots of columns, you do not need to worry about the pairing, as the MultiIndex form takes care of that, with Pandas doing the alignment before computation.

Numpy is going to be faster - @MuhammadHassan's answer fits nicely; this is just to show how MultiIndex has its place and its uses in the right circumstances.

answered Oct 16 '22 10:10

sammywemmy

Related questions
                            
                                Multiple Hidden Imports in Pyinstaller
                            
                                Pandas group by and sum, but create a new row when a certain amount is exceeded
                            
                                Building wheel for cffi (setup.py) ... error while installing the packages from requirements.txt in django
                            
                                Looping tasks in Prefect
                            
                                pip install py-find-1st fails on ubuntu20 & centos with python3.9
                            
                                Create hierarchy column in pandas
                            
                                `multiprocessing.Process` are modifying non-shared variables they should not have access to
                            
                                Troubles with downloading and saving a document in django
                            
                                Find indices for elements in array B best matching those in array A
                            
                                How to rename the first column of a pandas dataframe?
                            
                                Pandas: reading multi-index JSON as pandas data frame
                            
                                What is the best way to get accurate text similarity in python for comparing single words or bigrams?
                            
                                Why does mypy not accept a list[str] as a list[Optional[str]]?
                            
                                Pandas Weighted Stats
                            
                                Read spark data with column that clashes with partition name
                            
                                How to cache in IPython Notebook?
                            
                                Allowing only single active session per user in Django app
                            
                                import-im6.q16: not authorized error 'os' @ error/constitue.c/WriteImage/1037 for a Python web scraper
                            
                                Is there any python function/library for calculate binomial confidence intervals?
                            
                                Retry on deadlock for MySQL / SQLAlchemy

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

DataFrame pairs of columns division

Tags:

python

pandas

dataframe

Cuteufo

People also ask

Video Answer

3 Answers

Muhammad Hassan

jacob

sammywemmy

Recent Activity

Donate For Us