Multiply each column in a data frame with the columns to its right in Python

I have a large data frame with about 400,000 observations and 6,500 columns. I am looking for a fast way to multiply each column by each of the columns to its right in turn.

An example data frame could look like this:

| V1  | V2  | V3  |  
----------------------
|  1  |  2  |  1  |
|  0  |  4  |  1  |
|  1  |  3  |  3  |

I would like to have something like this in the end:

| V1 | V2 | V3 | V1_V2 | V1_V3 | V2_V3 |
-----------------------------------------
|  1 |  2 |  1 |    2  |   1   |   2   |
|  0 |  4 |  1 |    0  |   0   |   4   |
|  1 |  3 |  3 |    3  |   3   |   9   |

I tried itertools.combinations but it is too slow. I am a beginner in Python, so maybe there is a simple solution I am not aware of.

Thank you for your help!

Lisa asked Jan 15 '19 10:01



1 Answer

The pandas vectorized operations (like multiplication) are efficient in themselves. You can use something like the following to take advantage of this:

# Extract column names
cols = df.columns.tolist()

# Generate every pair of a column and a column to its right
# (V1 with V2 and V3, then V2 with V3, and so on)
cols_to_create = [(cols[i], cols[j])
                  for i in range(len(cols))
                  for j in range(i + 1, len(cols))]

# Perform the vectorized multiplication on all pairs
for x, y in cols_to_create:
    df[x + '_' + y] = df[x] * df[y]
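
The loop above inserts one column at a time, which can get slow on wide frames. One possible alternative is to multiply each column against all columns to its right in a single NumPy broadcast and concatenate the results once at the end. The following is a minimal sketch, assuming every column is numeric; the helper name pairwise_products is hypothetical and not part of pandas. Keep in mind that k columns produce k(k-1)/2 products, so 6,500 columns would yield roughly 21 million new columns, which is unlikely to fit in memory at 400,000 rows and would probably have to be produced and consumed in chunks.

```python
import numpy as np
import pandas as pd


def pairwise_products(df: pd.DataFrame) -> pd.DataFrame:
    """Return df plus one product column per (left, right) column pair.

    Hypothetical helper for illustration; assumes all columns are numeric.
    """
    cols = df.columns.tolist()
    values = df.to_numpy()
    blocks = []
    names = []
    for i in range(len(cols) - 1):
        # Broadcast column i (shape n x 1) against every column to its right.
        blocks.append(values[:, [i]] * values[:, i + 1:])
        names.extend(f"{cols[i]}_{c}" for c in cols[i + 1:])
    if not blocks:
        # Fewer than two columns: there is nothing to multiply.
        return df.copy()
    products = pd.DataFrame(np.hstack(blocks), columns=names, index=df.index)
    # A single concat avoids repeated column insertion into the frame.
    return pd.concat([df, products], axis=1)


# The example frame from the question.
df = pd.DataFrame({"V1": [1, 0, 1], "V2": [2, 4, 3], "V3": [1, 1, 3]})
result = pairwise_products(df)
print(result)
```

On the example frame this should give the columns V1_V2, V1_V3 and V2_V3 in that order, next to the original three.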

bh00t answered Sep 23 '22 10:09