I have a large data frame with about 400,000 observations and 6,500 columns. I am looking for a fast way of multiplying each column by each of the columns to its right, in turn.
An example data frame could look like this:
| V1 | V2 | V3 |
----------------------
| 1 | 2 | 1 |
| 0 | 4 | 1 |
| 1 | 3 | 3 |
I would like to have something like this in the end:
| V1 | V2 | V3 | V1_V2 | V1_V3 | V2_V3 |
-----------------------------------------
| 1 | 2 | 1 | 2 | 1 | 2 |
| 0 | 4 | 1 | 0 | 0 | 4 |
| 1 | 3 | 3 | 3 | 3 | 9 |
I tried itertools.combinations, but it is too slow. I am a beginner in Python, so maybe there is a simple solution I am not aware of.
Thank you for your help!
Use the syntax df[col1] * df[col2] to multiply the columns named col1 and col2 in df.
Use the * operator to multiply a column by a constant. Select a column of DataFrame df using the syntax df["column_name"] and set it equal to n * df["column_name"], where n is the number to multiply by.
Pandas DataFrame mul() method: the mul() method multiplies each value in the DataFrame by a specified value. The specified value must be an object that can be multiplied with the values of the DataFrame.
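As a small illustration of the three snippets above, here is a sketch on a toy frame. The column names and the variable names (product, doubled, scaled, by_v1) are made up for the example; only the * operator and DataFrame.mul come from pandas.

```python
import pandas as pd

# Toy data, assumed for illustration only
df = pd.DataFrame({"V1": [1, 0, 1], "V2": [2, 4, 3]})

# Element-wise product of two columns
product = df["V1"] * df["V2"]

# A column multiplied by a constant
doubled = 2 * df["V2"]

# mul() with a scalar multiplies every value in the frame
scaled = df.mul(2)

# mul() with a Series and axis=0 multiplies every column by that Series, row by row
by_v1 = df.mul(df["V1"], axis=0)

print(product.tolist(), doubled.tolist())
print(by_v1)
```

All four operations are vectorized, so none of them needs an explicit Python loop over rows.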
The pandas vectorized operations (such as multiplication) are efficient on their own, so most of the work is choosing the column pairs. The following multiplies each column by its right-hand neighbour, plus the last column by the first:
# Extract column names
cols = df.columns.tolist()
# Generate all adjacent pairs, including the circular (last, first) pair
cols_to_create = [(cols[i], cols[i + 1]) for i in range(len(cols) - 1)] \
    + [(cols[-1], cols[0])]
# Multiply each pair and store the result as a new column
for x, y in cols_to_create:
    df[x + '_' + y] = df[x] * df[y]
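The loop above pairs each column only with its immediate neighbour (plus a wrap-around pair), while the question asks for every column multiplied by each column to its right. One possible way to get all such pairs in a single vectorized step is sketched below. It is not the answerer's method: the helper name pairwise_products is hypothetical, and the sketch assumes all columns are numeric.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def pairwise_products(df: pd.DataFrame) -> pd.DataFrame:
    """Return df with one extra column per pair (left, right), left before right."""
    values = df.to_numpy()
    n_cols = values.shape[1]
    # Index pairs (i, j) with i < j, in row-major order,
    # which matches the order of itertools.combinations
    left, right = np.triu_indices(n_cols, k=1)
    # One fancy-indexed multiplication computes every product column at once
    products = values[:, left] * values[:, right]
    names = [f"{a}_{b}" for a, b in combinations(df.columns, 2)]
    product_df = pd.DataFrame(products, columns=names, index=df.index)
    # A single concat avoids inserting columns into df one at a time
    return pd.concat([df, product_df], axis=1)


# Toy data from the question
df = pd.DataFrame({"V1": [1, 0, 1], "V2": [2, 4, 3], "V3": [1, 1, 3]})
out = pairwise_products(df)
print(out)
```

Keep the size of the result in mind. With 6,500 columns there are 6,500 × 6,499 / 2 = 21,121,750 pairs, so at 400,000 rows the full output would hold roughly 8.4 × 10^12 values. That will not fit in memory at once, so in practice the pairs would probably have to be processed in batches, or restricted to the pairs that are actually needed.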