What is the best way to multiply all the columns of a Pandas DataFrame by a column vector stored in a Series? I used to do this in Matlab with repmat(), which doesn't exist in Pandas. I can use np.tile(), but it looks ugly to convert the data structure back and forth each time.
Thanks.
Series.multiply() performs element-wise multiplication of a Series and another object. The operation is equivalent to series * other, but it supports substituting a fill_value for missing data in one of the inputs.
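A small sketch of the fill_value behaviour, assuming two made-up Series a and b with partly overlapping labels (names and values are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: "y" is NaN in a, "x" is NaN in b, and "z" exists only in a
a = pd.Series([1.0, 2.0, np.nan, 4.0], index=["w", "x", "y", "z"])
b = pd.Series([10.0, np.nan, 30.0], index=["w", "x", "y"])

# Plain *: a value missing on either side (NaN or absent label) gives NaN
plain = a * b

# multiply(fill_value=1): a value missing on only one side is replaced by 1 first
filled = a.multiply(b, fill_value=1)

print(plain)
print(filled)
```

If a value is missing on both sides, the result at that position stays NaN even with fill_value.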
merge() can be used for all database-style join operations between DataFrame or named Series objects. Each Series must have a name in this case (its name attribute, which can be set with the name= argument), for example pd.merge(S1, S2, right_index=True, left_index=True).
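A minimal sketch of an index-on-index merge with two hypothetical named Series s1 and s2; if either Series had no name, pd.merge would raise a ValueError instead:

```python
import pandas as pd

# Each Series needs a name; it becomes the column label in the merged result
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"], name="left")
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"], name="right")

# Join on the index of both sides; the default how="inner" keeps only "b" and "c"
merged = pd.merge(s1, s2, left_index=True, right_index=True)
print(merged)
```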
You can create a DataFrame from multiple Series objects by adding each Series as a column. The concat() function (with axis=1) combines multiple Series into a DataFrame in this way.
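A short sketch, assuming two illustrative Series x and y that share the same index:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], name="x")
s2 = pd.Series([4, 5, 6], name="y")

# axis=1 places the Series side by side; each Series name becomes a column label
df = pd.concat([s1, s2], axis=1)
print(df)
```

pd.DataFrame({"x": s1, "y": s2}) builds the same frame from a dict; concat is handier when the Series already carry their names.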
DataFrame.mul() returns the element-wise multiplication of a DataFrame and another object. It does essentially the same thing as dataframe * other, but it additionally supports handling missing values in one of the inputs. It can also be used to multiply a DataFrame by a Series.
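A sketch of fill_value with DataFrame.mul. As far as I know, recent pandas versions raise NotImplementedError when fill_value is combined with a Series as other, so this example multiplies two small, made-up DataFrames instead:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})
other = pd.DataFrame({"a": [2.0, 5.0], "b": [np.nan, 10.0]})

# Plain *: NaN in either operand propagates to the result
plain = df * other

# fill_value=1: a value missing on only one side is treated as 1 before multiplying
filled = df.mul(other, fill_value=1)
print(filled)
```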
What's wrong with this?
result = dataframe.mul(series, axis=0)
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.mul.html#pandas.DataFrame.mul
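A small sketch of why axis=0 matters, using an illustrative df and ser whose indexes match:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(5, 2), columns=["a", "b"])
ser = pd.Series([1, 2, 3, 4, 5])

# axis=0 aligns ser with the row index, so every column of row i is scaled by ser[i]
result = df.mul(ser, axis=0)

# The default (axis="columns") aligns ser's labels 0..4 with the column labels
# "a" and "b" instead; nothing matches, so every cell of the result is NaN
misaligned = df.mul(ser)

print(result)
```

Because mul aligns on labels, ser's index has to match df's row index; otherwise the unmatched rows come out as NaN.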
This can be accomplished quite simply with the DataFrame method apply.
In[1]: import pandas as pd; import numpy as np

In[2]: df = pd.DataFrame(np.arange(40.).reshape((8, 5)), columns=list('abcde')); df
Out[2]:
    a   b   c   d   e
0   0   1   2   3   4
1   5   6   7   8   9
2  10  11  12  13  14
3  15  16  17  18  19
4  20  21  22  23  24
5  25  26  27  28  29
6  30  31  32  33  34
7  35  36  37  38  39

In[3]: ser = pd.Series(np.arange(8) * 10); ser
Out[3]:
0     0
1    10
2    20
3    30
4    40
5    50
6    60
7    70
Now that we have our DataFrame and Series, we need a function to pass to apply.
In[4]: func = lambda x: np.asarray(x) * np.asarray(ser)
We can pass this to df.apply and we are good to go:
In[5]: df.apply(func)
Out[5]:
      a     b     c     d     e
0     0     0     0     0     0
1    50    60    70    80    90
2   200   220   240   260   280
3   450   480   510   540   570
4   800   840   880   920   960
5  1250  1300  1350  1400  1450
6  1800  1860  1920  1980  2040
7  2450  2520  2590  2660  2730
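One thing to be aware of with this approach: np.asarray drops the index, so the apply version pairs rows with Series values by position, while df.mul(ser, axis=0) pairs them by label. With the ser above (index 0..7 in order) the two agree; the sketch below uses a deliberately shuffled, illustrative Series to show where they differ:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6.).reshape((3, 2)), columns=["a", "b"])
# By label this is {0: 10, 1: 20, 2: 30}, but the values are stored out of order
ser = pd.Series([30, 10, 20], index=[2, 0, 1])

# Positional: row i is multiplied by the i-th stored value (30, 10, 20)
positional = df.apply(lambda x: np.asarray(x) * np.asarray(ser))

# Label-aligned: row 0 is multiplied by ser[0] == 10, row 1 by 20, row 2 by 30
aligned = df.mul(ser, axis=0)

print(positional)
print(aligned.sort_index())
```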
df.apply acts column-wise by default, but it can also act row-wise by passing axis=1 as an argument to apply.
In[6]: ser2 = pd.Series(np.arange(5) * 5); ser2
Out[6]:
0     0
1     5
2    10
3    15
4    20

In[7]: func2 = lambda x: x * np.asarray(ser2)  # return a Series so apply expands each row back into columns a..e

In[8]: df.apply(func2, axis=1)
Out[8]:
   a    b    c    d    e
0  0    5   20   45   80
1  0   30   70  120  180
2  0   55  120  195  280
3  0   80  170  270  380
4  0  105  220  345  480
5  0  130  270  420  580
6  0  155  320  495  680
7  0  180  370  570  780
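The row-wise case can also be done without apply. Since ser2's labels are 0..4 rather than "a".."e", label alignment would produce all NaN, so this sketch either relabels ser2 to match df's columns or passes the raw NumPy values (the name ser2_by_col is just for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(40.).reshape((8, 5)), columns=list("abcde"))
ser2 = pd.Series(np.arange(5) * 5)

# Option 1: give ser2 the column labels, then align on them
ser2_by_col = pd.Series(ser2.to_numpy(), index=df.columns)
by_label = df.mul(ser2_by_col, axis=1)

# Option 2: a bare 1-D array is matched to the columns by position
by_position = df.mul(ser2.to_numpy(), axis=1)

print(by_label)
```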
This could be done more concisely by defining the anonymous function inside apply:
In[9]: df.apply(lambda x: np.asarray(x) * np.asarray(ser))
Out[9]:
      a     b     c     d     e
0     0     0     0     0     0
1    50    60    70    80    90
2   200   220   240   260   280
3   450   480   510   540   570
4   800   840   880   920   960
5  1250  1300  1350  1400  1450
6  1800  1860  1920  1980  2040
7  2450  2520  2590  2660  2730

In[10]: df.apply(lambda x: x * np.asarray(ser2), axis=1)  # returning a Series keeps the column labels
Out[10]:
   a    b    c    d    e
0  0    5   20   45   80
1  0   30   70  120  180
2  0   55  120  195  280
3  0   80  170  270  380
4  0  105  220  345  480
5  0  130  270  420  580
6  0  155  320  495  680
7  0  180  370  570  780
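Why not create your own DataFrame tile function?
import pandas as pd

def tile_df(df, n, m):
    """Tile df n times vertically and m times horizontally, like Matlab's repmat(df, n, m)."""
    # DataFrame.append no longer exists in pandas 2.0+, so the copies are joined with pd.concat
    # Place m copies side by side; ignore_index renumbers the columns from 0
    wide = pd.concat([df] * m, axis=1, ignore_index=True)
    # Stack n copies of that block; ignore_index renumbers the rows from 0
    return pd.concat([wide] * n, axis=0, ignore_index=True)

df = pd.DataFrame([[1, 2], [3, 4]])
tile_df(df, 2, 3)
#    0  1  2  3  4  5
# 0  1  2  1  2  1  2
# 1  3  4  3  4  3  4
# 2  1  2  1  2  1  2
# 3  3  4  3  4  3  4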
However, the pandas docs note: "DataFrame is not intended to be a drop-in replacement for ndarray as its indexing semantics are quite different in places from a matrix." This presumably should be read as "use NumPy if you are doing lots of matrix work".
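In plain NumPy, the repmat/np.tile step is usually unnecessary because of broadcasting. A minimal sketch with an illustrative array a and column vector v:

```python
import numpy as np

a = np.arange(6.).reshape((3, 2))
v = np.array([10., 20., 30.])

# v[:, None] has shape (3, 1) and broadcasts across both columns of a,
# so no explicit tiling is needed
scaled = a * v[:, None]

# Equivalent, repmat-style version using an explicit tile
tiled = a * np.tile(v[:, None], (1, a.shape[1]))
print(scaled)
```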