pandas dataframe multiply with a series [duplicate]

What is the best way to multiply all the columns of a Pandas DataFrame by a column vector stored in a Series? I used to do this in Matlab with repmat(), which doesn't exist in Pandas. I can use np.tile(), but it looks ugly to convert the data structure back and forth each time.

Thanks.

jianpan asked Oct 31 '12 20:10



3 Answers

What's wrong with

result = dataframe.mul(series, axis=0) 

?

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.mul.html#pandas.DataFrame.mul
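To make the one-liner above concrete, here is a minimal sketch. The data is made up for illustration, and it assumes the Series shares the DataFrame's row index. Note that a plain `dataframe * series` aligns the Series index with the column labels instead, which is why `axis=0` is needed here.

```python
import pandas as pd

# Hypothetical data, chosen only for illustration.
dataframe = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
series = pd.Series([10, 100, 1000])  # one factor per row, same index as the frame

# axis=0 aligns the Series index with the DataFrame's row index,
# so every column is multiplied element-wise by the Series.
result = dataframe.mul(series, axis=0)
print(result)
```

Each column of `result` is the matching column of `dataframe` times `series`, with no tiling or conversion to NumPy required.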

Wes McKinney answered Sep 20 '22 01:09


This can be accomplished quite simply with the DataFrame method apply.

In[1]: import pandas as pd; import numpy as np

In[2]: df = pd.DataFrame(np.arange(40.).reshape((8, 5)), columns=list('abcde')); df
Out[2]:
    a   b   c   d   e
0   0   1   2   3   4
1   5   6   7   8   9
2  10  11  12  13  14
3  15  16  17  18  19
4  20  21  22  23  24
5  25  26  27  28  29
6  30  31  32  33  34
7  35  36  37  38  39

In[3]: ser = pd.Series(np.arange(8) * 10); ser
Out[3]:
0     0
1    10
2    20
3    30
4    40
5    50
6    60
7    70

Now that we have our DataFrame and Series we need a function to pass to apply.

In[4]: func = lambda x: np.asarray(x) * np.asarray(ser) 

We can pass this to df.apply and we are good to go:

In[5]: df.apply(func)
Out[5]:
      a     b     c     d     e
0     0     0     0     0     0
1    50    60    70    80    90
2   200   220   240   260   280
3   450   480   510   540   570
4   800   840   880   920   960
5  1250  1300  1350  1400  1450
6  1800  1860  1920  1980  2040
7  2450  2520  2590  2660  2730

df.apply acts column-wise by default, but it can also act row-wise if you pass axis=1 as an argument.

In[6]: ser2 = pd.Series(np.arange(5) * 5); ser2
Out[6]:
0     0
1     5
2    10
3    15
4    20

In[7]: func2 = lambda x: np.asarray(x) * np.asarray(ser2)

In[8]: df.apply(func2, axis=1)
Out[8]:
   a    b    c    d    e
0  0    5   20   45   80
1  0   30   70  120  180
2  0   55  120  195  280
3  0   80  170  270  380
4  0  105  220  345  480
5  0  130  270  420  580
6  0  155  320  495  680
7  0  180  370  570  780

This could be done more concisely by defining the anonymous function inside apply:

In[9]: df.apply(lambda x: np.asarray(x) * np.asarray(ser))
Out[9]:
      a     b     c     d     e
0     0     0     0     0     0
1    50    60    70    80    90
2   200   220   240   260   280
3   450   480   510   540   570
4   800   840   880   920   960
5  1250  1300  1350  1400  1450
6  1800  1860  1920  1980  2040
7  2450  2520  2590  2660  2730

In[10]: df.apply(lambda x: np.asarray(x) * np.asarray(ser2), axis=1)
Out[10]:
   a    b    c    d    e
0  0    5   20   45   80
1  0   30   70  120  180
2  0   55  120  195  280
3  0   80  170  270  380
4  0  105  220  345  480
5  0  130  270  420  580
6  0  155  320  495  680
7  0  180  370  570  780
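For comparison, the same two products can be sketched without apply, using the vectorized mul method. This is only an illustration that reuses the df, ser, and ser2 defined in this answer. The row-wise case passes the raw values of ser2 because its index (0..4) does not match the column labels ('a'..'e'), so label alignment would not pair them up. Newer pandas releases may also return a Series of arrays from the axis=1 apply call unless result_type='expand' is given, which is another reason this sketch avoids apply.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(40.).reshape((8, 5)), columns=list('abcde'))
ser = pd.Series(np.arange(8) * 10)   # one factor per row, index 0..7
ser2 = pd.Series(np.arange(5) * 5)   # one factor per column, but indexed 0..4

# Multiply every column by ser: align ser's index with the row index.
by_rows = df.mul(ser, axis=0)

# Multiply every row by ser2: pass the bare values so they are matched
# to the columns by position rather than by label.
by_cols = df.mul(ser2.to_numpy(), axis=1)

print(by_rows.head(3))
print(by_cols.head(3))
```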

spencerlyon2 answered Sep 19 '22 01:09


Why not create your own dataframe tile function:

import pandas as pd

def tile_df(df, n, m):
    # Tile df n times vertically and m times horizontally (like repmat).
    # DataFrame.append was removed in pandas 2.0, so use pd.concat instead.
    wide = pd.concat([df] * m, axis=1, ignore_index=True)
    return pd.concat([wide] * n, axis=0, ignore_index=True)

Example:

df = pd.DataFrame([[1,2],[3,4]])
tile_df(df, 2, 3)
#    0  1  2  3  4  5
# 0  1  2  1  2  1  2
# 1  3  4  3  4  3  4
# 2  1  2  1  2  1  2
# 3  3  4  3  4  3  4

However, the docs note: "DataFrame is not intended to be a drop-in replacement for ndarray as its indexing semantics are quite different in places from a matrix." This should presumably be interpreted as "use numpy if you are doing lots of matrix stuff".
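If you do go the numpy route, a rough sketch might look like the following. The Series of per-row factors is a made-up example. The sketch shows both plain broadcasting and the explicit repmat-style np.tile version mentioned in the question, then wraps the result back into a DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]])
ser = pd.Series([10, 100])  # hypothetical per-row factors

values = df.to_numpy()
factors = ser.to_numpy()

# Broadcasting: reshape the factors into a column of shape (2, 1) so that
# numpy stretches them across every column of `values`.
broadcast = values * factors[:, np.newaxis]

# The explicit repmat-style equivalent: tile the column to the full shape.
tiled = values * np.tile(factors[:, np.newaxis], (1, values.shape[1]))

# Restore the original labels when going back to pandas.
result = pd.DataFrame(broadcast, index=df.index, columns=df.columns)
print(result)
```

Both arrays hold the same numbers; broadcasting just avoids materializing the tiled copy.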

Andy Hayden answered Sep 18 '22 01:09