pandas dataframe multiply with a series [duplicate]

Tags: pandas, dataframe, multiplication

What is the best way to multiply all the columns of a Pandas DataFrame by a column vector stored in a Series? I used to do this in Matlab with repmat(), which doesn't exist in Pandas. I can use np.tile(), but it looks ugly to convert the data structure back and forth each time.

Thanks.

asked Oct 31 '12 20:10

jianpan

3 Answers

What's wrong with

result = dataframe.mul(series, axis=0)

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.mul.html#pandas.DataFrame.mul
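
As a minimal sketch of what that call does (the frame and series below are made-up illustration data, not taken from the question): with axis=0 the Series is aligned to the DataFrame's row index, and every column is multiplied element-wise by it.

```python
import pandas as pd

# Hypothetical example data: a 3x2 frame and a Series sharing its row index.
dataframe = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
series = pd.Series([10, 20, 30])

# axis=0 matches the Series against the row index, so each column
# is multiplied element-wise by the same column vector.
result = dataframe.mul(series, axis=0)
print(result)
```

Because the match is by label, row labels present in only one of the two objects would produce NaN in the result.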

answered Sep 20 '22 01:09

Wes McKinney

This can be accomplished quite simply with the DataFrame method apply.

In[1]: import pandas as pd; import numpy as np

In[2]: df = pd.DataFrame(np.arange(40.).reshape((8, 5)), columns=list('abcde')); df
Out[2]:
    a   b   c   d   e
0   0   1   2   3   4
1   5   6   7   8   9
2  10  11  12  13  14
3  15  16  17  18  19
4  20  21  22  23  24
5  25  26  27  28  29
6  30  31  32  33  34
7  35  36  37  38  39

In[3]: ser = pd.Series(np.arange(8) * 10); ser
Out[3]:
0     0
1    10
2    20
3    30
4    40
5    50
6    60
7    70

Now that we have our DataFrame and Series we need a function to pass to apply.

In[4]: func = lambda x: np.asarray(x) * np.asarray(ser)

We can pass this to df.apply and we are good to go.

In[5]: df.apply(func)
Out[5]:
      a     b     c     d     e
0     0     0     0     0     0
1    50    60    70    80    90
2   200   220   240   260   280
3   450   480   510   540   570
4   800   840   880   920   960
5  1250  1300  1350  1400  1450
6  1800  1860  1920  1980  2040
7  2450  2520  2590  2660  2730

df.apply acts column-wise by default, but it can also act row-wise if you pass axis=1 as an argument.

In[6]: ser2 = pd.Series(np.arange(5) * 5); ser2
Out[6]:
0     0
1     5
2    10
3    15
4    20

In[7]: func2 = lambda x: np.asarray(x) * np.asarray(ser2)

In[8]: df.apply(func2, axis=1)
Out[8]:
   a    b    c    d    e
0  0    5   20   45   80
1  0   30   70  120  180
2  0   55  120  195  280
3  0   80  170  270  380
4  0  105  220  345  480
5  0  130  270  420  580
6  0  155  320  495  680
7  0  180  370  570  780

This could be done more concisely by defining the anonymous function inside apply.

In[9]: df.apply(lambda x: np.asarray(x) * np.asarray(ser))
Out[9]:
      a     b     c     d     e
0     0     0     0     0     0
1    50    60    70    80    90
2   200   220   240   260   280
3   450   480   510   540   570
4   800   840   880   920   960
5  1250  1300  1350  1400  1450
6  1800  1860  1920  1980  2040
7  2450  2520  2590  2660  2730

In[10]: df.apply(lambda x: np.asarray(x) * np.asarray(ser2), axis=1)
Out[10]:
   a    b    c    d    e
0  0    5   20   45   80
1  0   30   70  120  180
2  0   55  120  195  280
3  0   80  170  270  380
4  0  105  220  345  480
5  0  130  270  420  580
6  0  155  320  495  680
7  0  180  370  570  780
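
The np.asarray calls matter most in the row-wise case: ser2 is indexed 0 to 4 while the columns are labelled 'a' to 'e', so label-based arithmetic would not line them up. Here is a hedged sketch of the same two products without apply, assuming positional matching is what is wanted (to_numpy() is used to drop the Series labels; the names by_rows and by_cols are only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(40.).reshape((8, 5)), columns=list('abcde'))
ser = pd.Series(np.arange(8) * 10)
ser2 = pd.Series(np.arange(5) * 5)

# Column-wise: ser shares df's row index, so the labels align directly.
by_rows = df.mul(ser, axis=0)

# Row-wise: pass the raw values so they are matched to columns by position.
by_cols = df.mul(ser2.to_numpy(), axis=1)
```

Both calls are vectorized, so they avoid calling a Python function once per column or row.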

answered Sep 19 '22 01:09

spencerlyon2

Why not create your own dataframe tile function:

import pandas as pd

def tile_df(df, n, m):
    # Tile df n times down the rows and m times across the columns.
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0.
    wide = pd.concat([df] * m, axis=1, ignore_index=True)
    return pd.concat([wide] * n, axis=0, ignore_index=True)

Example:

df = pd.DataFrame([[1,2],[3,4]])
tile_df(df, 2, 3)
#    0  1  2  3  4  5
# 0  1  2  1  2  1  2
# 1  3  4  3  4  3  4
# 2  1  2  1  2  1  2
# 3  3  4  3  4  3  4

However, the docs note: "DataFrame is not intended to be a drop-in replacement for ndarray as its indexing semantics are quite different in places from a matrix." This presumably should be read as "use NumPy if you are doing lots of matrix stuff".
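
If dropping down to NumPy is acceptable, broadcasting gives the repmat-style result without building a tiled copy. A minimal sketch, assuming the Series and the DataFrame rows are already in the same order (labels are ignored here, and the data is hypothetical):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]])
ser = pd.Series([10, 100])

# Reshape the Series values to a column of shape (2, 1); NumPy then
# broadcasts it across every column of the (2, 2) array.
product = df.to_numpy() * ser.to_numpy()[:, np.newaxis]

# Wrap the array back into a DataFrame to keep the original labels.
result = pd.DataFrame(product, index=df.index, columns=df.columns)
```

The trade-off is that the caller becomes responsible for row order, since no index alignment happens on the raw arrays.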

answered Sep 18 '22 01:09

Andy Hayden
