Creating a dataframe in pandas by multiplying two series together

Say I have two series in pandas, series A and series B. How do I create a DataFrame in which all of those values are multiplied together, i.e. with series A down the left-hand side and series B along the top? It is basically the same concept as this multiplication table, where series A would be the yellow column on the left and series B the yellow row along the top, and all the values in between would be filled in by multiplication:

http://www.google.co.uk/imgres?imgurl=http://www.vaughns-1-pagers.com/computer/multiplication-tables/times-table-12x12.gif&imgrefurl=http://www.vaughns-1-pagers.com/computer/multiplication-tables.htm&h=533&w=720&sz=58&tbnid=9B8R_kpUloA4NM:&tbnh=90&tbnw=122&zoom=1&usg=__meqZT9kIAMJ5b8BenRzF0l-CUqY=&docid=j9BT8tUCNtg--M&sa=X&ei=bkBpUpOWOI2p0AWYnIHwBQ&ved=0CE0Q9QEwBg

Sorry, should probably have added that my two series are not the same length. I'm getting an error now that 'matrices are not aligned' so I assume that's the problem.

Aoife asked Oct 24 '13 15:10



4 Answers

You can use matrix multiplication with dot, but first you have to convert each Series to a DataFrame (because the dot method on a Series computes the inner dot product, not the outer product you want here):

>>> import pandas as pd
>>> B = pd.Series(range(1, 5))
>>> A = pd.Series(range(1, 5))
>>> dfA = pd.DataFrame(A)
>>> dfB = pd.DataFrame(B)
>>> dfA.dot(dfB.T)
   0  1   2   3
0  1  2   3   4
1  2  4   6   8
2  3  6   9  12
3  4  8  12  16
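
Since the question mentions series of different lengths, here is a minimal sketch of the same dot approach with unequal lengths. The sample values and the name table are made up for illustration. It assumes both Series are unnamed, so that to_frame() gives each one the column label 0 and dot can align them; relabelling the result with the series values is optional.

```python
import pandas as pd

A = pd.Series([1, 2, 3, 4, 5])   # goes down the left-hand side
B = pd.Series([10, 20, 30])      # goes along the top

# (5 x 1) dot (1 x 3) -> (5 x 3); the single column label 0 of A.to_frame()
# lines up with the single row label 0 of B.to_frame().T
table = A.to_frame().dot(B.to_frame().T)

# Optional: label rows and columns with the series values themselves
table.index = A.values
table.columns = B.values
print(table)
```

If the two Series carry different names, the column and row labels no longer match and dot will complain that the matrices are not aligned, so rename or drop the names first.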
Roman Pekar answered Nov 08 '22 18:11


You can create a DataFrame from two series of unequal length by multiplying each value of one series (the rows) by the whole of the other series (the columns), so that every value is broadcast across a full row. For example:

> import numpy as np
> import pandas as pd
> row = pd.Series(np.arange(1, 6), index=np.arange(1, 6))
> col = pd.Series(np.arange(1, 4), index=np.arange(1, 4))
> row.apply(lambda r: r * col)
   1   2   3
1  1   2   3
2  2   4   6
3  3   6   9
4  4   8  12
5  5  10  15
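
As a possible alternative to apply (this is a different technique, not the one shown above), NumPy's outer product can build the same table in one call. A rough sketch, reusing the row and col series from above; the name table is just for illustration:

```python
import numpy as np
import pandas as pd

row = pd.Series(np.arange(1, 6), index=np.arange(1, 6))
col = pd.Series(np.arange(1, 4), index=np.arange(1, 4))

# np.outer multiplies every element of the first vector by every element
# of the second, giving a (len(row) x len(col)) array
table = pd.DataFrame(
    np.outer(row.to_numpy(), col.to_numpy()),
    index=row.index,      # row labels come from the first series
    columns=col.index,    # column labels come from the second series
)
print(table)
```

This works on the underlying arrays, so no index alignment takes place; the labels are attached afterwards.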
dworvos answered Nov 08 '22 17:11


First create a DataFrame of 1's. Then broadcast multiply along each axis in turn.

>>> from pandas import Series, DataFrame
>>> s1 = Series([1,2,3,4,5])
>>> s2 = Series([10,20,30])
>>> df = DataFrame(1, index=s1.index, columns=s2.index)
>>> df
   0  1  2
0  1  1  1
1  1  1  1
2  1  1  1
3  1  1  1
4  1  1  1
>>> df.multiply(s1, axis='index') * s2
    0    1    2
0  10   20   30
1  20   40   60
2  30   60   90
3  40   80  120
4  50  100  150

You need to use df.multiply in order to specify that the series will line up with the row index. You can use the normal multiplication operator * with s2 because matching on columns is the default way of doing multiplication between a DataFrame and a Series.
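
To see why the axis matters, here is a small sketch comparing the two alignments on the same data. The names ones, by_rows, by_cols and table are only for illustration.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([10, 20, 30])
ones = pd.DataFrame(1, index=s1.index, columns=s2.index)

# axis='index' lines s1 up with the row labels 0..4
by_rows = ones.multiply(s1, axis='index')

# The plain operator lines s1 up with the column labels instead; s1 has
# labels 3 and 4 that the frame lacks, so those columns come out as NaN
by_cols = ones * s1

# s2 has exactly the column labels 0..2, so the default alignment is fine
table = by_rows * s2
print(by_cols)
print(table)
```

The by_cols frame ends up five columns wide with two all-NaN columns, which is the symptom to look for if the axis was left at its default by mistake.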

jkitchen answered Nov 08 '22 17:11


I think this will get you most of the way there if you have two series of different lengths. It is a fairly manual process, but I cannot think of another way using pandas or NumPy functions.

>>> from pandas import Series, DataFrame
>>> a = Series([1, 3, 3, 5, 5])
>>> b = Series([5, 10])

First convert your row values a to a DataFrame, then add one copy of that Series as a column for each value in your column series b.

>>> result = DataFrame(a)
>>> for i in range(len(b)):  # range replaces the Python 2 xrange
...     result[i] = a
...
>>> result
   0  1
0  1  1
1  3  3
2  3  3
3  5  5
4  5  5

You can then broadcast your Series b over your DataFrame result:

>>> result = result.mul(b)
>>> result
    0   1
0   5  10
1  15  30
2  15  30
3  25  50
4  25  50

In the example I have chosen, using the values of the initial Series as row labels gives you a duplicated index. I would recommend keeping the index values unique, since otherwise selecting a label that is assigned to more than one row returns several rows instead of one. If you must, you can relabel your columns and rows like this (note that set_index returns a new DataFrame, so assign the result):

>>> result.columns = b
>>> result = result.set_index(a)
>>> result
    5  10
1   5  10
3  15  30
3  15  30
5  25  50
5  25  50

Example of duplicate indexing:

>>> result.loc[3]
    5  10
3  15  30
3  15  30
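
If you are not sure whether relabelling has introduced duplicates, a quick check along these lines may help. This is only a sketch: it rebuilds the same table in a slightly more compact way, and the name labelled is made up for illustration.

```python
import pandas as pd

a = pd.Series([1, 3, 3, 5, 5])
b = pd.Series([5, 10])

# One copy of a per value in b, then broadcast b across the columns
result = pd.DataFrame({i: a for i in range(len(b))}).mul(b)
result.columns = b
labelled = result.set_index(a)   # returns a new frame, result is unchanged

# The default 0..n-1 index is unique; the relabelled one is not
print(result.index.is_unique)
print(labelled.index.is_unique)

# A duplicated label selects every matching row (a DataFrame),
# while a unique label selects a single row (a Series)
print(type(labelled.loc[3]))
print(type(labelled.loc[1]))
```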
clintval answered Nov 08 '22 16:11