Say I have two series in pandas, series A and series B. How do I create a dataframe in which all of those values are multiplied together, i.e. with series A down the left hand side and series B along the top? Basically the same concept as this, where series A would be the yellow on the left and series B the yellow along the top, and all the values in between would be filled in by multiplication:
http://www.google.co.uk/imgres?imgurl=http://www.vaughns-1-pagers.com/computer/multiplication-tables/times-table-12x12.gif&imgrefurl=http://www.vaughns-1-pagers.com/computer/multiplication-tables.htm&h=533&w=720&sz=58&tbnid=9B8R_kpUloA4NM:&tbnh=90&tbnw=122&zoom=1&usg=__meqZT9kIAMJ5b8BenRzF0l-CUqY=&docid=j9BT8tUCNtg--M&sa=X&ei=bkBpUpOWOI2p0AWYnIHwBQ&ved=0CE0Q9QEwBg
Sorry, should probably have added that my two series are not the same length. I'm getting an error now that 'matrices are not aligned' so I assume that's the problem.
Two pandas Series objects can be multiplied elementwise with the multiplication operator "*". The mul() method does the same, and its fill_value parameter lets you replace missing values (None/NaN) in the data with a default before multiplying.
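As a rough illustration of the difference between "*" and mul() with fill_value, here is a minimal sketch. The sample values and the choice of 1.0 as the fill value are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical sample data; None is stored as NaN in a float Series.
a = pd.Series([1.0, 2.0, None])
b = pd.Series([10.0, None, 30.0])

# Plain operator: a missing value on either side gives NaN at that position.
plain = a * b

# mul() with fill_value: a value missing on one side only is treated as 1.0
# before multiplying. A position missing on both sides would stay NaN.
filled = a.mul(b, fill_value=1.0)

print(plain)
print(filled)
```

Note that this is still elementwise multiplication of aligned labels, so it does not by itself build the times-table layout asked about in the question.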
To combine two series into a DataFrame in Pandas, we can take two series and concatenate them using concat() method.
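For reference, a small sketch of what concat() does with two series. The names "a" and "b" are invented for the example. This places the series side by side as columns and does not multiply anything.

```python
import pandas as pd

# Hypothetical series; the names become the column labels.
s1 = pd.Series([1, 2, 3], name="a")
s2 = pd.Series([10, 20, 30], name="b")

# axis=1 puts each Series in its own column, aligned on the index.
combined = pd.concat([s1, s2], axis=1)
print(combined)
```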
A Series is a one-dimensional labeled array in pandas that can hold integers, strings, floats and more. By default its index runs from 0 to n - 1, where n is the number of values in the Series.
Overview: The mul() method of a DataFrame performs elementwise multiplication of the DataFrame with another DataFrame, a pandas Series or a Python sequence.
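A minimal sketch of the DataFrame-with-DataFrame case, using two made-up frames that share the same labels. Values are matched by row and column label and multiplied pairwise.

```python
import pandas as pd

# Two hypothetical frames with identical index and column labels.
left = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
right = pd.DataFrame({"x": [10, 20], "y": [30, 40]})

# Elementwise product: each cell of left times the matching cell of right.
product = left.mul(right)
print(product)
```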
You can use matrix multiplication with dot, but first you have to convert each Series to a DataFrame (because the dot method on a Series computes the dot product, not a table):
>>> import pandas as pd
>>> A = pd.Series(range(1, 5))
>>> B = pd.Series(range(1, 5))
>>> dfA = pd.DataFrame(A)   # 4x1 column
>>> dfB = pd.DataFrame(B)   # 4x1 column, transposed below to 1x4
>>> dfA.dot(dfB.T)
   0  1   2   3
0  1  2   3   4
1  2  4   6   8
2  3  6   9  12
3  4  8  12  16
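Regarding the 'matrices are not aligned' error with series of different lengths: as far as I can tell it comes from calling dot directly on the Series, and the DataFrame route above should still work. A small sketch with made-up values (the lengths 5 and 3 are arbitrary):

```python
import pandas as pd

A = pd.Series([1, 2, 3, 4, 5])
B = pd.Series([10, 20, 30])

# Series.dot needs matching indexes, so unequal lengths raise a ValueError.
try:
    A.dot(B)
    series_dot_failed = False
except ValueError:
    series_dot_failed = True

# A (5x1) column times a (1x3) row gives the full 5x3 table.
table = pd.DataFrame(A).dot(pd.DataFrame(B).T)
print(series_dot_failed)
print(table)
```

This relies on both series being unnamed, so that the single column of the first frame and the single row of the transposed second frame share the label 0.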
You can create a DataFrame from multiplying two series of unequal length by broadcasting each value of the row (or column) with the other series. For example:
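>>> import numpy as np
>>> import pandas as pd
>>> row = pd.Series(np.arange(1, 6), index=np.arange(1, 6))
>>> col = pd.Series(np.arange(1, 4), index=np.arange(1, 4))
>>> row.apply(lambda r: r * col)   # each value of row scales the whole col series
   1   2   3
1  1   2   3
2  2   4   6
3  3   6   9
4  4   8  12
5  5  10  15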
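An alternative that is not part of the answer above: NumPy's outer function computes the same table in one call, and the labels can be reattached afterwards. A sketch reusing the same row and col values:

```python
import numpy as np
import pandas as pd

row = pd.Series(np.arange(1, 6), index=np.arange(1, 6))
col = pd.Series(np.arange(1, 4), index=np.arange(1, 4))

# np.outer works on the raw values, so the two lengths need not match.
values = np.outer(row.to_numpy(), col.to_numpy())

# Reattach the labels: row's index down the side, col's index along the top.
table = pd.DataFrame(values, index=row.index, columns=col.index)
print(table)
```

Because np.outer ignores the pandas labels entirely, this avoids any index alignment, which may or may not be what you want.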
First create a DataFrame of 1's. Then broadcast multiply along each axis in turn.
>>> from pandas import Series, DataFrame
>>> s1 = Series([1, 2, 3, 4, 5])
>>> s2 = Series([10, 20, 30])
>>> df = DataFrame(1, index=s1.index, columns=s2.index)
>>> df
   0  1  2
0  1  1  1
1  1  1  1
2  1  1  1
3  1  1  1
4  1  1  1
>>> df.multiply(s1, axis='index') * s2
    0    1    2
0  10   20   30
1  20   40   60
2  30   60   90
3  40   80  120
4  50  100  150
You need to use df.multiply in order to specify that the series will line up with the row index. You can use the normal multiplication operator * with s2 because matching on columns is the default way of doing multiplication between a DataFrame and a Series.
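To see why the axis matters, here is a sketch comparing the two directions, plus what happens if the row series is multiplied with the plain operator by mistake. The variable names ones, by_rows, by_cols and mismatched are invented for the example.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5])
s2 = pd.Series([10, 20, 30])
ones = pd.DataFrame(1, index=s1.index, columns=s2.index)

# s1 lines up with the row index: row i is scaled by s1[i].
by_rows = ones.multiply(s1, axis="index")

# s2 lines up with the column labels (the default): column j is scaled by s2[j].
by_cols = ones * s2

# Using * with s1 matches s1's labels 0..4 against the columns 0..2,
# so labels 3 and 4 have no matching column and come back as NaN columns.
mismatched = ones * s1

print(by_rows)
print(by_cols)
print(mismatched)
```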
So I think this may get you most of the way there if you have two series of different lengths. This seems like a very manual process but I cannot think of another way using pandas or NumPy functions.
>>> from pandas import Series, DataFrame
>>> a = Series([1, 3, 3, 5, 5])
>>> b = Series([5, 10])
First convert your row values a to a DataFrame, then add copies of this Series as new columns, one for each value in your column series b.
>>> result = DataFrame(a)
>>> for i in range(len(b)):   # range replaces Python 2's xrange
...     result[i] = a
...
>>> result
   0  1
0  1  1
1  3  3
2  3  3
3  5  5
4  5  5
You can then broadcast your Series b over your DataFrame result:
>>> result = result.mul(b)
>>> result
    0   1
0   5  10
1  15  30
2  15  30
3  25  50
4  25  50
In the example I have chosen, you will end up with duplicate index labels because the initial Series contains repeated values. I would recommend leaving the index as unique identifiers; otherwise, selecting a label that is assigned to more than one row returns more than one row. If you must, you can relabel your rows and columns like this:
>>> result.columns = b
>>> result = result.set_index(a)   # set_index returns a new DataFrame, so reassign
>>> result
    5  10
1   5  10
3  15  30
3  15  30
5  25  50
5  25  50
Example of duplicate indexing:
>>> result.loc[3]
    5  10
3  15  30
3  15  30