I am trying to calculate multiple colums from multiple columns in a pandas dataframe using a function. The function takes three arguments -a-, -b-, and -c- and and returns three calculated values -sum-, -prod- and -quot-. In my pandas data frame I have three coumns -a-, -b- and and -c- from which I want to calculate the columns -sum-, -prod- and -quot-.
The mapping that I do works only when I have exactly three rows. I do not know what is going wrong, although I expect that it has to do something with selecting the correct axis. Could someone explain what is happening and how I can calculate the values that I would like to have. Below are the situations that I have tested.
INITIAL VALUES
def sum_prod_quot(a,b,c):
sum = a + b + c
prod = a * b * c
quot = a / b / c
return (sum, prod, quot)
df = pd.DataFrame({ 'a': [20, 100, 18],
'b': [ 5, 10, 3],
'c': [ 2, 10, 6],
'd': [ 1, 2, 3]
})
df
a b c d
0 20 5 2 1
1 100 10 10 2
2 18 3 6 3
CALCULATION STEPS
Using exactly three rows
When I calculate three columns from this dataframe and using the function function I get:
df['sum'], df['prod'], df['quot'] = \
list( map(sum_prod_quot, df['a'], df['b'], df['c']))
df
a b c d sum prod quot
0 20 5 2 1 27.0 120.0 27.0
1 100 10 10 2 200.0 10000.0 324.0
2 18 3 6 3 2.0 1.0 1.0
This is exactly the result that I want to have: The sum-column has the sum of the elements in the columns a,b,c; the prod-column has the product of the elements in the columns a,b,c and the quot-column has the quotients of the elements in the columns a,b,c.
Using more than three rows
When I expand the dataframe with one row, I get an error!
The data frame is defined as:
df = pd.DataFrame({ 'a': [20, 100, 18, 40],
'b': [ 5, 10, 3, 10],
'c': [ 2, 10, 6, 4],
'd': [ 1, 2, 3, 4]
})
df
a b c d
0 20 5 2 1
1 100 10 10 2
2 18 3 6 3
3 40 10 4 4
The call is
df['sum'], df['prod'], df['quot'] = \
list( map(sum_prod_quot, df['a'], df['b'], df['c']))
The result is
...
list( map(sum_prod_quot, df['a'], df['b'], df['c']))
ValueError: too many values to unpack (expected 3)
while I would expect an extra row:
df
a b c d sum prod quot
0 20 5 2 1 27.0 120.0 27.0
1 100 10 10 2 200.0 10000.0 324.0
2 18 3 6 3 2.0 1.0 1.0
3 40 10 4 4 54.0 1600.0 1.0
Using less than three rows
When I reduce tthe dataframe with one row I get also an error. The dataframe is defined as:
df = pd.DataFrame({ 'a': [20, 100],
'b': [ 5, 10],
'c': [ 2, 10],
'd': [ 1, 2]
})
df
a b c d
0 20 5 2 1
1 100 10 10 2
The call is
df['sum'], df['prod'], df['quot'] = \
list( map(sum_prod_quot, df['a'], df['b'], df['c']))
The result is
...
list( map(sum_prod_quot, df['a'], df['b'], df['c']))
ValueError: need more than 2 values to unpack
while I would expect a row less:
df
a b c d sum prod quot
0 20 5 2 1 27.0 120.0 27.0
1 100 10 10 2 200.0 10000.0 324.0
QUESTIONS
The questions I have:
1) Why do I get these errors?
2) How do I have to modify the call such that I get the desired data frame?
NOTE
In this link a similar question is asked, but the given answer did not work for me.
The answer doesn't seem correct for 3 rows as well. Can you check other values except first row and first column. Looking at the results, product of 20*5*2 is NOT 120, it's 200 and is placed below in sum column. You need to form list in correct way before assigning to new columns. You can try use following to set the new columns:
df['sum'], df['prod'], df['quot'] = zip(*map(sum_prod_quot, df['a'], df['b'], df['c']))
For details follow the link
The apply() method can be used for this problem, when providing the result_type argument:
df[["sum", "prod", "quot"]] = df.apply(
lambda row: sum_prod_quot(row["a"], row["b"], row["c"]),
axis=1,
result_type="expand",
)
alternatively (faster for larger dataframes) use apply() only on a selection of the dataframe and unpacking the values of row
df[["sum", "prod", "quot"]] = df[["a", "b", "c"]].apply(
lambda row: sum_prod_quot(*row),
axis=1,
result_type="expand",
)
In both cases, the return value of sum_prod_quot() is added as new columns to df.
Explanation:
apply() with axis=1 applies the function (first argument) to each row separately.lambda takes care that passing of row matches the function's signature (sum_prod_quot() requires 3 arguments).result_type="expand" turns the list-like return value of sum_prod_quot() into columns."sum", "prod", and "quot" columns of df (df[["sum", "prod", "quot"]]).For further info see pandas docs.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With