I'm giving a toy example, but it will help me understand what's going on for something else I'm trying to do. Let's say I want a new column in a dataframe, 'optimal_fruit', that is apples * oranges - bananas.
I can do something like this to get it.
df2['optimal_fruit'] = df2['apples'] * df2['oranges'] - df2['bananas']
apples oranges bananas optimal_fruit
1 6 11 -5
2 7 12 2
3 8 13 11
4 9 14 22
5 10 15 35
What is happening if I try to do something like this? And how could I do this in a list comprehension?
df2['optimal_fruit'] = [x * y - z for x in df2['apples'] for y in df2['oranges'] for z in df2['bananas']]
I get an error of:
ValueError: Length of values does not match length of index
As always, thank you all so much for your help!
A dataframe is a two-dimensional data structure with rows and columns. A list comprehension is a shorthand syntax for building a new list from an existing iterable, and it is one of several ways to do this in Python. For example, given a list of fruits, a comprehension can produce a new list containing only the fruits with the letter "a" in the name.
A list comprehension is generally more compact than an explicit loop, and often somewhat faster. However, avoid cramming a very long comprehension onto one line, so that the code stays readable.
Essentially your list comprehension statement is a set of 3 nested loops. In code:
l = []
for x in df2['apples']:
    for y in df2['oranges']:
        for z in df2['bananas']:
            l.append(x * y - z)
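The resulting list has len(df2) ** 3 elements (125 for the five-row dataframe in the question), because every apples value is combined with every oranges value and every bananas value. That is far more than the number of rows, hence the error. To fix it, you need the equivalent of: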
l = []
for x, y, z in zip(df2['apples'], df2['oranges'], df2['bananas']):
    l.append(x * y - z)
In terms of list comprehension:
[x * y - z for x, y, z in zip(df2['apples'], df2['oranges'], df2['bananas'])]
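To see the difference concretely, here is a minimal sketch that rebuilds the five-row dataframe from the table in the question (the construction of df2 is my assumption, since the question only shows its contents) and compares the lengths of the two comprehensions:

```python
import pandas as pd

# Sample data reconstructed from the table in the question (assumed).
df2 = pd.DataFrame({
    "apples": [1, 2, 3, 4, 5],
    "oranges": [6, 7, 8, 9, 10],
    "bananas": [11, 12, 13, 14, 15],
})

# Three independent `for` clauses visit every combination of values.
nested = [x * y - z
          for x in df2["apples"]
          for y in df2["oranges"]
          for z in df2["bananas"]]

# zip() walks the three columns in lockstep, yielding one tuple per row.
zipped = [x * y - z
          for x, y, z in zip(df2["apples"], df2["oranges"], df2["bananas"])]

print(len(df2), len(nested), len(zipped))  # 5 125 5

# Only the zipped list matches the index length, so only it can be assigned.
df2["optimal_fruit"] = zipped
```

For real data, the vectorized expression from the question is still likely the better choice; the comprehension is mainly useful when the per-row logic cannot be written as column arithmetic.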
Your new method doesn't work because the list comprehension produces more values than there are rows in your dataframe. A quick fix is to iterate over the columns in parallel with zip:
[x * y - z for x,y,z in zip(df2['apples'], df2['oranges'], df2['bananas'])]
You can get the values of each row as an array by passing the dataframe to np.array() inside your list comprehension (this requires import numpy as np).
The following code solves your problem:
df2['optimal_fruit'] = [x[0] * x[1] - x[2] for x in np.array(df2)]
This avoids typing each column name in your list comprehension, but it relies on apples, oranges, and bananas being the first three columns of df2, in that order.
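If the column order is not guaranteed, one possible variation is to select the columns by name before converting to an array. This is a sketch under assumptions: df2 is rebuilt from the table in the question, and the cols list is a name introduced here for illustration only.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (assumed).
df2 = pd.DataFrame({
    "apples": [1, 2, 3, 4, 5],
    "oranges": [6, 7, 8, 9, 10],
    "bananas": [11, 12, 13, 14, 15],
})

# Hypothetical helper list: naming the columns fixes the order of each row.
cols = ["apples", "oranges", "bananas"]

# to_numpy() returns a 2-D array; iterating over it yields one row at a time,
# which is unpacked into the three values.
df2["optimal_fruit"] = [a * o - b for a, o, b in df2[cols].to_numpy()]

print(df2)
```

Selecting by name also keeps the comprehension working if optimal_fruit or other columns are later added to the dataframe.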