
list comprehension in pandas

This is a toy example, but it will help me understand something else I'm trying to do. Say I want a new column in a dataframe, 'optimal_fruit', that is apples * oranges - bananas.

I can do something like this to get it.

df2['optimal_fruit'] = df2['apples'] * df2['oranges'] - df2['bananas'] 


apples  oranges bananas optimal_fruit
1       6       11      -5
2       7       12      2
3       8       13      11
4       9       14      22
5       10      15      35

What is happening if I try to do something like this? And how could I do this in a list comprehension?

df2['optimal_fruit'] = [x * y - z for x in df2['apples'] for y in df2['oranges'] for z in df2['bananas']]

I get an error of:

ValueError: Length of values does not match length of index

As always, thank you all so much for your help!

WhitneyChia asked Nov 17 '16 03:11


3 Answers

Essentially your list comprehension statement is a set of 3 nested loops. In code:

l = []
for x in df2['apples']:
    for y in df2['oranges']:
        for z in df2['bananas']:
            l.append(x * y - z)

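The resulting list has one element for every combination of the three values, so its length is the cube of the length of your DataFrame (5 * 5 * 5 = 125 for the 5-row example), not 5. Hence the error. To fix it, iterate over the three columns in lockstep, which is the equivalent of: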

l = []
for x, y, z in zip(df2['apples'], df2['oranges'], df2['bananas']):
    l.append(x * y - z)

In terms of list comprehension:

[x * y - z for x, y, z in zip(df2['apples'], df2['oranges'], df2['bananas'])]
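
To see the difference in lengths directly, here is a minimal sketch that rebuilds the toy data from the question and compares the two comprehensions. The variable names `nested` and `paired` are only illustrative.

```python
import pandas as pd

# Toy data matching the table in the question.
df2 = pd.DataFrame({
    "apples": [1, 2, 3, 4, 5],
    "oranges": [6, 7, 8, 9, 10],
    "bananas": [11, 12, 13, 14, 15],
})

# Three `for` clauses nest, so every combination of values is visited.
nested = [x * y - z
          for x in df2["apples"]
          for y in df2["oranges"]
          for z in df2["bananas"]]

# zip() walks the three columns in lockstep, one tuple per row.
paired = [x * y - z
          for x, y, z in zip(df2["apples"], df2["oranges"], df2["bananas"])]

print(len(df2), len(nested), len(paired))  # 5 125 5

# Only the zipped version has one value per row, so only it can be assigned.
df2["optimal_fruit"] = paired
```

Only `paired` has the same length as the index, which is why assigning it to the new column does not raise the ValueError.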

Kartik answered Nov 15 '22 19:11


Your new method doesn't work because the list comprehension produces more values than there are rows in your dataframe. A quick fix is to zip the columns so that each row yields exactly one value:

df2['optimal_fruit'] = [x * y - z for x, y, z in zip(df2['apples'], df2['oranges'], df2['bananas'])]

jtitusj answered Nov 15 '22 20:11


You can get the values of each row as an array by iterating over np.array(df2) inside your list comprehension (this needs import numpy as np).

The following code solves your problem:

df2['optimal_fruit'] = [x[0] * x[1] - x[2] for x in np.array(df2)]

This avoids typing each column name in the list comprehension, but it relies on the columns being in the order apples, oranges, bananas.
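
If the column order is not guaranteed, one variation (a sketch, not part of the original answer) is to select the columns by name first and then iterate over the rows of the resulting array. The `cols` list and the loop variables `a`, `o`, `b` are names chosen here for illustration.

```python
import pandas as pd

# Toy data matching the table in the question.
df2 = pd.DataFrame({
    "apples": [1, 2, 3, 4, 5],
    "oranges": [6, 7, 8, 9, 10],
    "bananas": [11, 12, 13, 14, 15],
})

# Fix the column order explicitly so positions are predictable.
cols = ["apples", "oranges", "bananas"]

# to_numpy() gives a 2-D array; iterating it yields one row at a time,
# and each 3-element row unpacks into the three loop variables.
rows = df2[cols].to_numpy()
df2["optimal_fruit"] = [a * o - b for a, o, b in rows]

print(df2)
```

This costs one line naming the columns, but the result no longer depends on how the DataFrame happens to be laid out.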

Tiago Pitchon answered Nov 15 '22 20:11