list comprehension in pandas

Tags:

python

pandas

list-comprehension

I'm giving a toy example, but it will help me understand what's going on in something else I'm trying to do. Let's say I want a new column in a dataframe, 'optimal_fruit', that is apples * oranges - bananas.

I can do something like this to get it.

df2['optimal_fruit'] = df2['apples'] * df2['oranges'] - df2['bananas'] 


apples  oranges bananas optimal_fruit
1       6       11      -5
2       7       12      2
3       8       13      11
4       9       14      22
5       10      15      35

What is happening if I try to do something like this? And how could I do this in a list comprehension?

df2['optimal_fruit'] = [x * y - z for x in df2['apples'] for y in df2['oranges'] for z in df2['bananas']]

I get an error of:

ValueError: Length of values does not match length of index

As always, thank you all so much for your help!

asked Nov 17 '16 03:11

WhitneyChia

3 Answers

Essentially your list comprehension statement is a set of 3 nested loops. In code:

l = []
for x in df2['apples']:
    for y in df2['oranges']:
        for z in df2['bananas']:
            # one value per (x, y, z) combination
            l.append(x * y - z)

The resulting list has one element for every combination of x, y, and z, so its length is the cube of the length of your DataFrame (125 values for the 5 rows here), not 5. Hence the error. To fix it, you need the equivalent of:

l = []
for x, y, z in zip(df2['apples'], df2['oranges'], df2['bananas']):
    l.append(x * y - z)

As a list comprehension:

[x * y - z for x, y, z in zip(df2['apples'], df2['oranges'], df2['bananas'])]
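To make the difference concrete, here is a minimal sketch that rebuilds the toy data from the question's table (the `DataFrame` construction is an assumption, since the question only shows the printed values) and compares the two comprehensions. The names `nested` and `zipped` are illustrative only.

```python
import pandas as pd

# Toy data copied from the table in the question.
df2 = pd.DataFrame({
    "apples": [1, 2, 3, 4, 5],
    "oranges": [6, 7, 8, 9, 10],
    "bananas": [11, 12, 13, 14, 15],
})

# Three chained `for` clauses visit every (x, y, z) combination.
nested = [x * y - z
          for x in df2["apples"]
          for y in df2["oranges"]
          for z in df2["bananas"]]

# zip() walks the three columns in lockstep, yielding one tuple per row.
zipped = [x * y - z
          for x, y, z in zip(df2["apples"], df2["oranges"], df2["bananas"])]

print(len(nested))  # 125, i.e. 5 * 5 * 5
print(len(zipped))  # 5, one value per row
```

The zipped version should produce the same values as the vectorized expression `df2['apples'] * df2['oranges'] - df2['bananas']`, which is usually the faster choice when the calculation can be written that way.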

answered Nov 15 '22 19:11

Kartik

Your new method doesn't work because the list comprehension produces more values than there are rows in your dataframe's index. A quick fix would be something like:

[x * y - z for x, y, z in zip(df2['apples'], df2['oranges'], df2['bananas'])]
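The length mismatch can be checked directly. The sketch below assumes the same toy data as the question and catches the exception so the message can be inspected; `too_long` and `error_message` are illustrative names, and the exact wording of the message may differ between pandas versions.

```python
import pandas as pd

# Toy data copied from the table in the question.
df2 = pd.DataFrame({
    "apples": [1, 2, 3, 4, 5],
    "oranges": [6, 7, 8, 9, 10],
    "bananas": [11, 12, 13, 14, 15],
})

# The original comprehension: 125 values for a 5-row frame.
too_long = [x * y - z
            for x in df2["apples"]
            for y in df2["oranges"]
            for z in df2["bananas"]]

error_message = None
try:
    df2["optimal_fruit"] = too_long
except ValueError as exc:
    # pandas refuses a list whose length differs from the index length.
    error_message = str(exc)

print(error_message)

# The zip-based comprehension yields exactly one value per row,
# so the assignment succeeds.
df2["optimal_fruit"] = [
    x * y - z
    for x, y, z in zip(df2["apples"], df2["oranges"], df2["bananas"])
]
print(df2)
```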

answered Nov 15 '22 20:11

jtitusj

You can get the values of each row as an array by iterating over np.array(df2) inside your list comprehension.

The following code solves your problem:

import numpy as np
df2['optimal_fruit'] = [x[0] * x[1] - x[2] for x in np.array(df2)]

This avoids typing each column name in your list comprehension, but it relies on the columns being in the order apples, oranges, bananas.
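Because the positional indices depend on column order, one possible variation is to select the three columns explicitly before converting to an array. This is a sketch under the assumption that the toy data matches the question; the `cols` list is an illustrative helper and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Toy data copied from the table in the question.
df2 = pd.DataFrame({
    "apples": [1, 2, 3, 4, 5],
    "oranges": [6, 7, 8, 9, 10],
    "bananas": [11, 12, 13, 14, 15],
})

# Fix the column order up front so positions 0, 1, 2 are unambiguous,
# even if the frame later gains or reorders columns.
cols = ["apples", "oranges", "bananas"]
rows = np.array(df2[cols])  # shape (5, 3): one row of values per record

# Each row unpacks into its three values.
df2["optimal_fruit"] = [a * o - b for a, o, b in rows]
print(df2)
```

This costs one line listing the column names, but the result no longer changes if the statement is rerun after `optimal_fruit` has been added to the frame.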

answered Nov 15 '22 20:11

Tiago Pitchon
