 

List comprehension works but for loop doesn't: why?

I'm a bit annoyed with myself because I can't understand why one solution to a problem worked but another didn't. As in, it points to a deficient understanding of (basic) pandas on my part, and that makes me mad!

Anyway, my problem was simple: I had a list of 'bad' values ('bad_index') corresponding to row indexes in a DataFrame ('data_clean1'), and I wanted to delete those rows. However, since the values will change with each new dataset, I didn't want to hard-code them. Here's what I did first:

bad_index = [2, 7, 8, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 29]

for i in bad_index:
    dataclean2 = dataclean1.drop([i]).reset_index(level = 0, drop = True)

But this didn't work: data_clean2 remained exactly the same as data_clean1. My second idea was to use a list comprehension (below), and this worked fine.

bad_index = [2, 7, 8, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 29]

data_clean2 = data_clean1.drop([x for x in bad_index]).reset_index(level = 0, drop = True)

Now, why did the list comprehension method work and not the 'for' loop? I've been coding for a few months, and I feel that I shouldn't be making these kinds of errors.

Thanks!

asked Aug 19 '16 17:08 by Lodore66




2 Answers

data_clean1.drop([x for x in bad_index]).reset_index(level = 0, drop = True) is equivalent to simply passing the bad_index list to drop:

data_clean1.drop(bad_index).reset_index(level = 0, drop = True)

drop accepts a list of index labels and drops every row whose label appears in that list.
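A minimal sketch of that equivalence, using a small hypothetical DataFrame (the column name `value` and the sample numbers are made up, not taken from the question). Because the comprehension `[x for x in bad_index]` just rebuilds the same list, both calls drop the same labels:

```python
import pandas as pd

# Hypothetical sample data: six rows with the default labels 0..5
data_clean1 = pd.DataFrame({"value": [10, 11, 12, 13, 14, 15]})
bad_index = [1, 3, 4]

# The list comprehension only copies bad_index, so these two calls are identical
via_comprehension = data_clean1.drop([x for x in bad_index]).reset_index(level=0, drop=True)
via_list = data_clean1.drop(bad_index).reset_index(level=0, drop=True)

print(via_list["value"].tolist())          # [10, 12, 15]
print(via_comprehension.equals(via_list))  # True
```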

Your explicit for loop didn't work because each iteration dropped a single, different index from the original dataclean1 DataFrame and assigned the result to dataclean2, overwriting the previous iteration's result. Since the intermediate DataFrames were never saved, after the last iteration dataclean2 was simply the result of executing:

dataclean2 = dataclean1.drop(29).reset_index(level = 0, drop = True)
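The sketch below reproduces that overwrite on hypothetical sample data (column name and values are made up): only the last label in `bad_index` ends up dropped. It then shows two fixes. Passing the whole list in one call is what this answer recommends; the accumulating loop, which keeps the original labels until the end and resets the index only once, is an alternative added here for illustration.

```python
import pandas as pd

# Hypothetical sample data
dataclean1 = pd.DataFrame({"value": [10, 11, 12, 13, 14, 15]})
bad_index = [1, 3, 4]

# Buggy loop: every iteration starts again from the untouched dataclean1,
# so each assignment overwrites the previous result.
for i in bad_index:
    dataclean2 = dataclean1.drop([i]).reset_index(level=0, drop=True)
print(len(dataclean2))  # 5 -> only label 4 (the last one) was dropped

# Fix 1: accumulate into one variable, keeping the original labels
# until the loop is done, then reset the index once.
accumulated = dataclean1
for i in bad_index:
    accumulated = accumulated.drop([i])
accumulated = accumulated.reset_index(level=0, drop=True)

# Fix 2: drop all labels in a single call.
single_call = dataclean1.drop(bad_index).reset_index(level=0, drop=True)

print(accumulated.equals(single_call))  # True
```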

answered Nov 18 '22 01:11 by DeepSpace



EDIT: It turns out this is not your problem... but if you didn't have the problem DeepSpace describes in the other answer, you would run into this one:

for i in bad_index:
    dataclean2 = dataclean1.drop([i]).reset_index(level = 0, drop = True)

Imagine your bad_index is [1, 2, 3] and your dataclean is [4, 5, 6, 7, 8].

Now let's step through what actually happens (remember that the index is reset after every drop):

initial: dataclean == [4, 5, 6, 7, 8]

loop 0: i == 1 ==> drop index 1 ==> dataclean = [4, 6, 7, 8]

loop 1: i == 2 ==> drop index 2 ==> dataclean = [4, 6, 8] (this removes 7, not the 6 you meant to drop)

loop 2: i == 3 ==> drop index 3 !!!! Uh oh, there is no index 3
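This scenario shows up in pandas when the loop does accumulate (assigning back to the same DataFrame) and also calls `reset_index` inside the loop, because each reset renumbers the remaining rows. A sketch using the walkthrough's toy data (the column name `value` is hypothetical):

```python
import pandas as pd

dataclean = pd.DataFrame({"value": [4, 5, 6, 7, 8]})
bad_index = [1, 2, 3]

dropped_values = []
try:
    for i in bad_index:
        # Record which value actually sits at label i at this point
        dropped_values.append(int(dataclean.loc[i, "value"]))
        dataclean = dataclean.drop([i]).reset_index(level=0, drop=True)
    error = None
except KeyError as exc:
    # Label 3 no longer exists after two renumberings
    error = exc

print(dropped_values)               # [5, 7] -- 7 was removed instead of 6
print(dataclean["value"].tolist())  # [4, 6, 8]
print(type(error).__name__)         # KeyError
```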


You could, I guess, do this instead:

for i in reversed(bad_index):
    ...

This way, removing index 3 first does not shift indices 1 and 2 (this assumes bad_index is sorted in ascending order).
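A sketch of the reverse-order loop on the walkthrough's toy data (column name hypothetical). It uses `sorted(bad_index, reverse=True)` rather than `reversed(bad_index)`, a swap made here so the result does not depend on the input list already being sorted:

```python
import pandas as pd

dataclean = pd.DataFrame({"value": [4, 5, 6, 7, 8]})
bad_index = [1, 2, 3]

# Drop the highest label first: reset_index keeps the labels of rows that
# come before the dropped one, and every remaining target comes before it.
for i in sorted(bad_index, reverse=True):
    dataclean = dataclean.drop([i]).reset_index(level=0, drop=True)

print(dataclean["value"].tolist())  # [4, 8]
```

This gives the same rows as a single `drop(bad_index)` call, which remains the simpler option.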

But in general, you should not mutate a list or dict while you iterate over it.
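A plain-Python illustration of that advice (the numbers and the "remove even numbers" rule are made up for the example). Removing items from a list while a `for` loop walks over it silently skips elements, whereas a dict raises an error:

```python
numbers = [2, 4, 5, 6]

# Anti-pattern: each removal shifts later items left,
# so the loop skips the element right after it.
mutated = list(numbers)
for n in mutated:
    if n % 2 == 0:
        mutated.remove(n)
print(mutated)  # [4, 5] -- 4 was skipped

# Safer: build a new list instead of mutating the one being iterated.
filtered = [n for n in numbers if n % 2 != 0]
print(filtered)  # [5]

# Dicts are stricter: changing their size mid-iteration raises RuntimeError.
counts = {"a": 1, "b": 2}
try:
    for key in counts:
        del counts[key]
    dict_error = None
except RuntimeError as exc:
    dict_error = exc
print(type(dict_error).__name__)  # RuntimeError
```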

answered Nov 18 '22 03:11 by Joran Beasley
