List comprehension works but not for loop––why?

I'm a bit annoyed with myself because I can't understand why one solution to a problem worked but another didn't. As in, it points to a deficient understanding of (basic) pandas on my part, and that makes me mad!

Anyway, my problem was simple: I had a list of 'bad' values ('bad_index'); these corresponded to row indexes on a dataframe ('data_clean1') for which I wanted to delete the corresponding rows. However, as the values will change with each new dataset, I didn't want to plug the bad values directly into the code. Here's what I did first:

bad_index = [2, 7, 8, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 29]

for i in bad_index:
    dataclean2 = dataclean1.drop([i]).reset_index(level = 0, drop = True)

But this didn't work; data_clean2 remained exactly the same as data_clean1. My second idea was to use a list comprehension (as below); this worked fine.

bad_index = [2, 7, 8, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 29]

data_clean2 = data_clean1.drop([x for x in bad_index]).reset_index(level = 0, drop = True)

Now, why did the list comprehension method work and not the 'for' loop? I've been coding for a few months, and I feel that I shouldn't be making these kinds of errors.

Thanks!

asked Aug 19 '16 17:08

Lodore66

2 Answers

data_clean1.drop([x for x in bad_index]).reset_index(level = 0, drop = True) is equivalent to simply passing the bad_index list to drop:

data_clean1.drop(bad_index).reset_index(level = 0, drop = True)

drop accepts a list, and drops every index present in the list.

Your explicit for loop didn't work because every iteration drops a single index from the original dataclean1 dataframe and overwrites dataclean2, so none of the intermediate dataframes is kept. After the last iteration, dataclean2 is simply the result of executing
dataclean2 = dataclean1.drop(29).reset_index(level = 0, drop = True)
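
A minimal sketch of that behaviour, assuming a small made-up dataframe (the column name `value`, the six rows, and the shortened `bad_index` are illustrative and not taken from the question):

```python
import pandas as pd

# Hypothetical stand-in for data_clean1: six rows with the default 0..5 index.
data_clean1 = pd.DataFrame({"value": [10, 11, 12, 13, 14, 15]})
bad_index = [1, 3, 4]

# The loop from the question: every pass starts again from data_clean1,
# so each assignment overwrites the result of the previous pass.
for i in bad_index:
    loop_result = data_clean1.drop([i]).reset_index(drop=True)

# Equivalent to dropping only the last label in bad_index.
only_last = data_clean1.drop([bad_index[-1]]).reset_index(drop=True)

# Passing the whole list drops every listed label in a single call.
list_result = data_clean1.drop(bad_index).reset_index(drop=True)

print(loop_result["value"].tolist())
print(list_result["value"].tolist())
```

Here `loop_result` is missing only the row labelled 4, while `list_result` is missing all three rows.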

answered Nov 18 '22 01:11

DeepSpace

EDIT: It turns out this is not your problem, but if you did not have the problem described in DeepSpace's answer, you would run into this one:

for i in bad_index:
    dataclean2 = dataclean1.drop([i]).reset_index(level = 0, drop = True)

Imagine your bad_index is [1, 2, 3] and your dataclean is [4, 5, 6, 7, 8].

Now let's step through what actually happens if each drop is kept and the index is renumbered afterwards:

initial: dataclean == [4, 5, 6, 7, 8]

loop 0: i == 1 => drop index 1 => dataclean == [4, 6, 7, 8]

loop 1: i == 2 => drop index 2 => dataclean == [4, 6, 8]

loop 2: i == 3 => drop index 3 => uh oh, there is no index 3

You could, I guess, do this instead:

for i in reversed(bad_index):
    ...

This way, removing index 3 first does not affect indexes 1 and 2.

But in general you should not mutate a list/dict as you iterate over it.
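
As a rough sketch of the walk-through above, assuming a made-up dataframe with the values 4 to 8 in a hypothetical `value` column, and assuming the loop keeps each intermediate result and renumbers the index after every drop:

```python
import pandas as pd

# Hypothetical data matching the walk-through: values 4..8 at index 0..4.
frame = pd.DataFrame({"value": [4, 5, 6, 7, 8]})
bad_index = [1, 2, 3]

# Ascending order: the labels shift underneath the loop after each reset.
shifted = frame
error = None
try:
    for i in bad_index:
        shifted = shifted.drop([i]).reset_index(drop=True)
except KeyError as exc:
    # By the third pass only labels 0..2 remain, so label 3 is missing.
    error = exc

# Reversed order: removing the highest label first leaves the lower
# labels pointing at the same rows.
safe = frame
for i in reversed(bad_index):
    safe = safe.drop([i]).reset_index(drop=True)

print(shifted["value"].tolist())
print(safe["value"].tolist())
```

The ascending loop removes the wrong rows before failing, while the reversed loop ends with the same rows as `frame.drop(bad_index)`.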

answered Nov 18 '22 03:11

Joran Beasley


Tags: python, indexing, for-loop, pandas, list-comprehension
