
Selecting 1.6M rows of a pandas dataframe [duplicate]

I have a csv file with ~2.3M rows. I'd like to save the subset (~1.6M) of rows that have non-NaN values in two of the columns. I'd like to keep using pandas to do this. Right now, my code looks like:

import pandas as pd
catalog = pd.read_csv('catalog.txt')
slim_list = []
for i in range(len(catalog)):
    if (pd.isna(catalog['z'][i]) == False and pd.isna(catalog['B'][i]) == False):
        slim_list.append(i)

which collects the indices of the rows of catalog that have non-NaN values in both columns. I then make a new catalog with those rows as entries:

slim_catalog = pd.DataFrame(columns = catalog.columns)
for j in range(len(slim_list)):
    data = (catalog.iloc[slim_list[j]]).to_dict()
    slim_catalog = slim_catalog.append(data, ignore_index = True)
slim_catalog.to_csv('slim_catalog.csv')

This should, in principle, work. It's sped up a little by reading each row into a dict. However, it takes far, far too long to execute for all 2.3M rows. What is a better way to solve this problem?

asked Nov 06 '22 by user3517167

1 Answer

This is the completely wrong way of doing this in pandas.

Firstly, never iterate over some range, i.e. for i in range(len(catalog)):, and then individually index into the row with catalog['z'][i]; that is incredibly inefficient. Every scalar lookup goes through Python-level indexing machinery, whereas a vectorized call like pd.isna(catalog['z']) checks the entire column in a single pass.

Second, do not create a pandas.DataFrame using pd.DataFrame.append in a loop: each append copies the entire frame, so a single call is a linear operation and the whole loop takes quadratic time. (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0.)
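If you genuinely needed to build a frame row by row, the usual pattern is to accumulate plain Python objects in a list, where appends are amortized O(1), and construct the DataFrame once at the end. A minimal sketch of that pattern, reusing the 'z' and 'B' column names and file names from the question:

import pandas as pd

catalog = pd.read_csv('catalog.txt')

# Appending to a Python list is cheap; only one DataFrame is ever built.
rows = []
for row in catalog.to_dict('records'):
    if pd.notna(row['z']) and pd.notna(row['B']):
        rows.append(row)

# Single O(n) construction instead of quadratic repeated appends.
slim_catalog = pd.DataFrame(rows)
slim_catalog.to_csv('slim_catalog.csv')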

But you shouldn't be looping here to begin with. All you need is something like

catalog[catalog.loc[:, ['z', 'B']].notna().all(axis=1)].to_csv('slim_catalog.csv')

Or broken up to perhaps be more readable:

not_nan_zB = catalog.loc[:, ['z', 'B']].notna().all(axis=1)
catalog[not_nan_zB].to_csv('slim_catalog.csv')
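Equivalently, dropna with its subset argument does the same row filtering, since the default how='any' drops a row if either of the listed columns is NaN:

catalog.dropna(subset=['z', 'B']).to_csv('slim_catalog.csv')

Which spelling to use is a matter of taste; both do the filtering in vectorized code rather than a Python loop.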
answered Nov 14 '22 by juanpa.arrivillaga