
How to keep only a certain set of rows by index in a pandas DataFrame

I have a DataFrame I created by doing the following manipulations to a .fits file:

import pandas as pd

# sortedpab holds the rows read from the .fits file (not shown)
data_dict = dict()
for obj in sortedpab:
    for key in ['FIELD', 'ID', 'RA', 'DEC', 'Z_50', 'Z_84', 'Z_16', 'PAB_FLUX', 'PAB_FLUX_ERR']:
        data_dict.setdefault(key, list()).append(obj[key])

gooddf = pd.DataFrame(data_dict)
# Mean of the upper and lower redshift intervals, divided by Z_50
gooddf['Z_ERR'] = ((gooddf['Z_84'] - gooddf['Z_50']) + (gooddf['Z_50'] - gooddf['Z_16'])) / (2 * gooddf['Z_50'])
gooddf['OBS_PAB'] = 12820 * (1 + gooddf['Z_50'])
gooddf.loc[gooddf['FIELD'] == "ERS", 'FIELD'] = "ERSPRIME"
# Keep only the columns of interest, then filter on OBS_PAB
gooddf = gooddf[['FIELD', 'ID', 'RA', 'DEC', 'Z_50', 'Z_ERR', 'PAB_FLUX', 'PAB_FLUX_ERR', 'OBS_PAB']]
gooddf = gooddf[gooddf.OBS_PAB <= 16500]

This gives me a DataFrame with 351 rows and 9 columns. I would like to keep only the rows with certain index values, and I tried something like this:

indices = [5 , 6 , 9 , 10]
gooddf = gooddf[gooddf.index == indices]

I want this to keep only the rows whose index values are listed in indices, but it is giving me issues.

I found a way to do this with a for loop:

good = np.array([5 , 6 , 9 , 12 , 14 , 15 , 18 , 21 , 24 , 29 , 30 , 35 , 36 , 37 , 46 , 48 ])

gooddf50 = pd.DataFrame()
for i in range(len(good)):
    gooddf50 = gooddf50.append(gooddf[gooddf.index == good[i]])

Any thoughts on how to do this in a better way, preferably using just pandas?

Nikko Cleri asked Oct 21 '19 20:10




1 Answer

This will do the trick:

gooddf.loc[indices]
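
As a minimal sketch of that selection, here is a small stand-in frame (the ID column and the filter are made up for illustration and are not the asker's data). It also shows Index.isin as an alternative: in pandas 1.0 and later, .loc with a list raises a KeyError if any label is missing from the index, while an isin mask simply skips missing labels.

```python
import pandas as pd

# Hypothetical stand-in for gooddf: filtering leaves gaps in the index labels.
df = pd.DataFrame({"ID": range(100, 112)})
df = df[df.ID % 4 != 3]          # drops rows labelled 3, 7 and 11

indices = [5, 6, 9, 10]

# Label-based selection: every label in the list must exist in df.index.
kept = df.loc[indices]

# Boolean-mask selection: labels not present (here 11) are silently ignored.
kept_mask = df[df.index.isin(indices + [11])]
```

Note that the mask version returns rows in the frame's existing order, whereas .loc returns them in the order given in the list.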

An important note: .iloc and .loc do slightly different things, which is why you may be getting unexpected results.

You can read more about the details in the pandas indexing documentation, but the key thing to understand is that .iloc returns rows according to the positions specified, whereas .loc returns rows according to the index labels specified. So whenever the index labels don't coincide with the row positions (for example after filtering rows, as in your code, or if the index isn't sorted), .loc and .iloc will return different rows.
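
To make the label-versus-position distinction concrete, here is a short sketch on a made-up frame (the ID column and the filter are illustrative assumptions, not the asker's data). After filtering, the labels no longer line up with positions, so the same list selects different rows.

```python
import pandas as pd

# Hypothetical frame; after filtering, the remaining labels are 0, 1, 2, 4, 5, 6, 8, 9, 10.
df = pd.DataFrame({"ID": range(100, 112)})
df = df[df.ID % 4 != 3]

by_label = df.loc[[5, 6]]        # rows whose index labels are 5 and 6
by_position = df.iloc[[5, 6]]    # the 6th and 7th rows, whatever their labels are

print(by_label)
print(by_position)
```

Here by_label holds the rows labelled 5 and 6, while by_position holds the rows labelled 6 and 8.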

Carolyn Conway answered Oct 16 '22 13:10