Subsetting index from Pandas DataFrame

I have a DataFrame with columns [A, B, C, D, E, F, G, H].

I have created an Index containing the column labels [D, G, H]:

>>> print(dgh_columns)
Index(['D', 'G', 'H'], dtype='object')

How can I retrieve the original DataFrame without the columns D, G and H?

Is there an index subset operation?

Ideally, this would be:

df[df.index - dgh_columns]

But this doesn't seem to work.

asked Nov 07 '16 14:11 by Jivan




1 Answer

I think you can use Index.difference on df.columns (df.index holds the row labels, not the column labels):

df[df.columns.difference(dgh_columns)]
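One detail worth knowing: with its default sort=None, Index.difference sorts the resulting labels when they are comparable, so the columns come back in alphabetical order rather than in their original order. In recent pandas versions the sort=False argument keeps the original order. The small frame below, with deliberately unsorted columns, is a hypothetical illustration, not part of the question.

```python
import pandas as pd

# Hypothetical frame whose columns are NOT in alphabetical order
df = pd.DataFrame([[1, 2, 3, 4]], columns=['C', 'A', 'D', 'B'])
to_drop = pd.Index(['D'])

# Default sort=None: the remaining labels are sorted
sorted_cols = df.columns.difference(to_drop)
# sort=False: the remaining labels keep their original order
ordered_cols = df.columns.difference(to_drop, sort=False)

print(list(sorted_cols))   # ['A', 'B', 'C']
print(list(ordered_cols))  # ['C', 'A', 'B']
print(df[ordered_cols])
```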

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':[7,8,9],
                   'D':[1,3,5],
                   'E':[7,8,9],
                   'F':[1,3,5],
                   'G':[5,3,6],
                   'H':[7,4,3]})

print(df)
   A  B  C  D  E  F  G  H
0  1  4  7  1  7  1  5  7
1  2  5  8  3  8  3  3  4
2  3  6  9  5  9  5  6  3

dgh_columns = pd.Index(['D', 'G', 'H'])
print(df[df.columns.difference(dgh_columns)])
   A  B  C  E  F
0  1  4  7  7  1
1  2  5  8  8  3
2  3  6  9  9  5
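An alternative not used in the original answer is DataFrame.drop with the columns keyword, which removes the listed labels directly, keeps the remaining columns in their original order and returns a new DataFrame. This sketch reuses the sample data above:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9],
                   'D': [1, 3, 5], 'E': [7, 8, 9], 'F': [1, 3, 5],
                   'G': [5, 3, 6], 'H': [7, 4, 3]})
dgh_columns = pd.Index(['D', 'G', 'H'])

# drop returns a new DataFrame without the listed columns; df is not modified
result = df.drop(columns=dgh_columns)
print(result)
```

By default drop raises a KeyError if any label is missing from df; passing errors='ignore' skips missing labels instead.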

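NumPy solution with numpy.setxor1d or numpy.setdiff1d (both return sorted unique values, so the columns come back in alphabetical order; setxor1d is a symmetric difference, so it only gives the same result when every label in dgh_columns is actually a column of df):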

dgh_columns = pd.Index(['D', 'G', 'H'])
print(df[np.setxor1d(df.columns, dgh_columns)])
   A  B  C  E  F
0  1  4  7  7  1
1  2  5  8  8  3
2  3  6  9  9  5
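To illustrate why the symmetric difference can misbehave, the sketch below adds a hypothetical label 'Z' that is not a column. setxor1d keeps 'Z' (it is in exactly one of the two inputs), so indexing df with that result would raise a KeyError, whereas setdiff1d only returns labels taken from the first argument.

```python
import numpy as np
import pandas as pd

cols = pd.Index(['A', 'B', 'C', 'D'])
# 'Z' is a hypothetical label that is not one of the columns
to_drop = pd.Index(['D', 'Z'])

# Symmetric difference: labels found in exactly one of the inputs
xor = np.setxor1d(cols, to_drop)
# Set difference: labels in cols that are not in to_drop
diff = np.setdiff1d(cols, to_drop)

print(list(xor))   # ['A', 'B', 'C', 'Z']
print(list(diff))  # ['A', 'B', 'C']
```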

dgh_columns = pd.Index(['D', 'G', 'H'])
print(df[np.setdiff1d(df.columns, dgh_columns)])
   A  B  C  E  F
0  1  4  7  7  1
1  2  5  8  8  3
2  3  6  9  9  5
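Another option, again not from the original answer, is a boolean mask built with Index.isin and applied with .loc. It preserves the original column order and silently ignores labels in dgh_columns that are not columns of df (here 'G'). The unsorted column names are a hypothetical example.

```python
import pandas as pd

# Hypothetical frame with unsorted columns and no 'G' column
df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=['E', 'D', 'A', 'H', 'B'])
dgh_columns = pd.Index(['D', 'G', 'H'])

# True for every column label that is NOT in dgh_columns
keep = ~df.columns.isin(dgh_columns)
result = df.loc[:, keep]
print(result)
```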
answered Jan 04 '23 00:01 by jezrael