I am working with survey data loaded from an h5 file through the pandas package, as hdf = pandas.HDFStore('Survey.h5'). Within this DataFrame, each row holds the results of a single survey, and the columns hold the answers to the questions in that survey.
I am aiming to reduce this dataset to a smaller DataFrame that includes only the rows with a certain answer to a certain question, i.e. all rows with the same value in that column. I am able to determine the index values of all rows that meet this condition, but I can't find how to delete these rows or make a new df with these rows only.
Slicing a DataFrame in Pandas includes the following steps: ensure Python is installed (or install ActivePython), import a dataset, create a DataFrame, and slice the DataFrame.
To slice the columns, the syntax is df.loc[:, start:stop:step], where start is the name of the first column to take, stop is the name of the last column to take (label slices include the end point), and step is the number of positions to advance after each extraction; for example, a step of 2 selects alternate columns.
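As a minimal sketch of the column-slicing syntax described above, the snippet below builds a small example frame (the data and the column names A to D are placeholders, not the survey data from the question) and takes two label-based column slices.

```python
import pandas as pd

# Hypothetical example data; column names are placeholders.
df = pd.DataFrame(
    [[0, 2, 6, 0], [6, 1, 5, 2], [0, 2, 6, 0], [9, 3, 2, 2]],
    index=["a", "b", "c", "d"],
    columns=["A", "B", "C", "D"],
)

# Label-based column slice: both "A" and "C" are included.
a_to_c = df.loc[:, "A":"C"]

# A step of 2 takes every second column between "A" and "D".
alternate = df.loc[:, "A":"D":2]

print(list(a_to_c.columns))
print(list(alternate.columns))
```

Note that, unlike positional slicing, the stop label is part of the result.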
iloc method. For now, we explain the semantics of slicing using the [] operator. With DataFrame, slicing inside of [] slices the rows. This is provided largely as a convenience since it is such a common operation.
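A short sketch of row slicing with the [] operator, using made-up data (the frame and its labels are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical example data with string row labels.
df = pd.DataFrame(
    {"A": [0, 6, 0, 9], "B": [2, 1, 2, 3]},
    index=["a", "b", "c", "d"],
)

# Integer slices inside [] select rows by position; the end is excluded.
first_two = df[0:2]

# A step works the same way as for a Python list.
every_other = df[::2]

# Label slices inside [] also select rows; the end label is included.
b_to_c = df["b":"c"]

print(list(first_two.index))
print(list(every_other.index))
print(list(b_to_c.index))
```

Only slices select rows here; a single label inside [] (such as df["A"]) selects a column instead.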
In [36]: df
Out[36]:
   A  B  C  D
a  0  2  6  0
b  6  1  5  2
c  0  2  6  0
d  9  3  2  2

In [37]: rows
Out[37]: ['a', 'c']

In [38]: df.drop(rows)
Out[38]:
   A  B  C  D
b  6  1  5  2
d  9  3  2  2

In [39]: df[~((df.A == 0) & (df.B == 2) & (df.C == 6) & (df.D == 0))]
Out[39]:
   A  B  C  D
b  6  1  5  2
d  9  3  2  2

In [40]: df.loc[rows]  # originally df.ix[rows]; .ix was removed in pandas 1.0
Out[40]:
   A  B  C  D
a  0  2  6  0
c  0  2  6  0

In [41]: df[((df.A == 0) & (df.B == 2) & (df.C == 6) & (df.D == 0))]
Out[41]:
   A  B  C  D
a  0  2  6  0
c  0  2  6  0
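Applied to the survey case in the question, the same idea might look like the sketch below. The frame, the column names q1 and q2, and the answer "yes" are all made up for illustration; the real data would come from the HDF5 file.

```python
import pandas as pd

# Hypothetical survey data: one row per survey, one column per question.
survey = pd.DataFrame(
    {"q1": ["yes", "no", "yes", "maybe"], "q2": [3, 5, 4, 1]},
    index=[101, 102, 103, 104],
)

# Boolean mask: True for rows where question q1 has the wanted answer.
mask = survey["q1"] == "yes"

kept = survey[mask]       # new frame with only the matching rows
dropped = survey[~mask]   # new frame with the matching rows removed

# Equivalent approach when the matching index values are already known.
idx = survey.index[mask]
kept_by_label = survey.loc[idx]
dropped_by_label = survey.drop(idx)

print(kept)
print(dropped)
```

Both approaches return new DataFrames and leave the original untouched, so the result has to be assigned to a name to be kept.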