Pandas Equivalent of R's which()

Variations of this question have been asked before, but I'm still having trouble understanding how to slice a Python series or pandas DataFrame based on conditions that I set.

In R, what I'm trying to do is:

df[which(df[,colnumber] > somenumberIchoose),]

The which() function finds the indices of the rows whose value in that column is greater than somenumberIchoose, and returns them as a vector. I then slice the dataframe with these row indices to select the rows I want to look at.

Is there an equivalent way to do this in Python? I've seen references to enumerate, which I still don't fully understand after reading the documentation. My current attempt at getting the row indices looks like this:

indexfuture = [ x.index(), x in enumerate(df['colname']) if x > yesterday]

However, I keep getting an invalid syntax error. I can hack together a workaround by looping over the values and doing the search manually, but that seems extremely non-Pythonic and inefficient.

What exactly does enumerate() do? What is the Pythonic way of finding the indices of values in a vector that satisfy a given condition?

Note: I'm using pandas for the dataframes.

asked Aug 01 '14 18:08

ding

2 Answers

I may not have understood the question clearly, but the answer looks simpler than you think.

Using a pandas DataFrame:

df['colname'] > somenumberIchoose

returns a pandas Series of True/False values that carries the original index of the DataFrame.

Then you can use that boolean series on the original DataFrame and get the subset you are looking for:

df[df['colname'] > somenumberIchoose]

should be enough.
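
As a minimal sketch of the two steps above, assuming a small made-up DataFrame (the data, the index labels and the threshold value are placeholders, not taken from the question):

```python
import pandas as pd

# Hypothetical data; the column name and threshold mirror the names used above.
df = pd.DataFrame({"colname": [5, 12, 7, 20, 3]},
                  index=["a", "b", "c", "d", "e"])
somenumberIchoose = 6

mask = df["colname"] > somenumberIchoose  # boolean Series aligned to df.index
subset = df[mask]                         # rows where the mask is True
labels = df.index[mask]                   # index labels of those rows

print(mask.tolist())    # [False, True, True, True, False]
print(subset)
print(labels.tolist())  # ['b', 'c', 'd']
```

The last line is probably the closest analogue of what which() returns, except that it gives index labels rather than 1-based row numbers.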

See http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing
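
On the enumerate() part of the question: enumerate() yields (position, value) pairs, so the list comprehension from the question can be made valid along the lines below. This is only a sketch; the Series values and the value of yesterday are made up for illustration.

```python
import pandas as pd

# Hypothetical values standing in for df['colname'] and yesterday.
col = pd.Series([5, 12, 7, 20, 3])
yesterday = 6

# enumerate yields (position, value) pairs: (0, 5), (1, 12), ...
indexfuture = [i for i, x in enumerate(col) if x > yesterday]

print(indexfuture)  # [1, 2, 3]
```

This loops in pure Python, so on a large column the boolean mask is usually the faster choice.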

answered Sep 16 '22 18:09

fdeheeger

From what I know of R, you might be more comfortable working with numpy, a scientific computing package similar to MATLAB.

If you want the indices of the elements of an array whose values are divisible by two, then the following would work.

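import numpy

arr = numpy.arange(10)               # array of 0..9
truth_table = arr % 2 == 0           # boolean array, True where the value is even
indices = numpy.where(truth_table)   # tuple holding one array of matching positions
values = arr[indices]                # the even values themselves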
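
The same idea can be applied to a pandas column to get positional row indices, which is roughly what which() does in R. The sketch below assumes a made-up DataFrame and threshold; note that the positions are 0-based, whereas R's which() is 1-based.

```python
import numpy as np
import pandas as pd

# Hypothetical data and threshold, for illustration only.
df = pd.DataFrame({"colname": [5, 12, 7, 20, 3]})
threshold = 6

mask = (df["colname"] > threshold).to_numpy()  # plain boolean array
positions = np.where(mask)[0]                  # 0-based row positions
rows = df.iloc[positions]                      # select rows by position

print(positions.tolist())        # [1, 2, 3]
print(rows["colname"].tolist())  # [12, 7, 20]
```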

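It's also easy to work with multi-dimensional arrays:

arr2d = arr.reshape(2,5)
row_index = 0  # arr2d[row_index] selects one row of the 2x5 array
col_indices = numpy.where(arr2d[row_index] % 2 == 0)[0]  # column positions of the even values in that row
row_values = arr2d[row_index, col_indices]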
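
To pick whole rows based on a condition on one column, as the R expression in the question does, something like the following sketch may help. The column position and threshold are hypothetical values chosen for the example.

```python
import numpy as np

arr2d = np.arange(10).reshape(2, 5)  # [[0 1 2 3 4], [5 6 7 8 9]]
colnumber = 1    # hypothetical 0-based column position
somenumber = 3   # hypothetical threshold

# Row positions where the chosen column exceeds the threshold.
row_indices = np.where(arr2d[:, colnumber] > somenumber)[0]
selected = arr2d[row_indices, :]  # keep every column of those rows

print(row_indices.tolist())  # [1]
print(selected.tolist())     # [[5, 6, 7, 8, 9]]
```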

answered Sep 17 '22 18:09

Dunes

Tags: python, pandas, logical-operators
