subsetting a Python DataFrame

Tags:

I am transitioning from R to Python. I just began using Pandas. I have an R code that subsets nicely:

k1 <- subset(data, Product = p.id & Month < mn & Year == yr, select = c(Time, Product))

Now, I want to do similar stuff in Python. this is what I have got so far:

import pandas as pd data = pd.read_csv("../data/monthly_prod_sales.csv")   #first, index the dataset by Product. And, get all that matches a given 'p.id' and time.  data.set_index('Product')  k = data.ix[[p.id, 'Time']]  # then, index this subset with Time and do more subsetting..

I am beginning to feel that I am doing this the wrong way. perhaps, there is an elegant solution. Can anyone help? I need to extract month and year from the timestamp I have and do subsetting. Perhaps there is a one-liner that will accomplish all this:

k1 <- subset(data, Product = p.id & Time >= start_time & Time < end_time, select = c(Time, Product))

thanks.

432

asked Oct 08 '13 02:10

user1717931

1 Answers

I'll assume that Time and Product are columns in a DataFrame, df is an instance of DataFrame, and that other variables are scalar values:

For now, you'll have to reference the DataFrame instance:

k1 = df.loc[(df.Product == p_id) & (df.Time >= start_time) & (df.Time < end_time), ['Time', 'Product']]

The parentheses are also necessary, because of the precedence of the & operator vs. the comparison operators. The & operator is actually an overloaded bitwise operator which has the same precedence as arithmetic operators which in turn have a higher precedence than comparison operators.

In pandas 0.13 a new experimental DataFrame.query() method will be available. It's extremely similar to subset modulo the select argument:

With query() you'd do it like this:

df[['Time', 'Product']].query('Product == p_id and Month < mn and Year == yr')

Here's a simple example:

In [9]: df = DataFrame({'gender': np.random.choice(['m', 'f'], size=10), 'price': poisson(100, size=10)})  In [10]: df Out[10]:   gender  price 0      m     89 1      f    123 2      f    100 3      m    104 4      m     98 5      m    103 6      f    100 7      f    109 8      f     95 9      m     87  In [11]: df.query('gender == "m" and price < 100') Out[11]:   gender  price 0      m     89 4      m     98 9      m     87

The final query that you're interested will even be able to take advantage of chained comparisons, like this:

k1 = df[['Time', 'Product']].query('Product == p_id and start_time <= Time < end_time')

107

answered Sep 23 '22 02:09

Phillip Cloud

Related questions
                            
                                Applying a function to values in dict
                            
                                Renaming Column Names in Pandas Groupby function [duplicate]
                            
                                Maximum allowed value for a numpy data type
                            
                                Any reason NOT to always use keyword arguments?
                            
                                Handling urllib2's timeout? - Python
                            
                                Get available modules
                            
                                How do I get the name of the rows from the index of a data frame?
                            
                                Get path of virtual environment in pipenv
                            
                                Python class @property: use setter but evade getter?
                            
                                Is it possible to compile a program written in Python? [closed]
                            
                                Django: How should I store a money value?
                            
                                Matplotlib subplots_adjust hspace so titles and xlabels don't overlap?
                            
                                Django Deprecation Warning or ImproperlyConfigured error - Passing a 3-tuple to django.conf.urls.include() is not supported
                            
                                How to import keras from tf.keras in Tensorflow?
                            
                                How to write Python code that is able to properly require a minimal python version?
                            
                                how to get the last part of a string before a certain character?
                            
                                Python sharing a lock between processes
                            
                                Is `id` a keyword in python?
                            
                                Python configuration file: Any file format recommendation? INI format still appropriate? Seems quite old school
                            
                                List Directories and get the name of the Directory

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

subsetting a Python DataFrame

Tags:

python

pandas

subset

user1717931

People also ask

1 Answers

Phillip Cloud

Recent Activity

Donate For Us