Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

slicing pandas dataframe on date range

I'm using pandas to analyse financial records.

I have a DataFrame that comes from a csv file that looks like this:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 800 entries, 2010-10-27 00:00:00 to 2011-07-12 00:00:00
Data columns:
debit                      800  non-null values
transaction_type           799  non-null values
transaction_date_raw       800  non-null values
credit                     800  non-null values
transaction_description    800  non-null values
account_number             800  non-null values
sort_code                  800  non-null values
balance                    800  non-null values
dtypes: float64(3), int64(1), object(4)

I am selecting a subset based on transaction amount:

c1 = df['credit'].map(lambda x: x > 1000)
milestones = df[c1].sort()

and want to create slices of the original df based on the dates between the milestones:

delta = dt.timedelta(days=1)
for i in range(len(milestones.index)-1):
        start = milestones.index[i].date()
        end = milestones.index[i+1].date() - delta
        rng = date_range(start, end)

this generates a new series with the dates between my milestones.

<class 'pandas.tseries.index.DatetimeIndex'>
[2010-11-29 00:00:00, ..., 2010-12-30 00:00:00]
Length: 32, Freq: D, Timezone: None

I have followed several approaches to slice my df using these new series (rng) but have failed:

df.ix[start:end] or
df.ix[rng]

this raises: IndexError: invalid slice

df.reindex(rng) or df.reindex(index=rng)

raises: Exception: Reindexing only valid with uniquely valued Index objects

x = [v for v in rng if v in df.index]
df[x]
df.ix[x]
df.index[x]

this also raises invalid slice, and so does:

df.truncate(start, end)

I'm new to pandas, I'm following the early release of the book from Oreilly, and really enjoying it. Any pointers would be appreciated.

like image 488
Gonzalo Avatar asked Jul 06 '12 10:07

Gonzalo


1 Answers

It looks like you've hit a couple of known bugs in non-unique index handling:

https://github.com/pydata/pandas/issues/1201/

https://github.com/pydata/pandas/issues/1587/

A bug fix release is coming out very soon so please check the pandas website or PyPI in a week or so.

Thanks

like image 167
Chang She Avatar answered Oct 23 '22 03:10

Chang She