Interval datatype in Pandas - find midpoint, left, center etc

In pandas 0.20.1, with the interval type, is it possible to get the midpoint, left, or center values of the intervals in a Series?

For example:

1. Create an interval-dtype column and perform some aggregation calculations over these intervals:
```
df_Stats = df.groupby(['month', pd.cut(df['Distances'], np.arange(0, 135, 1))]).agg(aggregations)
```

This returns `df_Stats` with an interval-dtype column: `df['Distances']`

2. Now I want to associate the left end of each interval with the result of these aggregations, using a Series-level function:
```
df['LeftEnd'] = df['Distances'].left
```

This does not work on the Series as a whole. However, I can run it element-wise:

    df.loc[0]['LeftEnd'] = df.loc[0]['Distances'].left

This works. Thoughts?

So `pd.cut()` actually creates a `CategoricalIndex`, with an `IntervalIndex` as the categories.

    In [13]: df = pd.DataFrame({'month': [1, 1, 2, 2], 'distances': range(4), 'value': range(4)})

    In [14]: df
    Out[14]:
       distances  month  value
    0          0      1      0
    1          1      1      1
    2          2      2      2
    3          3      2      3

    In [15]: result = df.groupby(['month', pd.cut(df.distances, 2)]).value.mean()

    In [16]: result
    Out[16]:
    month  distances
    1      (-0.003, 1.5]    0.5
    2      (1.5, 3.0]       2.5
    Name: value, dtype: float64
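
To see that structure directly, here is a minimal sketch that inspects what `pd.cut()` returns and what the grouped index level looks like. It rebuilds the small frame from the session above. The names `binned` and `level` are my own, and `observed=True` is an assumption added so that newer pandas versions keep only the bins that actually occur.

```python
import pandas as pd

df = pd.DataFrame({"month": [1, 1, 2, 2], "distances": range(4), "value": range(4)})

# pd.cut returns a categorical Series; its categories hold the intervals.
binned = pd.cut(df["distances"], 2)
print(binned.dtype)
print(type(binned.cat.categories).__name__)

# After the groupby, the 'distances' index level is categorical as well,
# and its categories are still the intervals produced by pd.cut.
result = df.groupby(["month", binned], observed=True)["value"].mean()
level = result.index.get_level_values("distances")
print(type(level).__name__)
print(type(level.categories).__name__)
```

Because the intervals sit one layer down, inside the categories, the index level itself does not expose interval attributes such as `.left`.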

You can simply coerce them to an `IntervalIndex` (this also works if they are a column), then access the `left`, `right`, and `mid` attributes.

    In [17]: pd.IntervalIndex(result.index.get_level_values('distances')).left
    Out[17]: Float64Index([-0.003, 1.5], dtype='float64')

    In [18]: pd.IntervalIndex(result.index.get_level_values('distances')).right
    Out[18]: Float64Index([1.5, 3.0], dtype='float64')

    In [19]: pd.IntervalIndex(result.index.get_level_values('distances')).mid
    Out[19]: Float64Index([0.7485, 2.25], dtype='float64')
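
For the column case, the same coercion can be sketched as a small script that writes the endpoints back onto the aggregated frame. This is only an illustration under a few assumptions: the column names `left`, `right`, and `mid` are invented here, `reset_index()` is used to turn the index levels into columns, and `observed=True` is passed so newer pandas versions do not emit rows for empty bins.

```python
import pandas as pd

df = pd.DataFrame({"month": [1, 1, 2, 2], "distances": range(4), "value": range(4)})

# Aggregate per (month, distance bin), then move the index levels into columns.
result = (
    df.groupby(["month", pd.cut(df["distances"], 2)], observed=True)["value"]
    .mean()
    .reset_index()
)

# Coerce the categorical column of intervals to an IntervalIndex,
# which exposes vectorised .left, .right and .mid attributes.
intervals = pd.IntervalIndex(result["distances"])
result["left"] = intervals.left
result["right"] = intervals.right
result["mid"] = intervals.mid

print(result)
```

This avoids a Python-level loop over the rows, which may matter when there are many bins, as with the `np.arange(0, 135, 1)` edges in the question.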

Say 'cut' is the name of the column produced by pd.cut. Instead of

    df['LeftEnd'] = df['Distances'].left

use one of the following:

    df['LeftEnd'] = df['cut'].apply(lambda x: x.left)

    df['LeftEnd'] = df['cut'].apply(lambda x: x.left).astype(str)
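
A self-contained sketch of the element-wise approach, for anyone who wants to try it: the data and bin edges below are hypothetical, and the `Mid` column is an extra added for illustration. One thing to watch is that `apply` on a categorical column may hand back a categorical result in some pandas versions, so this sketch casts to `float` rather than `str` to keep the values numeric.

```python
import pandas as pd

# Hypothetical data; the bin edges are chosen only for illustration.
df = pd.DataFrame({"Distances": [0.5, 1.5, 2.5, 3.5]})
df["cut"] = pd.cut(df["Distances"], bins=[0, 1, 2, 3, 4])

# Each element of the column is a pd.Interval, so .left and .mid are
# available per element. The explicit cast guards against apply()
# returning a categorical column instead of plain numbers.
df["LeftEnd"] = df["cut"].apply(lambda iv: iv.left).astype(float)
df["Mid"] = df["cut"].apply(lambda iv: iv.mid).astype(float)

print(df)
```

Note that this sketch has no values outside the bins. A value that falls outside would become NaN in `cut`, and the lambda would need to handle that case.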

Tags: python, pandas, intervals, penguin

2 Answers

Jeff

Mahesh Babu J
