In pandas 0.20.1, with the interval type, is it possible to get the left, midpoint (center), or right values of the intervals in a Series?
For example, I create an interval column and perform some aggregation calculations over these intervals:
df_Stats = df.groupby(['month',pd.cut(df['Distances'], np.arange(0, 135,1))]).agg(aggregations)
This returns df_Stats with an interval-typed 'Distances' grouping, referred to below as df['Distances'].
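Now I want to associate the left end of each interval with the results of these aggregations, using a Series-level operation:
df['LeftEnd'] = df['Distances'].left
This fails, because a Series has no .left attribute. Element-wise access, however, does work:
df.loc[0, 'LeftEnd'] = df.loc[0, 'Distances'].left
That handles one row at a time. Is there a way to do it for the whole column? Thoughts?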
So pd.cut() actually creates categorical data (a CategoricalIndex once it is used as a groupby key), with an IntervalIndex as the categories.
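To see this on a plain Series, here is a small sketch. The distance values are made up for illustration; the bins copy the np.arange(0, 135, 1) edges from the question. It checks the dtype that pd.cut returns and shows that the bins live in .cat.categories as an IntervalIndex, while each individual row holds an Interval scalar with its own .left and .right.

```python
import numpy as np
import pandas as pd

# Hypothetical distances; the bin edges mirror np.arange(0, 135, 1) from the question
distances = pd.Series([0.5, 12.3, 133.9], name="Distances")
binned = pd.cut(distances, np.arange(0, 135, 1))

print(binned.dtype)                  # categorical dtype
categories = binned.cat.categories   # all bins, stored as an IntervalIndex
print(isinstance(categories, pd.IntervalIndex))
print(len(categories))               # one interval per pair of adjacent edges

# A single row is an Interval scalar, which exposes .left / .right / .mid
row = binned.iloc[1]
print(row, row.left, row.right, row.mid)
```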
In [13]: df = pd.DataFrame({'month': [1, 1, 2, 2], 'distances': range(4), 'value': range(4)})
In [14]: df
Out[14]:
   distances  month  value
0          0      1      0
1          1      1      1
2          2      2      2
3          3      2      3
In [15]: result = df.groupby(['month', pd.cut(df.distances, 2)]).value.mean()
In [16]: result
Out[16]:
month  distances
1      (-0.003, 1.5]    0.5
2      (1.5, 3.0]       2.5
Name: value, dtype: float64
You can simply coerce them to an IntervalIndex (this also works if they are a column) and then access .left, .right, or .mid:
In [17]: pd.IntervalIndex(result.index.get_level_values('distances')).left
Out[17]: Float64Index([-0.003, 1.5], dtype='float64')
In [18]: pd.IntervalIndex(result.index.get_level_values('distances')).right
Out[18]: Float64Index([1.5, 3.0], dtype='float64')
In [19]: pd.IntervalIndex(result.index.get_level_values('distances')).mid
Out[19]: Float64Index([0.7485, 2.25], dtype='float64')
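Attaching these values back to the aggregation result as columns is what the question ultimately asks for. The following is a minimal sketch, assuming pandas 0.24 or later (for the observed= keyword and to_numpy()). Instead of coercing to an IntervalIndex, the hypothetical helper interval_bounds reads the bin edges from .cat.categories and maps them to rows through .cat.codes, so rows that fell outside every bin come out as NaN.

```python
import numpy as np
import pandas as pd


def interval_bounds(binned: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: left, right and mid of each row's interval.

    Rows whose value fell outside every bin (category code -1) get NaN.
    """
    binned = binned.astype("category")     # no-op if already categorical
    bins = binned.cat.categories           # IntervalIndex of all bins
    codes = binned.cat.codes.to_numpy()    # bin position per row, -1 = missing
    valid = codes >= 0
    bounds = {}
    for attr in ("left", "right", "mid"):
        edges = np.asarray(getattr(bins, attr), dtype=float)
        # a code of -1 would index the last bin, so mask those rows with NaN
        bounds[attr] = np.where(valid, edges[codes], np.nan)
    return pd.DataFrame(bounds, index=binned.index)


# Same toy data as the session above
df = pd.DataFrame({"month": [1, 1, 2, 2], "distances": range(4), "value": range(4)})
result = (
    df.groupby(["month", pd.cut(df["distances"], 2)], observed=True)["value"]
    .mean()
    .reset_index()                         # 'distances' becomes a categorical column
)
result = result.join(interval_bounds(result["distances"]))
print(result)
```

Passing observed=True keeps only the (month, bin) pairs that actually occur, matching the two-row output shown above; newer pandas versions warn if it is left unspecified for categorical groupers.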
Say 'cut' is the name of the column holding the result of pd.cut. Instead of
df['LeftEnd'] = df['Distances'].left
use one of the following:
df['LeftEnd'] = df['cut'].apply(lambda x: x.left)
df['LeftEnd'] = df['cut'].apply(lambda x: x.left).astype(str)
The second variant stores the left ends as strings rather than numbers.
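Two details may matter with the apply approach. In recent pandas versions, apply on a categorical column maps the categories rather than each row, so the result can come back as a categorical column. Also, rows that pd.cut could not place in any bin hold NaN, which has no .left attribute, so an unguarded lambda would raise if it ever received one. The sketch below sidesteps both by converting to plain objects first and guarding against NaN. The sample data and the helper name left_end are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 9.0 lies outside the bins, so pd.cut assigns NaN to that row
df = pd.DataFrame({"Distances": [0.4, 1.7, 2.2, 9.0]})
df["cut"] = pd.cut(df["Distances"], np.arange(0, 4, 1))


def left_end(iv):
    """Hypothetical helper: left edge of an Interval, or NaN for unbinned rows."""
    return iv.left if isinstance(iv, pd.Interval) else np.nan


# astype(object) yields a plain column of Interval objects (and NaN),
# so left_end runs once per row; astype(float) gives a numeric result
df["LeftEnd"] = df["cut"].astype(object).map(left_end).astype(float)
print(df)
```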