split-apply-combine on pandas timedelta column

Q: How to split a string column in a pandas DataFrame?

You can use the following basic syntax to split a string column in a pandas DataFrame into multiple columns:

# split column A into two columns: column A and column B
df[['A', 'B']] = df['A'].str.split(',', n=1, expand=True)

Here n=1 splits on the first comma only, and expand=True returns the pieces as separate columns rather than as lists.
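A small self-contained run of that one-liner, with made-up data, shows what n=1 does when a value contains more than one comma: the remainder stays together in the second column.

```python
import pandas as pd

# Hypothetical sample data: column "A" holds comma-separated values
df = pd.DataFrame({"A": ["x,1", "y,2", "z,3,extra"]})

# n=1: split on the first comma only, so each row yields at most two pieces
# expand=True: return a DataFrame (one column per piece) instead of lists
df[["A", "B"]] = df["A"].str.split(",", n=1, expand=True)
print(df)
```

Without n=1 the third row would produce a third column, and the assignment to two columns would fail.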

Q: What is split-apply-combine in pandas?

Split-apply-combine is the pattern behind pandas groupby-apply: split the data into groups, apply a function to each group, and combine the results. Groupby-apply is an invaluable tool in a Python data scientist's toolkit. You can go pretty far with it without fully understanding its internals; however, gaps in that understanding can show up as unexpected behavior and errors.
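A minimal sketch of the three steps in a single groupby-apply call; the "team"/"score" data is invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["a", "b", "a", "b", "a"],
    "score": [1, 2, 3, 4, 5],
})

# split: rows are grouped by the "team" column
# apply: each group's "score" Series is passed to the lambda
# combine: the per-group results are stitched into one Series indexed by team
spread = df.groupby("team")["score"].apply(lambda s: s.max() - s.min())
print(spread)
```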

Q: What is timedelta in pandas?

pandas.Timedelta represents a duration, the difference between two dates or times. Timedelta is the pandas equivalent of Python's datetime.timedelta and is interchangeable with it in most cases. When the input is an integer, the unit argument gives its unit; nanoseconds can be written as 'nanoseconds', 'nanosecond', 'nanos', 'nano', or 'ns'.
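A short sketch of those points: an integer input with an explicit unit, a Timedelta produced by subtracting two timestamps, and its interchangeability with datetime.timedelta (pd.Timedelta is a subclass of it). The timestamps are arbitrary.

```python
import datetime

import pandas as pd

# An integer input is interpreted in the given unit (nanoseconds here)
td_ns = pd.Timedelta(1500, unit="ns")

# The difference between two timestamps is a Timedelta
delta = pd.Timestamp("2013-12-17 04:12") - pd.Timestamp("2013-12-17 04:00")

# Timedelta subclasses datetime.timedelta, so the two compare directly
same = delta == datetime.timedelta(minutes=12)
print(td_ns, delta, same)
```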

Q: What is the best way to split data into groups?

A group-by operation involves three steps: splitting the data into groups based on some criteria, applying a function to each group independently, and combining the results into a data structure. Of these, the split step is the most straightforward. In many situations we simply want to split the data set into groups and do something with each of those groups.
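To make the three steps explicit, here is a sketch that does them by hand (iterating a GroupBy yields (label, sub-DataFrame) pairs) and then compares against the built-in aggregation. The "key"/"val" data is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"key": ["x", "y", "x", "y"], "val": [1.0, 2.0, 3.0, 6.0]})

results = {}
# split: iterating a GroupBy yields (group label, sub-DataFrame) pairs
for label, group in df.groupby("key"):
    # apply: compute something on each group independently
    results[label] = group["val"].mean()

# combine: gather the per-group results into one Series
combined = pd.Series(results, name="val")

# the built-in aggregation performs all three steps in one call
builtin = df.groupby("key")["val"].mean()
print(combined.to_dict(), builtin.to_dict())
```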

Tags:

python

pandas

I have a DataFrame with a column of timedeltas (upon inspection, the dtype is timedelta64[ns], also written <m8[ns]), and I'd like to do a split-apply-combine, but the timedelta column is being dropped:

import pandas as pd

import numpy as np

pd.__version__
Out[3]: '0.13.0rc1'

np.__version__
Out[4]: '1.8.0'

data = pd.DataFrame(np.random.rand(10, 3), columns=['f1', 'f2', 'td'])

data['td'] *= 10000000

data['td'] = pd.Series(data['td'], dtype='<m8[ns]')

data
Out[8]: 
         f1        f2              td
0  0.990140  0.948313 00:00:00.003066
1  0.277125  0.993549 00:00:00.001443
2  0.016427  0.581129 00:00:00.009257
3  0.048662  0.512215 00:00:00.000702
4  0.846301  0.179160 00:00:00.000396
5  0.568323  0.419887 00:00:00.000266
6  0.328182  0.919897 00:00:00.006138
7  0.292882  0.213219 00:00:00.008876
8  0.623332  0.003409 00:00:00.000322
9  0.650436  0.844180 00:00:00.006873

[10 rows x 3 columns]
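On a recent pandas, the same kind of column can be built more explicitly with pd.to_timedelta. This is a sketch assuming a current pandas/NumPy rather than 0.13; the seeded generator is only there to make it reproducible, and truncating to whole nanoseconds first sidesteps any float-to-timedelta casting rules.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the example is reproducible
data = pd.DataFrame(rng.random((10, 3)), columns=["f1", "f2", "td"])

# Scale to nanoseconds, truncate to whole ns, then convert explicitly
ns = (data["td"] * 10_000_000).astype("int64")
data["td"] = pd.to_timedelta(ns, unit="ns")
print(data.dtypes)
```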

data.groupby(data.index < 5).mean()
Out[9]: 
             f1        f2
False  0.492631  0.480118
True   0.435731  0.642873

[2 rows x 2 columns]

Or, forcing pandas to try the operation on the 'td' column:

data.groupby(data.index < 5)['td'].mean()
---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-12-88cc94e534b7> in <module>()
----> 1 data.groupby(data.index < 5)['td'].mean()

/path/to/lib/python3.3/site-packages/pandas-0.13.0rc1-py3.3-linux-x86_64.egg/pandas/core/groupby.py in mean(self)
    417         """
    418         try:
--> 419             return self._cython_agg_general('mean')
    420         except GroupByError:
    421             raise

/path/to/lib/python3.3/site-packages/pandas-0.13.0rc1-py3.3-linux-x86_64.egg/pandas/core/groupby.py in _cython_agg_general(self, how, numeric_only)
    669 
    670         if len(output) == 0:
--> 671             raise DataError('No numeric types to aggregate')
    672 
    673         return self._wrap_aggregated_output(output, names)

DataError: No numeric types to aggregate
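The DataError above is specific to the 0.13-era groupby. As a sketch, assuming a much newer pandas than the one in the question, the same selection-then-mean on a timedelta column with a boolean group key returns timedeltas directly. The values below are made up so the per-group means are easy to check.

```python
import pandas as pd

data = pd.DataFrame({
    "f1": [0.1, 0.2, 0.3, 0.4],
    "td": pd.to_timedelta([1, 3, 10, 20], unit="ms"),
})
key = data.index < 2  # boolean group key, as in the question

# On a recent pandas this averages the timedelta64 column per group
by_key = data.groupby(key)["td"].mean()
print(by_key)
```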

However, taking the mean of the column works fine, so numeric operations should be possible:

data['td'].mean()
Out[11]: 
0   00:00:00.003734
dtype: timedelta64[ns]

Obviously it's easy enough to coerce to float before doing the groupby, but I figured I might as well try to understand what I'm running into.

Edit: See https://github.com/pydata/pandas/issues/5724


asked Dec 17 '13 04:12

ontologist

1 Answer

Turns out this is a pandas issue: this behavior still needs to be implemented in groupby.py.

In the meantime, here is a workaround that casts the column to float (in units of seconds):

data['td'] = data['td'] / np.timedelta64(1, 's')
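A sketch of the full round trip under the float-seconds approach: convert, group and average the floats, then turn the averages back into timedeltas with pd.to_timedelta. The sample values are invented, and a separate td_s column is used so the original timedeltas are kept.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "f1": [0.5, 0.7, 0.1, 0.3],
    "td": pd.to_timedelta([1, 3, 10, 20], unit="ms"),
})

# Cast to float seconds: dividing by a one-second timedelta64 yields float64
data["td_s"] = data["td"] / np.timedelta64(1, "s")

# Group on the float column, which groupby can always average
means_s = data.groupby(data.index < 2)["td_s"].mean()

# Convert the averaged seconds back to timedeltas for display
means_td = pd.to_timedelta(means_s, unit="s")
print(means_td)
```

Float seconds can introduce sub-microsecond rounding on the way back; if exact nanoseconds matter, averaging an integer nanosecond representation instead avoids that.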

answered Oct 19 '22 19:10

ontologist
