Extracting number of days from timedelta column in pandas

I have a DataFrame that stores aging values as below:

Aging
-84 days +11:36:15.000000000
-46 days +12:25:48.000000000
-131 days +20:53:45.000000000
-131 days +22:22:50.000000000
-130 days +01:02:03.000000000
-80 days +17:02:55.000000000

I am trying to extract the text before "days" from each value in the above column. I tried the following:

df['new'] = df.Aging.split('days')[0]

The above returns

AttributeError: 'Series' object has no attribute 'split'

Expected output:

-84
-46
-131
-131
-130
-80

asked Dec 21 '18 05:12

hello kee

1 Answer

IMO, a better idea would be to convert to timedelta and extract the days component.

pd.to_timedelta(df.Aging, errors='coerce').dt.days

0    -84
1    -46
2   -131
3   -131
4   -130
5    -80
Name: Aging, dtype: int64
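For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample column and writes the result into the `new` column from the question. It assumes the `Aging` values are stored as strings; the way `df` is constructed here is only for illustration.

```python
import pandas as pd

# Sample data copied from the question; assumed to be stored as strings.
df = pd.DataFrame({
    "Aging": [
        "-84 days +11:36:15.000000000",
        "-46 days +12:25:48.000000000",
        "-131 days +20:53:45.000000000",
        "-131 days +22:22:50.000000000",
        "-130 days +01:02:03.000000000",
        "-80 days +17:02:55.000000000",
    ]
})

# Parse the strings into timedeltas; values that cannot be parsed become NaT.
aging = pd.to_timedelta(df["Aging"], errors="coerce")

# .dt.days gives the days component of each timedelta.
df["new"] = aging.dt.days
print(df["new"].tolist())  # [-84, -46, -131, -131, -130, -80]
```

Keep in mind that the days component is floored: "-84 days +11:36:15" is roughly -83.5 days in total. If a fractional value is what you need, `aging.dt.total_seconds() / 86400` is one way to get it.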

If you insist on using string methods, you can use str.extract.

pd.to_numeric(
    df.Aging.str.extract('(.*?) days', expand=False),
    errors='coerce')

0    -84
1    -46
2   -131
3   -131
4   -130
5    -80
Name: Aging, dtype: int32
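The pattern `(.*?) days` captures whatever precedes " days". If you would rather match only an optionally signed integer, a stricter pattern is one possible variation. This is a sketch, not part of the original answer, and the sample `df` is again an assumption:

```python
import pandas as pd

# Two sample rows from the question, assumed to be stored as strings.
df = pd.DataFrame({
    "Aging": [
        "-84 days +11:36:15.000000000",
        "-130 days +01:02:03.000000000",
    ]
})

# Capture an optional minus sign followed by digits, right before " days".
# expand=False returns a Series instead of a one-column DataFrame.
days_text = df["Aging"].str.extract(r"(-?\d+) days", expand=False)

# The captured values are still strings; convert them to numbers.
days = pd.to_numeric(days_text, errors="coerce")
print(days.tolist())  # [-84, -130]
```

Rows that do not match the pattern come back as NaN rather than raising an error.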

Or, using str.split:

pd.to_numeric(df.Aging.str.split(' days').str[0], errors='coerce')

0    -84
1    -46
2   -131
3   -131
4   -130
5    -80
Name: Aging, dtype: int64
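As for the AttributeError in the question: `split` is a method of Python strings, not of a Series, so element-wise string operations have to go through the `.str` accessor. A small sketch of the difference, using an assumed two-row sample:

```python
import pandas as pd

# Two sample rows from the question, assumed to be stored as strings.
df = pd.DataFrame({
    "Aging": [
        "-46 days +12:25:48.000000000",
        "-80 days +17:02:55.000000000",
    ]
})

# A Series has no .split method of its own, hence the AttributeError.
print(hasattr(df["Aging"], "split"))  # False

# .str.split splits every element; .str[0] then takes the first piece of each.
first_piece = df["Aging"].str.split(" days").str[0]

# Convert the text pieces to numbers and store them in the new column.
df["new"] = pd.to_numeric(first_piece, errors="coerce")
print(df["new"].tolist())  # [-46, -80]
```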

answered Oct 12 '22 23:10

cs95


Tags: python, regex, pandas, timedelta
