date calculation (TypeError: unsupported operand type(s) for -: 'str' and 'str')

Question

I have a data set as below:

        date_time       srch_co     srch_ci
0   2014-11-03 16:02:28 2014-12-19  2014-12-15
1   2013-03-13 19:25:01 2013-03-14  2013-03-13
2   2014-10-13 13:20:25 2015-04-10  2015-04-03
3   2013-11-05 10:40:34 2013-11-08  2013-11-07
4   2014-06-10 13:34:56 2014-08-08  2014-08-03
5   2014-12-16 14:34:39 2014-12-17  2014-12-16

And this is the information of the dataset:

     <class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 3 columns):
date_time    100000 non-null datetime64[ns]
srch_co      99878 non-null object
srch_ci      99878 non-null object
dtypes: datetime64[ns](1), object(2)
memory usage: 2.3+ MB

What I would like to do is create 2 new columns using the following function:

def duration(row):
    delta = (row['srch_co'] - row['srch_ci'])/np.timedelta64(1, 'D')
    if delta <= 0:
        return np.nan
    else:
        return delta

sample['duration'] = sample.apply(duration, axis=1)


def days_in_advance(row):
    delta = (row['srch_ci'] - row['date_time'])/np.timedelta64(1, 'D')
    if delta < 0:
        return np.nan
    else:
        return delta

sample['days_in_advance'] = sample.apply(days_in_advance, axis=1)

However, it seems like the date calculation I want to run constantly hits errors. I've searched and found several solutions and tried, but either they would create error or turn the date into inaccurate value.

The methods I've tried to use are such as:

#1)
def to_integer(dt_time):
    return 10000*dt_time.year + 100*dt_time.month + dt_time.day

#2)
datetime.strptime(str(row[2]), '%Y%m%d%H%M%S')

#3)    
pd.to_numeric(sample['date_time'], errors='coerce')

#4)
sample['srch_ci_int'] = sample['srch_ci'].astype(str).astype(int)

I just want to create new column that would calculate the difference of each columns:

sample["duration"] = sample["srch_co"] - sample["srch_ci"]
sample["days_in_advance"] = sample["srch_co"] - sample["date_time"]

Any tips appreciated.

jezrael · Accepted Answer

You need convert columns srch_co and srch_ci to_datetime first and then use mask for replace values less as 0 to NaN (default value of mask function):

sample["srch_co"] = pd.to_datetime(sample["srch_co"])
sample["srch_ci"] = pd.to_datetime(sample["srch_ci"])

sample["duration"] = (sample["srch_co"] - sample["srch_ci"])/np.timedelta64(1, 'D')
sample["days_in_advance"] = (sample["srch_co"] - sample["date_time"])/np.timedelta64(1, 'D')

cols = ['duration','days_in_advance']
sample[cols] = sample[cols].mask(sample[cols] < 0)
#first value of srch_ci column was changed for NaN output
print (sample)
            date_time    srch_co    srch_ci  duration  days_in_advance
0 2014-11-03 16:02:28 2014-12-19 2015-12-15       NaN        45.331620
1 2013-03-13 19:25:01 2013-03-14 2013-03-13       1.0         0.190961
2 2014-10-13 13:20:25 2015-04-10 2015-04-03       7.0       178.444155
3 2013-11-05 10:40:34 2013-11-08 2013-11-07       1.0         2.555162
4 2014-06-10 13:34:56 2014-08-08 2014-08-03       5.0        58.434074
5 2014-12-16 14:34:39 2014-12-17 2014-12-16       1.0         0.392604

date calculation (TypeError: unsupported operand type(s) for -: 'str' and 'str')

Tags:

python

tmhs

1 Answers

jezrael

Recent Activity

Donate For Us

date calculation (TypeError: unsupported operand type(s) for -: 'str' and 'str')

Tags:

python

tmhs

1 Answers

jezrael

Related questions

Recent Activity

Donate For Us