I have a pandas DataFrame with a column of dates in the format "2016-05-03". These are strings. I need to split each one at the hyphen ('-'), keep only the year (element [0]), and convert it from a string to an int.
This is what I have tried to turn the string into an integer:
tyc.startDate = tyc.startDate.astype(np.int64)
But it returns an error:
ValueError: invalid literal for int() with base 10: '2015-06-01'
and this is what I've done for splitting:
tyc.startDate.str.split('-')[0]
and
tyc.startDate.str.split('-', [0])
but this isn't working either. It splits the rows in the column into lists of the form ['2015', '06', '01'], and I want just the year!
I'm sure there is a simple way to split at '-', take position 0, convert it to int, and put the result into the df as a new column. Please help!
I believe your data contains NaNs or some values that are not valid dates:
import numpy as np
import pandas as pd
tyc = pd.DataFrame({'startDate': ['2016-05-03', '2017-05-03', np.nan],
                    'col': [1, 2, 3]})
print(tyc)
   col   startDate
0    1  2016-05-03
1    2  2017-05-03
2    3         NaN
Use str[0] to take the first element of each row's list. Plain [0] on the result of str.split selects the row labelled 0, not the first element of every list. There is still a problem: the NaNs cannot be converted to int (by design), so the output is floats:
print (tyc.startDate.str.split('-').str[0].astype(float))
0    2016.0
1    2017.0
2       NaN
Name: startDate, dtype: float64
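The split-then-index approach above can be sketched step by step as follows. This is a minimal sketch, assuming a small sample Series; the intermediate names parts, year_str and year_float are only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data with one missing value, mirroring the example above.
dates = pd.Series(['2016-05-03', '2017-05-03', np.nan], name='startDate')

# Each non-missing row becomes a list such as ['2016', '05', '03'].
parts = dates.str.split('-')

# .str[0] indexes into every row's list; missing rows stay missing.
year_str = parts.str[0]

# NaN has no integer representation in a plain int column, so cast to float.
year_float = year_str.astype(float)
print(year_float)
```

The key difference is that parts[0] would select the row labelled 0, while parts.str[0] takes the first element of every row.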
Another solution is to convert the column to datetime with to_datetime and extract the year with dt.year:
print (pd.to_datetime(tyc.startDate, errors='coerce'))
0   2016-05-03
1   2017-05-03
2          NaT
Name: startDate, dtype: datetime64[ns]
print (pd.to_datetime(tyc.startDate, errors='coerce').dt.year)
0    2016.0
1    2017.0
2       NaN
Name: startDate, dtype: float64
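To illustrate what errors='coerce' does with values that are not dates at all, here is a small sketch. The string 'not a date' is a made-up example value, not something from the question's data.

```python
import numpy as np
import pandas as pd

# One valid date, one string that is not a date (hypothetical), one NaN.
raw = pd.Series(['2016-05-03', 'not a date', np.nan], name='startDate')

# errors='coerce' turns anything unparseable into NaT instead of raising.
parsed = pd.to_datetime(raw, errors='coerce')

# .dt.year gives NaN wherever the date is NaT, so the result is float.
years = parsed.dt.year
print(years)
```

Both the invalid string and the original NaN end up as missing values, which is why the year column comes back as float rather than int.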
Solutions for removing the NaNs. First store the year in a new column:
tyc['year'] = pd.to_datetime(tyc.startDate, errors='coerce').dt.year
print (tyc)
   col   startDate    year
0    1  2016-05-03  2016.0
1    2  2017-05-03  2017.0
2    3         NaN     NaN
1. Remove all rows with NaNs with dropna and then cast to int:
tyc = tyc.dropna(subset=['year'])
tyc['year'] = tyc['year'].astype(int)
print (tyc)
   col   startDate  year
0    1  2016-05-03  2016
1    2  2017-05-03  2017
2. Replace NaNs with some int value, such as 1, with fillna and then cast to int:
tyc['year'] = tyc['year'].fillna(1).astype(int)
print (tyc)
   col   startDate  year
0    1  2016-05-03  2016
1    2  2017-05-03  2017
2    3         NaN     1
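The two options can be compared side by side in one self-contained sketch. It works on copies so that neither option changes the original frame; the names dropped and filled are only for illustration.

```python
import numpy as np
import pandas as pd

tyc = pd.DataFrame({'startDate': ['2016-05-03', '2017-05-03', np.nan],
                    'col': [1, 2, 3]})
tyc['year'] = pd.to_datetime(tyc['startDate'], errors='coerce').dt.year

# Option 1: drop rows without a year, then cast to int.
dropped = tyc.dropna(subset=['year']).copy()
dropped['year'] = dropped['year'].astype(int)

# Option 2: keep every row, filling missing years with a placeholder (1 here).
filled = tyc.copy()
filled['year'] = filled['year'].fillna(1).astype(int)

print(dropped)
print(filled)
```

Option 1 loses the rows with missing dates, while option 2 keeps them but stores a placeholder that has to be treated as "unknown" later on.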