How can I remove the last two digits of a DataFrame column of type int64?
For example df['DATE']
includes:
DATE
20110708
20110709
20110710
20110711
20110712
20110713
20110714
20110815
20110816
20110817
What I would like is:
DATE
201107
201107
201107
201107
201107
201107
201107
201108
201108
201108
What is the simplest way of achieving this?
To remove the last digit from an integer, use floor division to divide the integer by 10 , e.g. result_1 = my_int // 10 . Floor dividing by 10 will strip the last digit from the integer.
You can also use DataFrame. drop() method to delete the last n columns. Use axis=1 to specify the columns and inplace=True to apply the change on the existing DataFrame.
Convert the dtype to str using astype
then used vectorised str
method to slice the str and then convert back to int64
dtype again:
In [184]:
df['DATE'] = df['DATE'].astype(str).str[:-2].astype(np.int64)
df
Out[184]:
DATE
0 201107
1 201107
2 201107
3 201107
4 201107
5 201107
6 201107
7 201108
8 201108
9 201108
In [185]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 1 columns):
DATE 10 non-null int64
dtypes: int64(1)
memory usage: 160.0 bytes
Hmm...
Turns out there is a built in method floordiv
:
In [191]:
df['DATE'].floordiv(100)
Out[191]:
0 201107
1 201107
2 201107
3 201107
4 201107
5 201107
6 201107
7 201108
8 201108
9 201108
Name: DATE, dtype: int64
update
For a 1000 row df, the floordiv
method is considerably faster:
%timeit df['DATE'].astype(str).str[:-2].astype(np.int64)
%timeit df['DATE'].floordiv(100)
100 loops, best of 3: 2.92 ms per loop
1000 loops, best of 3: 203 µs per loop
Here we observe ~10x speedup
You could use floor division //
to drop the last two digits and preserve the integer type:
>>> df['DATE'] // 100
DATE
0 201107
1 201107
2 201107
3 201107
4 201107
5 201107
6 201107
7 201108
8 201108
9 201108
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With