Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to remove last the two digits in a column that is of integer type?

How can I remove the last two digits of a DataFrame column of type int64?

For example df['DATE'] includes:

DATE
20110708
20110709
20110710
20110711
20110712
20110713
20110714
20110815
20110816
20110817

What I would like is:

DATE
201107
201107
201107
201107
201107
201107
201107
201108
201108
201108

What is the simplest way of achieving this?

like image 621
Techno04335 Avatar asked Oct 09 '15 09:10

Techno04335


People also ask

How do you remove the last digit of an integer in Python?

To remove the last digit from an integer, use floor division to divide the integer by 10 , e.g. result_1 = my_int // 10 . Floor dividing by 10 will strip the last digit from the integer.

How do you remove the last two columns in Python?

You can also use DataFrame. drop() method to delete the last n columns. Use axis=1 to specify the columns and inplace=True to apply the change on the existing DataFrame.


2 Answers

Convert the dtype to str using astype then used vectorised str method to slice the str and then convert back to int64 dtype again:

In [184]:
df['DATE'] = df['DATE'].astype(str).str[:-2].astype(np.int64)
df

Out[184]:
     DATE
0  201107
1  201107
2  201107
3  201107
4  201107
5  201107
6  201107
7  201108
8  201108
9  201108

In [185]:    
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 1 columns):
DATE    10 non-null int64
dtypes: int64(1)
memory usage: 160.0 bytes

Hmm...

Turns out there is a built in method floordiv:

In [191]:
df['DATE'].floordiv(100)

Out[191]:
0    201107
1    201107
2    201107
3    201107
4    201107
5    201107
6    201107
7    201108
8    201108
9    201108
Name: DATE, dtype: int64

update

For a 1000 row df, the floordiv method is considerably faster:

%timeit df['DATE'].astype(str).str[:-2].astype(np.int64)
%timeit df['DATE'].floordiv(100)

100 loops, best of 3: 2.92 ms per loop
1000 loops, best of 3: 203 µs per loop

Here we observe ~10x speedup

like image 188
EdChum Avatar answered Sep 24 '22 22:09

EdChum


You could use floor division // to drop the last two digits and preserve the integer type:

>>> df['DATE'] // 100
     DATE
0  201107
1  201107
2  201107
3  201107
4  201107
5  201107
6  201107
7  201108
8  201108
9  201108
like image 36
Alex Riley Avatar answered Sep 22 '22 22:09

Alex Riley