Remove first x number of characters from each row in a column of a Python dataframe

I have a pandas DataFrame with about 1,500 rows and 15 columns. For one specific column, I would like to remove the first 3 characters of each value. As a simple example, here is a DataFrame:

import pandas as pd

d = {
    'Report Number':['8761234567', '8679876543','8994434555'],
    'Name'         :['George', 'Bill', 'Sally']
     }

d = pd.DataFrame(d)

I would like to remove the first three characters from each field in the Report Number column of dataframe d.

d84_n1nj4 asked Feb 20 '17 16:02



2 Answers

Use the vectorised str methods to slice each string entry:

In [11]:
d['Report Number'] = d['Report Number'].str[3:]
d

Out[11]:
     Name Report Number
0  George       1234567
1    Bill       9876543
2   Sally       4434555
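
For anyone running this outside an IPython session, here is a minimal standalone sketch of the same slice. The names df, sliced, same, as_int and from_int are only illustrative, and the integer column is an assumed variation that is not part of the question (the original column already holds strings).

```python
import pandas as pd

df = pd.DataFrame({
    'Report Number': ['8761234567', '8679876543', '8994434555'],
    'Name': ['George', 'Bill', 'Sally'],
})

# Drop the first three characters of every entry in the column.
sliced = df['Report Number'].str[3:]

# .str.slice(start=3) is the spelled-out form of the same operation.
same = df['Report Number'].str.slice(start=3)

# Assumed variation: if the column held integers, the .str accessor would
# not be available, so the values are converted to strings first.
as_int = pd.Series([8761234567, 8679876543, 8994434555])
from_int = as_int.astype(str).str[3:]

print(sliced.tolist())  # ['1234567', '9876543', '4434555']
```

Assigning the result back, as in d['Report Number'] = d['Report Number'].str[3:], overwrites the column in place; keeping it in a separate variable, as here, leaves the original data untouched.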
EdChum answered Oct 07 '22 12:10



It is worth noting that Pandas "vectorised" str methods are no more than Python-level loops.

Assuming clean data, you will often find a list comprehension more efficient:

# Python 3.6.0, Pandas 0.19.2

d = pd.concat([d]*10000, ignore_index=True)

%timeit d['Report Number'].str[3:]           # 12.1 ms per loop
%timeit [i[3:] for i in d['Report Number']]  # 5.78 ms per loop

Note these aren't equivalent, since the list comprehension does not deal with null data and other edge cases. For these situations, you may prefer the Pandas solution.
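
To make the difference concrete, here is a small sketch with one missing value. The Series s and the guarded comprehension are illustrative assumptions, not part of the original timings.

```python
import pandas as pd

# Hypothetical column with one missing entry.
s = pd.Series(['8761234567', None, '8994434555'])

# The .str accessor leaves the missing entry as missing instead of failing.
via_str = s.str[3:]

# The plain list comprehension tries to slice the missing value and raises.
try:
    via_list = [x[3:] for x in s]
except TypeError:
    via_list = None  # the comprehension could not handle the null entry

# One possible workaround: only slice real strings, pass anything else through.
guarded = [x[3:] if isinstance(x, str) else x for x in s]

print(via_str.isna().tolist())  # [False, True, False]
```

The extra isinstance check costs some of the speed advantage, so whether the comprehension is still worth it depends on how clean the data is.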

jpp answered Oct 07 '22 14:10
