I have a Python dataframe with about 1,500 rows and 15 columns. With one specific column I would like to remove the first 3 characters of each row. As a simple example here is a dataframe:
import pandas as pd
d = {
'Report Number':['8761234567', '8679876543','8994434555'],
'Name' :['George', 'Bill', 'Sally']
}
d = pd.DataFrame(d)
I would like to remove the first three characters from each field in the Report Number
column of dataframe d
.
Using lstrip() function Hello!' ch = '!' print(s) # Hello! That's all about removing the first character from a string in Python.
You can use df. head() to get the first N rows in Pandas DataFrame. Alternatively, you can specify a negative number within the brackets to get all the rows, excluding the last N rows.
Use vectorised str
methods to slice each string entry
In [11]:
d['Report Number'] = d['Report Number'].str[3:]
d
Out[11]:
Name Report Number
0 George 1234567
1 Bill 9876543
2 Sally 4434555
It is worth noting Pandas "vectorised" str
methods are no more than Python-level loops.
Assuming clean data, you will often find a list comprehension more efficient:
# Python 3.6.0, Pandas 0.19.2
d = pd.concat([d]*10000, ignore_index=True)
%timeit d['Report Number'].str[3:] # 12.1 ms per loop
%timeit [i[3:] for i in d['Report Number']] # 5.78 ms per loop
Note these aren't equivalent, since the list comprehension does not deal with null data and other edge cases. For these situations, you may prefer the Pandas solution.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With