I'm reading a .csv file into a pandas dataframe. The .csv file contains several columns. Column 'A' contains a string '20-989-98766'. Is it possible to only read the last 5 characters '98766' from the string when loading the file?
df = pd.read_csv("test_data2.csv", column={'A':read the last 5 characters})
output:
A
98766
95476
.....
Pandas read_csv() function imports a CSV file to DataFrame format. header: this allows you to specify which row will be used as column names for your dataframe. Expected an int value or a list of int values. Default value is header=0 , which means the first row of the CSV file will be treated as column names.
If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by 5-10x.
You can define a func
and pass this as an arg to converters
param for read_csv
:
In [57]:
import io
import pandas as pd
def func(x):
return x[-5:]
t="""column
'20-989-98766"""
df = pd.read_csv(io.StringIO(t), converters={'column': func})
df
Out[57]:
column
0 98766
So here I define a func
and pass this to converters
in the form of a dict with your column name as the key, this will call the func
on every row in your csv
so in your case the following should work:
df = pd.read_csv("test_data2.csv", converters={'A':func})
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With