Convert data on reading csv in pandas

Tags: python, pandas

I'm reading a .csv file into a pandas DataFrame. The .csv file contains several columns. Column 'A' contains strings such as '20-989-98766'. Is it possible to read only the last 5 characters ('98766') of each value while loading the file?

df = pd.read_csv("test_data2.csv", column={'A':read the last 5 characters})

Desired output:

A
98766
95476
.....
magicsword asked Apr 11 '17 15:04



1 Answer

You can define a function and pass it to the converters parameter of read_csv:

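In [57]:
import io
import pandas as pd

def func(x):
    # keep only the last 5 characters of the raw string value
    return x[-5:]

# small inline CSV standing in for a file on disk
t = """column
20-989-98766"""
df = pd.read_csv(io.StringIO(t), converters={'column': func})
df

Out[57]:
  column
0  98766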

Here func is passed to converters in a dict that maps your column name to the function; read_csv then calls func on the raw string value of that column in every row of the CSV.
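A minimal sketch of that per-row behaviour, assuming a hypothetical two-column file (the sample rows, the helper last5 and the list seen are illustrative, not from the answer). The converter only touches column 'A', column 'B' is parsed normally, and the values stored in 'A' are whatever the converter returns, here strings:

```python
import io

import pandas as pd

# Hypothetical sample data standing in for test_data2.csv
csv_text = """A,B
20-989-98766,1
20-123-95476,2
"""

seen = []  # hypothetical: records every raw value the converter receives


def last5(x):
    """Return only the last 5 characters of the raw cell string."""
    seen.append(x)
    return x[-5:]


# Only column "A" goes through last5; column "B" uses normal type inference
df = pd.read_csv(io.StringIO(csv_text), converters={"A": last5})
print(df)
print(seen)
```

Because the converter returns a string, leading zeros in the kept characters would be preserved.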

So in your case, the following should work:

df = pd.read_csv("test_data2.csv", converters={'A':func})
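If numbers rather than strings are wanted, the converter can also cast the result. The sketch below is hedged: it uses a hypothetical in-memory stand-in for test_data2.csv, and "Option 2" is an alternative not taken from the answer, which reads column 'A' as text and slices it after loading with the .str accessor:

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of test_data2.csv
csv_text = """A,B
20-989-98766,x
20-989-95476,y
"""

# Option 1: converter applied while parsing; int() turns the kept
# characters into numbers (this drops leading zeros, e.g. '01234' -> 1234).
df_conv = pd.read_csv(io.StringIO(csv_text), converters={"A": lambda s: int(s[-5:])})

# Option 2 (alternative, not from the answer): read column A as text,
# then slice it afterwards with the vectorised .str accessor.
df_post = pd.read_csv(io.StringIO(csv_text), dtype={"A": str})
df_post["A"] = df_post["A"].str[-5:]

print(df_conv["A"].tolist())
print(df_post["A"].tolist())
```

The converter approach does the work during parsing, while the post-load slice keeps read_csv untouched and operates on the whole column at once.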
EdChum answered Oct 17 '22 14:10