
Pandas read csv, trim last two characters

A csv file looks like this:

a,b,c
1,2,3, 
4,5,6, 
a,b,c, 

When I read this file with pandas read_csv, the data frame looks like this:

   |---------------|
   |   | a | b | c |
   |---------------|
   | 1 | 2 | 3 |   |
   | 4 | 5 | 6 |   |
   | a | b | c |   |
   |---------------|

I think the problem is in the data: each row looks like 1,2,3,space\n, so pandas thinks there are 4 columns and treats the first one as an unnamed column. Is there any way I can change the result to:

   |-----------|
   | a | b | c |
   |-----------|
   | 1 | 2 | 3 |
   | 4 | 5 | 6 |
   | a | b | c |
   |-----------|

These files have around 50 million rows each, and there are many of them. Is there any way to do this with minimal run time?

Venkata Gogu asked Dec 18 '22 23:12

1 Answer

Use the usecols parameter of pd.read_csv to read only the first three columns of the CSV file.

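from io import StringIO

import pandas as pd

# Sample data: every data row ends with a trailing comma and a space
csvtext = StringIO("""a,b,c
1,2,3, 
4,5,6, 
a,b,c, """)

# Keep only the first three columns; the extra fourth field is ignored
df = pd.read_csv(csvtext, usecols=[0,1,2])
df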

Output:

   a  b  c
0  1  2  3
1  4  5  6
2  a  b  c
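
Since the question mentions very large files, the same usecols idea can be combined with chunked reading so that a whole file is never parsed in one pass. This is only a rough sketch and has not been benchmarked: the helper name read_first_three_columns, the chunk size, and the choice of dtype=str are assumptions made for illustration, not part of the original answer.

```python
from io import StringIO

import pandas as pd


def read_first_three_columns(source, chunksize=1_000_000):
    """Read a CSV in chunks, keeping only the first three columns.

    Hypothetical helper; the default chunk size is an arbitrary guess.
    """
    chunks = pd.read_csv(
        source,
        usecols=[0, 1, 2],    # ignore the extra field after the trailing comma
        dtype=str,            # assumption: columns mix numbers and text, so skip type inference
        chunksize=chunksize,  # returns an iterator of DataFrames instead of one frame
    )
    # Stitch the chunks back together with a fresh 0..n-1 index
    return pd.concat(chunks, ignore_index=True)


# Same sample data as above, read two rows at a time
sample = StringIO("a,b,c\n1,2,3, \n4,5,6, \na,b,c, ")
df = read_first_three_columns(sample, chunksize=2)
print(df)
```

For many files, the helper could be called once per path. If each file fits in memory, plain pd.read_csv(path, usecols=[0,1,2]) is simpler and may well be faster, so it is worth timing both on a real file.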
Scott Boston answered Jan 08 '23 19:01
