A CSV file looks like this:
a,b,c
1,2,3, 
4,5,6, 
a,b,c, 
When I read this file with pandas read_csv, the data frame looks like this:
   |---------------|
   |   | a | b | c |
   |---------------|
   | 1 | 2 | 3 |   |
   | 4 | 5 | 6 |   |
   | a | b | c |   |
   |---------------|
I think the problem is the trailing comma: each data row looks like 1,2,3,space\n, so pandas sees four fields per row against three header names and treats the first field as an unnamed index column. Is there any way I can change this to:
   |-----------|
   | a | b | c |
   |-----------|
   | 1 | 2 | 3 |
   | 4 | 5 | 6 |
   | a | b | c |
   |-----------|
These files are around 50 million rows each, and there are many of them. Is there a way to do this with minimal run time?
Use the usecols parameter of pd.read_csv to read only the first three columns of the CSV file.
from io import StringIO

import pandas as pd

csvtext = StringIO("""a,b,c
1,2,3, 
4,5,6, 
a,b,c, """)
df = pd.read_csv(csvtext, usecols=[0,1,2])
df
Output:
   a  b  c
0  1  2  3
1  4  5  6
2  a  b  c
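
For the 50-million-row files, the same usecols idea can be combined with chunked reading so that only part of a file is in memory at a time. The following is only a rough sketch, not a benchmarked solution: the helper name iter_first_three, the chunk size, and the choice of dtype=str (to keep raw strings, since rows such as a,b,c are mixed in with numeric rows) are all assumptions made for illustration.

```python
from io import StringIO

import pandas as pd


def iter_first_three(source, chunksize=1_000_000):
    """Yield DataFrames holding only the first three columns of `source`.

    Hypothetical helper: the name and the default chunk size are illustrative.
    """
    return pd.read_csv(
        source,
        usecols=[0, 1, 2],    # skip the extra field created by the trailing comma
        dtype=str,            # assumption: keep values as strings, no type inference
        chunksize=chunksize,  # read the file piece by piece
    )


# Small in-memory stand-in for one of the large files.
sample = StringIO("a,b,c\n1,2,3, \n4,5,6, \na,b,c, ")

# Concatenating is only for this demo; for real files, process each chunk
# as it arrives instead of collecting them all.
df = pd.concat(iter_first_three(sample, chunksize=2), ignore_index=True)
print(df)
```

In practice each chunk would be processed or written out as it arrives, and the chunk size tuned to the available memory.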