I have a CSV file that looks like this:
a,b,c
1,2,3,
4,5,6,
a,b,c,
When I read this file with pandas read_csv, the data frame looks like this:
|---------------|
| | a | b | c |
|---------------|
| 1 | 2 | 3 | |
| 4 | 5 | 6 | |
| a | b | c | |
|---------------|
I think the problem is in the data: each data row ends with a trailing comma (for example 1,2,3, followed by a space and \n), so pandas thinks there are 4 columns and that the first column is unnamed. Is there any way I can change this to:
|-----------|
| a | b | c |
|-----------|
| 1 | 2 | 3 |
| 4 | 5 | 6 |
| a | b | c |
|-----------|
Each file has around 50 million rows, and there are many files. Is there any way to do this with minimal run time?
Use the usecols parameter of pd.read_csv to read only the first three columns of the CSV file.
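from io import StringIO
import pandas as pd

# Sample data: every data row ends with a trailing comma
csvtext = StringIO("""a,b,c
1,2,3,
4,5,6,
a,b,c, """)
# Keep only the first three columns, ignoring the empty fourth field
df = pd.read_csv(csvtext, usecols=[0,1,2])
print(df)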
Output:
a b c
0 1 2 3
1 4 5 6
2 a b c
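As an alternative sketch, the pandas documentation describes index_col=False for malformed files with a delimiter at the end of each line; it stops pandas from using the first column as the index. The helper name read_trailing_comma_csv and the two in-memory "files" below are hypothetical stand-ins for the real file paths, and some pandas versions may emit a ParserWarning about the dropped extra field. No claim is made here about which option is faster on 50 million rows; that would need to be measured.

```python
from io import StringIO

import pandas as pd

# Same sample as above: every data row ends with a trailing comma.
csvtext = "a,b,c\n1,2,3,\n4,5,6,\na,b,c, "


def read_trailing_comma_csv(source):
    # Hypothetical helper: index_col=False tells pandas not to treat the
    # first column as the index when rows have one more field than the header.
    return pd.read_csv(source, index_col=False)


df = read_trailing_comma_csv(StringIO(csvtext))
print(df)

# Hypothetical "many files" case: two in-memory copies stand in for real paths.
sources = [StringIO(csvtext), StringIO(csvtext)]
combined = pd.concat(
    (read_trailing_comma_csv(s) for s in sources), ignore_index=True
)
print(combined.shape)
```

For real files, the StringIO objects would be replaced by file paths, for example a list built with glob.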