
Read CSV file in Pandas with Blank lines in between

Tags: python, pandas, csv

I have a data.csv file like this:

Col1,Col2,Col3,Col4,Col5  
10,12,14,15,16  
18,20,22,24,26  
28,30,32,34,36  
38,40,42,44,46  
48,50,52,54,56

Col6,Col7  
11,12  
13,14  
...

I want to read only the first block of data (columns Col1 to Col5); I don't need Col6 and Col7.

I tried reading this file using:

df = pd.read_csv('data.csv',header=0)

but it raises this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 3: invalid start byte

Then I tried this:

df = pd.read_csv('data.csv',header=0,error_bad_lines=True)

But this does not give the desired result either. How can I read only up to the first blank line in the CSV file?

Bhaskar asked Oct 13 '25 01:10


1 Answer

You can write a generator that reads the file line by line and stops at the first blank line. The lines it yields are then joined and passed to pandas:

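import io

import pandas as pd


def file_reader(filename, encoding=None):
    """Yield lines from the file up to (not including) the first blank line."""
    with open(filename, encoding=encoding) as f:
        for line in f:
            # Stop at the first empty or whitespace-only line.
            if not line.strip():
                break
            yield line


# The UnicodeDecodeError in the question suggests the file is not UTF-8.
# If so, pass its real encoding, e.g. file_reader('data.csv', encoding='latin-1').
data = io.StringIO(''.join(file_reader('data.csv')))
df = pd.read_csv(data)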
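
The same idea can also be sketched with itertools.takewhile, with the file encoding made explicit, since the traceback in the question (byte 0xb2 rejected by the 'utf-8' codec) points to a file that is not UTF-8. This is only a sketch under assumptions: the helper name read_until_blank is hypothetical, and latin-1 is a guess at the encoding, chosen because it can decode any byte; replace it with the encoding the file was actually written in.

```python
import io
import itertools

import pandas as pd


def read_until_blank(path, encoding="latin-1", **read_csv_kwargs):
    """Parse only the rows that come before the first blank line.

    The function name and the latin-1 default are illustrative assumptions,
    not part of pandas; use the file's real encoding if it is known.
    """
    with open(path, encoding=encoding) as f:
        # takewhile stops at the first empty or whitespace-only line,
        # so the second table (Col6, Col7) is never read.
        head = itertools.takewhile(lambda line: line.strip(), f)
        buffer = io.StringIO("".join(head))
    # Extra keyword arguments are forwarded unchanged to pandas.read_csv.
    return pd.read_csv(buffer, **read_csv_kwargs)
```

Calling df = read_until_blank('data.csv') should then return only the Col1 to Col5 block. Decoding while the file is read, before the text reaches pandas, keeps the encoding question separate from the blank-line question.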
Eir Nym answered Oct 14 '25 14:10