
pandas.errors.ParserError: Too many columns specified: expected 9996 and found 9808

Tags:

python

pandas

When I process my data with pandas, I get the error in the title. The rows of my data do not all have the same number of columns, so I sorted them by length in descending order: the first line is the longest, the next line is shorter, and so on. When the file is small, pandas reads it successfully. But after I write all of my data to the file, pandas can no longer read it and raises this error.

Here is my code:

import time
import pandas as pd

def sequencein(filepath):
    print(filepath)
    print("time", time.time())
    # header=None: treat the first line as data, not as column names
    data = pd.read_table(filepath, header=None)
    print("time", time.time())
    matr = data.values
    print("sequence shape:", matr.shape)
    return matr

The end of the lines in the file is shown below: (image not available)

Chenghui Zhao asked Dec 24 '22 12:12



2 Answers

The documentation says there are two engines:

engine : {‘c’, ‘python’}, optional

Parser engine to use. The C engine is faster while the python engine is currently more feature-complete.

The problem seems to appear only with the 'c' engine, which pandas uses by default unless an option requires the python engine.

So you could try:

data = pd.read_table(filepath, header=None, engine='python')  
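
As a minimal sketch of that suggestion, the snippet below reads a tiny tab-separated sample with rows of decreasing length, as described in the question. The sample data is made up for illustration, and `io.StringIO` stands in for the real file path. It only shows how short rows are padded; it does not reproduce the error, which the question reports only for the full-size file.

```python
import io

import pandas as pd

# Hypothetical sample: the first row is the longest, later rows are shorter.
ragged = "1\t2\t3\t4\n5\t6\t7\n8\t9\n"

# engine="python" selects the pure-Python parser instead of the C parser.
data = pd.read_table(io.StringIO(ragged), header=None, engine="python")

# Missing trailing fields are filled with NaN, so the array is rectangular.
matr = data.values
print("sequence shape:", matr.shape)
```

The python engine is slower than the C engine, so on a file of this size the read may take noticeably longer.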
Benjamin Maier answered Dec 28 '22 07:12



I solved this problem myself. I changed data = pd.read_table(filepath, header=None) to data = pd.read_table(filepath) and added a header line to my data file, and it worked.
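
The same idea can be sketched without editing the file by hand. The helper below is hypothetical (the name `read_with_header` and the column names `c0`, `c1`, ... are assumptions, not part of the original fix). It assumes, as in the question, that the first line is the longest, builds a header with one name per field in that line, and lets pandas read the header plus the data. It loads the whole text into memory, so it is only an illustration, and it has not been checked against the full-size file from the question.

```python
import io

import pandas as pd


def read_with_header(text, sep="\t"):
    """Prepend a generated header line and parse the result with pandas."""
    # Assumption: the first line has the most fields (rows sorted by length).
    first_line = text.split("\n", 1)[0]
    n_cols = first_line.count(sep) + 1
    # Hypothetical column names c0, c1, ..., one per field of the first line.
    header = sep.join(f"c{i}" for i in range(n_cols))
    # No header=None here: the generated first line supplies the column names.
    return pd.read_table(io.StringIO(header + "\n" + text), sep=sep)


sample = "1\t2\t3\n4\t5\n6\n"
df = read_with_header(sample)
print(df.shape)
```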

Chenghui Zhao answered Dec 28 '22 08:12
