 

How to record bad lines skipped by pandas

I'm reading a CSV file with pandas using error_bad_lines=False. A warning is printed when a bad line is encountered. However, I want to keep a record of all the bad line numbers to feed into another program. Is there an easy way of doing that?

I thought about iterating over the file with chunksize=1 and catching the CParserError that ought to be thrown for each bad line encountered. When I do this, though, no CParserError is thrown for bad lines, so I can't catch them.

user3235250 asked Nov 01 '25 17:11


1 Answer

The warnings are written to standard error (sys.stderr). You can capture them in a file by redirecting sys.stderr while the CSV is being read.

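import contextlib
import pandas as pd

# Redirect stderr only inside this block; it is restored on exit,
# so later output does not hit the closed file.
with open('bad_lines.txt', 'w') as fp, contextlib.redirect_stderr(fp):
    # error_bad_lines was removed in pandas 2.0; use on_bad_lines='warn' there.
    df = pd.read_csv('my_data.csv', error_bad_lines=False)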
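Since the question asks for the line numbers rather than the raw warnings, the captured text can be post-processed. The following is a minimal sketch, assuming each warning looks like "Skipping line 3: expected 2 fields, saw 3"; the function name extract_bad_line_numbers and the sample text are made up for illustration, and the exact wording (and whether it reaches stderr at all) can differ between pandas versions, so check one captured line first.

```python
import re

# Assumed message shape: "Skipping line 3: expected 2 fields, saw 3".
# Adjust the pattern if your pandas version words the warning differently.
SKIP_PATTERN = re.compile(r"Skipping line (\d+)")


def extract_bad_line_numbers(log_text):
    """Return the skipped line numbers found in captured warning text."""
    return [int(number) for number in SKIP_PATTERN.findall(log_text)]


# Hypothetical sample of captured output, for illustration only.
sample_log = (
    "Skipping line 3: expected 2 fields, saw 3\n"
    "Skipping line 7: expected 2 fields, saw 4\n"
)

print(extract_bad_line_numbers(sample_log))
```

To use it on the real capture, read bad_lines.txt into a string and pass that string to extract_bad_line_numbers; the resulting list of integers can then be handed to the other program.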
James answered Nov 03 '25 07:11


