Pandas dataframe read_csv on bad data

I want to read in a very large CSV (too large to open and edit easily in Excel), but somewhere around the 100,000th row there is a row with one extra column, which causes the program to crash. That row is malformed, so I need a way to ignore the fact that it has an extra column. There are around 50 columns, so hardcoding the headers and using names or usecols isn't preferable. I may also run into this issue in other CSVs, so I want a generic solution. Unfortunately, I couldn't find anything in read_csv. The code is as simple as this:

    import pandas as pd

    def loadCSV(filePath):
        # Read only the first 1000 rows, without using the first column as the index
        dataframe = pd.read_csv(filePath, index_col=False, encoding='iso-8859-1', nrows=1000)
        datakeys = dataframe.keys()
        return dataframe, datakeys

523

asked Oct 30 '15 16:10

Fonti

1 Answer

Pass error_bad_lines=False to read_csv to skip erroneous rows. From the documentation:

error_bad_lines : boolean, default True
    Lines with too many fields (e.g. a csv line with too many commas) will by default cause an exception to be raised, and no DataFrame will be returned. If False, then these "bad lines" will be dropped from the DataFrame that is returned. (Only valid with C parser)
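As a minimal sketch of the option described above: in pandas 1.3 the error_bad_lines flag was deprecated in favor of on_bad_lines, and it was removed in pandas 2.0, so which spelling works depends on the installed version. The helper name read_csv_skipping_bad_lines and the sample data below are made up for illustration; the helper tries the newer keyword first and falls back to the older one.

```python
import io

import pandas as pd


def read_csv_skipping_bad_lines(source, **kwargs):
    """Read a CSV, dropping rows that have more fields than the header.

    Hypothetical helper (not part of pandas): it only picks whichever
    keyword the installed pandas version accepts.
    """
    try:
        # pandas 1.3 and later: on_bad_lines replaces error_bad_lines
        return pd.read_csv(source, on_bad_lines="skip", **kwargs)
    except TypeError:
        # older pandas rejects the unknown keyword before reading anything,
        # so it is safe to retry with the flag quoted in the answer
        return pd.read_csv(source, error_bad_lines=False, **kwargs)


# Made-up sample: the second data row has one field too many
sample = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"
df = read_csv_skipping_bad_lines(io.StringIO(sample))
print(df)
```

The malformed row is dropped silently here, so it is worth comparing the row count against the file if losing data matters. On versions that support it, on_bad_lines="warn" skips the row but also reports it.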

175

answered Sep 19 '22 08:09

EdChum

Tags: python, pandas, csv
