I am trying to read a space-delimited file in Python using read_csv from pandas. It works when I specify delimiter=" ". The problem arises when some columns have missing values: the blank space left by a missing value is treated as part of the delimiter, so the value is skipped instead of being read as empty.
Is there a way to resolve this problem? Here is a sample of the data:
1600 1141.0000 020006 600 1141.0000 69.0000 OAUC 0.0000
1 1070.5000 020032 1 1070.5000 400.0000 0.0000
In the second row, the value in the column that holds OAUC in the first row is missing. The columns are also unevenly spaced, which makes this harder. Since the columns are fixed, I can tell when a row is missing a value, but I haven't found a way to work out which value is missing.
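To make the failure concrete, here is a minimal sketch. It rebuilds the two sample rows in a hypothetical fixed-width layout (the widths in `WIDTHS` and the `ROWS`/`text` names are assumptions, not the real file) and parses them by treating every run of whitespace as one delimiter, which appears to be what is happening here:

```python
import io

import pandas as pd

# Hypothetical layout (assumed widths): each field is left-aligned in a fixed-width slot.
WIDTHS = [6, 11, 8, 6, 11, 10, 6, 6]
ROWS = [
    ["1600", "1141.0000", "020006", "600", "1141.0000", "69.0000", "OAUC", "0.0000"],
    ["1", "1070.5000", "020032", "1", "1070.5000", "400.0000", "", "0.0000"],  # 7th value missing
]
text = "\n".join("".join(v.ljust(w) for v, w in zip(row, WIDTHS)) for row in ROWS) + "\n"

# Treat every run of whitespace as a single delimiter.
df = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None, dtype=str)
print(df)
# The row at index 1 has only 7 tokens, so its last value "0.0000" lands in
# column 6 (the OAUC column) and column 7 is padded with NaN.
```

The blank slot simply disappears, so nothing in the parsed row says which column was empty.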
I agree with Justin that cleaning the file up first is the surest way to get it right. But if you can skim the results to check their quality, then this hack might get the job done in this case:
import pandas as pd
df = pd.read_csv('your_file.txt', header=None, sep=r'\s{1,7}', engine='python')  # 'your_file.txt' is a placeholder path
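Why the regex works, as far as I can tell: `\s{1,7}` (written without a space inside the braces, so it is read as a 1-to-7 repetition) consumes at most seven whitespace characters per delimiter. A wider gap is split into two delimiters with an empty string between them, and pandas turns that empty field into NaN. The sketch below shows this on a hypothetical fixed-width rendering of the two sample rows; `WIDTHS`, `ROWS` and `text` are assumptions, chosen so that ordinary gaps are at most 7 spaces and the gap hiding the missing value is 8.

```python
import io
import re

import pandas as pd

# A gap of 8 spaces is split into 7 + 1, leaving an empty token in between.
print(re.split(r"\s{1,7}", "400.0000" + " " * 8 + "0.0000"))  # ['400.0000', '', '0.0000']

# Hypothetical fixed-width layout of the two sample rows (assumed widths).
WIDTHS = [6, 11, 8, 6, 11, 10, 6, 6]
ROWS = [
    ["1600", "1141.0000", "020006", "600", "1141.0000", "69.0000", "OAUC", "0.0000"],
    ["1", "1070.5000", "020032", "1", "1070.5000", "400.0000", "", "0.0000"],
]
text = "\n".join("".join(v.ljust(w) for v, w in zip(row, WIDTHS)) for row in ROWS) + "\n"

# Regex separators other than r"\s+" need the python engine.
df = pd.read_csv(io.StringIO(text), sep=r"\s{1,7}", header=None, dtype=str, engine="python")
print(df)  # the empty token in the second row becomes NaN in column 6
```

The threshold is the fragile part: it only works if every ordinary gap is at most 7 characters and every gap that hides a missing value is at least 8, which is why checking the output matters.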
I'll say it again: this is not a great idea. If you just want to get a smallish data set loaded, it will do the job. But if you can't verify that it worked, you're better off using read_fwf with carefully specified colspecs, or following Justin's advice and cleaning up the file.
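A minimal read_fwf sketch, assuming a hypothetical layout: the `COLSPECS` positions below are made up for two synthetic rows (built from assumed widths in `WIDTHS`), so for the real file you would read the start and end position of each column off the data itself. Because each column is sliced by position rather than split on spaces, a blank slot stays in place and is read as NaN.

```python
import io

import pandas as pd

# Hypothetical fixed-width layout of the two sample rows (assumed widths).
WIDTHS = [6, 11, 8, 6, 11, 10, 6, 6]
ROWS = [
    ["1600", "1141.0000", "020006", "600", "1141.0000", "69.0000", "OAUC", "0.0000"],
    ["1", "1070.5000", "020032", "1", "1070.5000", "400.0000", "", "0.0000"],
]
text = "\n".join("".join(v.ljust(w) for v, w in zip(row, WIDTHS)) for row in ROWS) + "\n"

# Half-open (start, end) character positions of each column for the assumed layout.
COLSPECS = [(0, 6), (6, 17), (17, 25), (25, 31), (31, 42), (42, 52), (52, 58), (58, 64)]

# dtype=str keeps codes such as "020006" from losing their leading zero.
df = pd.read_fwf(io.StringIO(text), colspecs=COLSPECS, header=None, dtype=str)
print(df)  # the blank OAUC slot in the second row is NaN; later columns stay put
```

When the columns are contiguous, read_fwf also accepts `widths=` instead of `colspecs=`, which can be less error-prone to write by hand.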