Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reading a text file using Pandas where some rows have empty elements?

Tags:

python

pandas

I have a dataset in a textfile that looks like this.

    0    0CF00400 X       8  66  7D  91  6E  22  03  0F  7D       0.021650 R
    0    18EA0080 X       3  E9  FE  00                           0.022550 R
    0    00000003 X       8  D5  64  22  E1  FF  FF  FF  F0       0.023120 R

I read this using

file_pandas = pd.read_csv(fileName, delim_whitespace = True, header = None, engine = 'python')

And got the output

    0   0  0CF00400  X   8  66  7D  91        6E  22    03    0F    7D  0.02165   
    1   0  18EA0080  X   3  E9  FE   0  0.022550   R  None  None  None      NaN   
    2   0  00000003  X   8  D5  64  22        E1  FF    FF    FF    F0  0.02312   

But I want this read as

    0   0  0CF00400  X   8  66  7D  91        6E  22    03    0F    7D  0.021650   R  
    1   0  18EA0080  X   3  E9  FE  00                                  0.022550   R
    2   0  00000003  X   8  D5  64  22        E1  FF    FF    FF    F0  0.023120   R

I've tried removing delim_whitespace = True and replacing it with delimiter = " " but that just combined the first four columns in the output shown above, but it did parse the rest of the data correctly, meaning that the rest of the columns were like the origin txt file (barring the NaN values in whitespaces).

I'm not sure how to proceed from here.

Side note: the 00 is being parsed as only 0. Is there a way to display 00 instead?

like image 979
Aditya Salapaka Avatar asked Oct 19 '16 15:10

Aditya Salapaka


1 Answers

It seems like your data is fixed width columns, you can try pandas.read_fwf():

from io import StringIO
import pandas as pd

df = pd.read_fwf(StringIO("""0    0CF00400 X       8  66  7D  91  6E  22  03  0F  7D       0.021650 R
0    18EA0080 X       3  E9  FE  00                           0.022550 R
0    00000003 X       8  D5  64  22  E1  FF  FF  FF  F0       0.023120 R"""), 
                 header = None, widths = [1,12,2,8,4,4,4,4,4,4,4,4,16,2])

enter image description here

like image 70
Psidom Avatar answered Oct 14 '22 16:10

Psidom