
Why does read_csv skiprows value need to be lower than it should be in this case?

I have a log file (test.TXT in this case):

# 1: 5
# 3: x
# F: 5.
# ID: 001
# No.: 2
# No.: 4
# Time: 20191216T122109
# Value: ";"
# Time: 4
# Time: ""
# Time ms: ""
# Date: ""
# Time separator: "T"
# J: 1000000
# Silent: false
# mode: true
Timestamp;T;ID;P
16T122109957;0;6;0006

To read this log file into pandas and ignore all the header information, I would expect to skip the first 16 lines with skiprows, like so:

pd.read_csv('test.TXT',skiprows=16,sep=';')

But this raises EmptyDataError, as if it were skipping past the point where the data starts. To make it work I had to use skiprows=11 instead:

pd.read_csv('test.TXT',skiprows=11,sep=';')
      Timestamp  T  ID  P
0  16T122109957  0   6  6

My question: if the data doesn't start until line 17, why do I need to pass skiprows=11 rather than skiprows=16?

asked Oct 15 '22 03:10 by RMRiver




1 Answer

One workaround is to use the comment parameter of pd.read_csv:

from io import StringIO

import pandas as pd

text='''# 1: 5
# 3: x
# F: 5.
# ID: 001
# No.: 2
# No.: 4
# Time: 20191216T122109
# Value: ";"
# Time: 4
# Time: ""
# Time ms: ""
# Date: ""
# Time separator: "T"
# J: 1000000
# Silent: false
# mode: true
Timestamp;T;ID;P
16T122109957;0;6;0006'''

df = pd.read_csv(StringIO(text),comment='#',sep=';')
df
      Timestamp  T  ID  P
0  16T122109957  0   6  6

Or, with header=0 stated explicitly:

df = pd.read_csv(StringIO(text),header=0,comment='#',sep=';')

From the pandas docs, under the header parameter:

Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.
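Note that comment='#' also cuts off the rest of any data line that contains a '#'. If that could matter for your files, a minimal sketch of an alternative is to drop the leading '#' lines in plain Python and hand only the remainder to pd.read_csv. The helper name read_after_hash_header is made up for this illustration, and the sample is shortened to a few of the header lines:

```python
from io import StringIO

import pandas as pd

# Shortened version of the log file from the question.
text = '''# ID: 001
# Value: ";"
# Time: ""
# mode: true
Timestamp;T;ID;P
16T122109957;0;6;0006'''


def read_after_hash_header(handle, sep=';'):
    """Hypothetical helper: skip the leading '#' lines, then parse the rest."""
    lines = handle.readlines()
    # Index of the first line that does not start with '#'.
    # Raises StopIteration if every line is a '#' line.
    start = next(i for i, line in enumerate(lines) if not line.startswith('#'))
    return pd.read_csv(StringIO(''.join(lines[start:])), sep=sep)


df = read_after_hash_header(StringIO(text))
print(df)
```

With a real file you would pass an open file object, e.g. open('test.TXT'), instead of StringIO(text). This sidesteps skiprows entirely, so the row-counting question never comes up.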

I'm not sure why skiprows behaves this way here.
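One possible explanation, which is a guess and not something confirmed here, is that the double-quote characters in some of the header lines affect how the parser counts the rows it skips. Under that assumption, a sketch worth trying is to turn off quote handling with quoting=csv.QUOTE_NONE, so that quotes are treated as ordinary characters, and keep skiprows=16:

```python
import csv
from io import StringIO

import pandas as pd

text = '''# 1: 5
# 3: x
# F: 5.
# ID: 001
# No.: 2
# No.: 4
# Time: 20191216T122109
# Value: ";"
# Time: 4
# Time: ""
# Time ms: ""
# Date: ""
# Time separator: "T"
# J: 1000000
# Silent: false
# mode: true
Timestamp;T;ID;P
16T122109957;0;6;0006'''

# Assumption: quote characters in the skipped lines are what confuses skiprows.
# QUOTE_NONE tells the parser not to give '"' any special meaning.
df = pd.read_csv(StringIO(text), skiprows=16, sep=';', quoting=csv.QUOTE_NONE)
print(df)
```

This only fits files whose data rows contain no quoted fields, because with QUOTE_NONE any quotes in the data are kept as literal characters.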

answered Nov 15 '22 06:11 by Ch3steR