
How to ignore NULL byte when reading a csv file

Tags: python, csv

I'm reading a csv file generated by a piece of equipment and I get this error message:

Error: line contains NULL byte

I opened the csv file in a text editor and I can see there are some NUL bytes in the header section, which I don't really care about. How can I make the csv reader ignore the NUL bytes and just go through the rest of the file?

There are two blank lines between the header section and the data, so maybe there's a way to skip the whole header?

My code for reading the csv file is:

import csv

with open(FileName, 'r', encoding='utf-8') as csvfile:
    csvreader = csv.reader(csvfile)
LWZ asked Aug 25 '16 17:08





2 Answers

This strips the NUL bytes from each line before it reaches the csv reader:

csvreader = csv.reader(x.replace('\0', '') for x in csvfile)
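For anyone who wants to try it end to end, here is a rough sketch of the same idea, with an optional step for the "skip the whole header" part of the question. The helper names `strip_nuls` and `skip_header` and the sample data are made up for illustration, and the header skip assumes the header really does end with two consecutive blank lines, as described in the question.

```python
import csv
import io


def strip_nuls(lines):
    """Yield each line with NUL characters removed (hypothetical helper)."""
    for line in lines:
        yield line.replace('\0', '')


def skip_header(lines):
    """Drop everything up to and including the first two consecutive
    blank lines, then yield the rest (hypothetical helper)."""
    it = iter(lines)
    blank_run = 0
    for line in it:
        # Count consecutive blank lines; any non-blank line resets the count.
        blank_run = blank_run + 1 if not line.strip() else 0
        if blank_run == 2:
            break
    yield from it


# Stand-in for the real file: a header with NUL bytes, two blank lines, data.
sample = io.StringIO(
    'Device\0 log\nSerial\0 123\n\n\nname,value\nfoo,1\nbar,2\n'
)

rows = list(csv.reader(skip_header(strip_nuls(sample))))
print(rows)  # [['name', 'value'], ['foo', '1'], ['bar', '2']]
```

With a real file you would pass `csvfile` from the `with open(...)` block in place of `sample`. If you only want the NUL stripping, leave out `skip_header` and pass `strip_nuls(csvfile)` straight to `csv.reader`.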
Marlon Abeykoon answered Sep 22 '22 07:09



You have to remove the NUL bytes before the csv reader sees the file.

There was an article I read a while ago about the Taco Bell method of programming. It posits that Taco Bell really only has 8 ingredients, but from those it makes all its chalupas, bean thingies, and other inedible food products.

Probably should add Doritos to that ingredient list. Still, the point remains.

The ingredients here are wget, awk, sed, and the like. Use them when you can. There's no point in making things overly complicated and pulling in a pile of libraries just to do it all in one language.

So I ask: can you do it in UNIX first? You can.

UNIX

tr < file-in -d '\000' > file-out

It's quick and it works.
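If tr isn't around (on Windows, say), roughly the same byte-level cleanup can be sketched in Python. Treat this as a stand-in under assumptions, not a replacement for the one-liner above: the function name `delete_nul_bytes` is made up, and it reads the whole file into memory, which is fine for typical instrument logs but not for huge files.

```python
from pathlib import Path


def delete_nul_bytes(src, dst):
    """Copy src to dst with every NUL byte removed (hypothetical helper).

    Works on raw bytes, so the file's text encoding does not matter here.
    """
    data = Path(src).read_bytes()
    Path(dst).write_bytes(data.replace(b'\x00', b''))


# Example call, using the same placeholder names as the tr command:
# delete_nul_bytes('file-in', 'file-out')
```

After that, the cleaned file can be opened and passed to csv.reader as usual.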

...Now, get back to tacos.

CENTURION answered Sep 23 '22 07:09
