
Python: detect the tab character

I am trying to separate words from integers in a specific file. The file's lines look like the sample below: a line containing a word has no '\t' character, while a line containing an integer (all of them are positive) starts with one. Some words look like numbers but contain a '-' character, such as -1234:

-1234
\t22
\t44
\t46
absv
\t1
\t2
\t4
... 

So my idea was to tell words and numbers apart by trying to convert each token to float:

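def is_number(s):
    """Return True if s can be parsed as a float."""
    try:
        float(s)
        return True
    except ValueError:
        return False

with open("/media/New Volume/3rd_step.txt") as file:  # open file (universal newlines are the default)
    for line in file:  # read line by line
        temp_buffer = line.split()  # split on any whitespace, which also consumes the '\t'
        for word in temp_buffer:
            # A number must parse as a float and contain no '-'
            # (tokens such as "-1234" are words in this file).
            if '-' not in word and is_number(word):
                ...  # process the number here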

So if a token is a word, float() raises an exception; otherwise it is a number. The file is 50 GB, and somewhere in the middle the file's format seems to go wrong, so the only reliable way to separate words from numbers is the \t character. But how can I detect it? When I split the line to get the tokens, the whitespace, including the tabs, is thrown away.
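The tab is lost only because split() with no argument treats every whitespace character, '\t' included, as a separator. Checking the raw line before splitting keeps that information. A minimal sketch, assuming (as in the sample above) that a number line starts with a tab and a word line contains none; the helper name is_number_line is hypothetical:

```python
def is_number_line(line):
    """Hypothetical helper: True if the raw, unsplit line starts with a tab."""
    return line.startswith('\t')


sample = "\t22\n"
print(sample.split())              # ['22']  (the tab is gone after split())
print(is_number_line(sample))      # True    (the raw line still has it)
print(is_number_line("-1234\n"))   # False   (a word line, even though it looks numeric)

# float() alone is a weaker signal: it also accepts some word-like tokens.
print(float("nan"), float("inf"))  # nan inf
```

Testing the tab on the raw line also sidesteps the '-' special case, because "-1234" has no leading tab.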

EDIT:

Sorry for wasting your time; I'm a newbie and this was a silly question. It turns out I can detect the tab more easily this way:

with open("/media/D60A6CE00A6CBEDD/InvertedIndex/1.txt") as file:  # open file
    for line in file:  # read line by line
        if '\t' not in line:  # no tab: this line holds a word
            print(line.rstrip('\n'))
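Building on the EDIT above, the same check can sort the file into words and numbers in one streaming pass, so the 50 GB file never has to fit in memory. This is a sketch under assumptions: a line starting with '\t' holds one positive integer, any other non-empty line is a word, and the function name classify_lines is hypothetical. Lines that break these assumptions are reported as "bad" instead of crashing the loop, which may help locate where the format goes wrong.

```python
def classify_lines(lines):
    """Yield (line_number, kind, value) for every non-empty line.

    Assumed format, taken from the sample in the question:
      a tab followed by digits -> kind "number", value is an int
      any line with no tab     -> kind "word" (e.g. "absv" or "-1234")
    Anything else is yielded as kind "bad" so it can be inspected.
    """
    for lineno, line in enumerate(lines, start=1):
        text = line.rstrip('\r\n')  # drop the line ending but keep the leading tab
        if not text:
            continue  # skip blank lines
        if text.startswith('\t'):
            digits = text[1:]
            if digits.isascii() and digits.isdigit():
                yield lineno, 'number', int(digits)
            else:
                yield lineno, 'bad', text  # a tab not followed by plain digits
        elif '\t' in text:
            yield lineno, 'bad', text  # a tab somewhere other than the start
        else:
            yield lineno, 'word', text


sample = ["-1234\n", "\t22\n", "\t44\n", "absv\n", "\t1\n", "\tx9\n"]
for lineno, kind, value in classify_lines(sample):
    print(lineno, kind, repr(value))
```

To run it over the real file, pass the open file object (from with open(...) as file:) instead of the sample list; a file object is iterated one line at a time.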
asked Jul 09 '14 at 21:07 by bill



1 Answer

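Instead of calling split() with no argument, which splits on every whitespace character including \t, split only on the whitespace characters other than \t. First build a string of those characters:

import string

white_str = list(string.whitespace)   # string.whitespace holds the ASCII whitespace ' \t\n\r\x0b\x0c'
white_str.remove("\t")                # remove \t
white_str = ''.join(white_str)        # new whitespace string, without \t

Then split on any run of those characters. Note that split(white_str) does not do this: str.split() treats its argument as one literal separator, not as a set of characters, so it only splits where the whole sequence ' \n\r\x0b\x0c' appears. Use white_str as a regular-expression character class instead, e.g. re.split('[' + re.escape(white_str) + ']+', line) after import re, and drop any empty strings from the result. This splits your lines on all whitespace except \t, so the tabs stay in your strings and you can detect them later on.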
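A runnable sketch of splitting on whitespace other than \t, using white_str as a regular-expression character class (str.split() would treat it as one literal separator). The helper name split_keep_tabs is hypothetical, and only the ASCII whitespace listed in string.whitespace is covered:

```python
import re
import string

# All ASCII whitespace except the tab: ' \n\r\x0b\x0c'
white_str = ''.join(c for c in string.whitespace if c != '\t')

# One character class matching a run of any of those characters;
# re.escape() makes each character literal inside the class.
_SPLITTER = re.compile('[' + re.escape(white_str) + ']+')


def split_keep_tabs(line):
    """Split on whitespace other than '\t', dropping empty pieces."""
    return [piece for piece in _SPLITTER.split(line) if piece]


print(split_keep_tabs("absv\n"))    # ['absv']
print(split_keep_tabs("\t22\n"))    # ['\t22']  (the tab survives)
print("\t22\n".split(white_str))    # ['\t22\n'] (str.split looks for the whole string)
```

Compiling the pattern once keeps the per-line cost low when looping over a very large file.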

answered Oct 28 '22 at 16:10 by TheSoundDefense
