python detect tab char

I am trying to separate words from integers in a specific file. The lines have the form shown below: a line that contains a word has no '\t' character, while a line that contains an integer (all of them positive) does. Some of the words are themselves numbers that contain a '-' character.

-1234
\t22
\t44
\t46
absv
\t1
\t2
\t4
...

So my idea was to tell words and numbers apart by trying to convert each token to float.

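import codecs

def is_number(s):
    # True if s parses as a float, False otherwise.
    try:
        float(s)
        return True
    except ValueError:
        return False

with codecs.open("/media/New Volume/3rd_step.txt", 'Ur') as file:  # open file
    for line in file:  # read line by line
        temp_buffer = line.split()  # split on whitespace (this also drops '\t')
        for word in temp_buffer:
            # treat as a number only if it has no '-' and parses as a float
            if '-' not in word and is_number(word):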
            ....

If the token is a word, float() raises an exception; if it does not, the token is a number. The file is 50 GB, and somewhere in the middle something seems to go wrong with the file's format. So the only reliable way to separate words from numbers is the \t character. But how can I detect it? Once I split the line to get the string, the special characters are lost.
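To illustrate both problems, here is a small sketch (Python 3). The sample strings are made up for the demonstration and are not taken from the real file. It shows that split() discards the tab while the raw line still carries it, and that float() accepts some tokens that could just as well be words.

```python
def is_number(s):
    # Same helper as above: True if float() can parse s.
    try:
        float(s)
        return True
    except ValueError:
        return False

raw = "\t22\n"               # hypothetical number line
print(raw.split())           # the tab is gone after split()
print(raw.startswith("\t"))  # but it is still visible on the raw line

# Tokens that float() happens to accept, so is_number() reports them as numbers.
lookalikes = [w for w in ["nan", "inf", "1e5", "absv"] if is_number(w)]
print(lookalikes)
```

Whether tokens like these actually occur in the file is unknown; they are only one possible reason for the float test to misfire.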

EDIT:

I am really silly and a newbie, sorry for wasting your time. It seems I can find it more easily this way:

with codecs.open("/media/D60A6CE00A6CBEDD/InvertedIndex/1.txt", 'Ur') as file:  # open file
    for line in file:  # read line by line
        if '\t' not in line:  # no tab, so this line holds a word
            print(line)
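Building on that check, here is a minimal sketch (Python 3) of grouping each word with the numbers that follow it, keyed on whether the raw line starts with a tab. It assumes, as in the sample above, that every number line begins with '\t' and every word line does not. group_by_tab is a hypothetical helper name, and the in-memory sample stands in for the real file.

```python
import io

def group_by_tab(lines):
    """Yield (word, [numbers]) pairs from an iterable of raw lines.

    Assumption: a line starting with a tab holds one integer, and any
    other non-empty line holds a word (which may itself look numeric).
    """
    word, numbers = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")      # strip only the line ending, keep the tab
        if not line:
            continue                   # skip blank lines
        if line.startswith("\t"):
            numbers.append(int(line))  # int() ignores surrounding whitespace
        else:
            if word is not None:
                yield word, numbers    # emit the previous group
            word, numbers = line, []
    if word is not None:
        yield word, numbers            # emit the final group

# Stand-in for the real file, using the sample lines from the question.
sample = io.StringIO("-1234\n\t22\n\t44\n\t46\nabsv\n\t1\n\t2\n\t4\n")
groups = list(group_by_tab(sample))
print(groups)
```

Because it is a generator that reads one line at a time, the same function can be fed an open file object without loading 50 GB into memory. A tab line that does not hold an integer raises ValueError, which would also point to where the format breaks.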

asked Jul 09 '14 21:07

bill

1 Answer

You should try passing an explicit separator instead of relying on the default behaviour of split(), which splits on all whitespace characters, including \t. You could split initially on all whitespace except \t. Start by building that set of characters:

import string

white_str = list(string.whitespace)    # string.whitespace contains all whitespace.
white_str.remove("\t")                 # Remove \t
white_str = ''.join(white_str)         # New whitespace string, without \t

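Note that split(white_str) will not do what you want here: str.split() treats its argument as one exact separator string, not as a set of characters. To split on any of those characters, use re.split('[' + re.escape(white_str) + ']+', line) instead of split(), and drop any empty strings it returns. This will split your lines on all whitespace except for \t to get your strings. Then you can detect \t later on for what you need.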
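A minimal sketch of splitting on every whitespace character except the tab (Python 3). It uses re.split with a character class, since str.split() takes one literal separator rather than a set of characters. split_keep_tabs and non_tab_ws are hypothetical names chosen for this example.

```python
import re
import string

# All whitespace characters except the tab.
non_tab_ws = string.whitespace.replace("\t", "")
# A character class matching one or more of them.
splitter = re.compile("[" + re.escape(non_tab_ws) + "]+")

def split_keep_tabs(line):
    """Split on whitespace other than tab, dropping empty pieces."""
    return [tok for tok in splitter.split(line) if tok]

tokens = split_keep_tabs("\t22\n")
is_number_line = tokens[0].startswith("\t")  # the tab survives the split
print(tokens, is_number_line)
```

The tab stays attached to the token, so a later startswith("\t") or '\t' in token test can still tell number lines from word lines.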

answered Oct 28 '22 16:10

TheSoundDefense

Tags:

python

string

split
