
Determine the "type of value" of a string in Python

I'm trying to write a function in Python that determines what type of value a string contains. For example:

if the string is 1, 0, True or False, the value is BIT

if the string consists only of digits ([0-9]+), the value is INT

if the string is digits, a dot, then digits ([0-9]+.[0-9]+), the value is FLOAT

if the string is anything else (text, etc.), the value is TEXT

So far I have something like:

import re

def dataType(string):
    odp = ''
    patternBIT = re.compile('[01]')
    patternINT = re.compile('[0-9]+')
    patternFLOAT = re.compile(r'[0-9]+\.[0-9]+')
    patternTEXT = re.compile('[a-zA-Z0-9]+')
    if patternTEXT.match(string):
        odp = "text"
    if patternFLOAT.match(string):
        odp = "FLOAT"
    if patternINT.match(string):
        odp = "INT"
    if patternBIT.match(string):
        odp = "BIT"

    return odp

But I'm not very skilled with regexes in Python. Could you please tell me what I'm doing wrong? For example, it doesn't work for 2010-00-10, which should be TEXT but comes out as INT, or for 20.90, which should be FLOAT but comes out as INT.

Johnzzz asked Apr 21 '12 17:04


2 Answers

Before you go too far down the regex route, have you considered using ast.literal_eval?

Examples:

In [35]: ast.literal_eval('1')
Out[35]: 1

In [36]: type(ast.literal_eval('1'))
Out[36]: int

In [38]: type(ast.literal_eval('1.0'))
Out[38]: float

In [40]: type(ast.literal_eval('[1,2,3]'))
Out[40]: list

May as well use Python to parse it for you!

OK, here is a bigger example (written for Python 2, which still has the long type and the print statement):

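import ast, re

def dataType(s):
    # Classify the literal in s as BLANK, BIT, INT, FLOAT or TEXT (Python 2).
    s = s.strip()
    if len(s) == 0:
        return 'BLANK'
    try:
        t = ast.literal_eval(s)
    except (ValueError, SyntaxError):
        # Not a valid Python literal
        return 'TEXT'

    # Test the type as well as the value: 1.0 == True, so a bare
    # membership test would report the float '1.0' as BIT.
    if type(t) is bool or (type(t) in (int, long) and t in (0, 1)):
        return 'BIT'
    if type(t) in (int, long):
        return 'INT'
    if type(t) is float:
        return 'FLOAT'
    # Lists, tuples, quoted strings and so on
    return 'TEXT'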



testSet=['   1  ', ' 0 ', 'True', 'False',                     #should all be BIT
         '12', '34l', '-3','03',                                #should all be INT
         '1.2', '-20.4', '1e66', '35.','-   .2','-.2e6',        #should all be FLOAT
         '10-1', 'def', '10,2', '[1,2]','35.9.6','35..','.']    #should all be TEXT

for t in testSet:
    print "{:10}:{}".format(t,dataType(t))

Output:

   1      :BIT
 0        :BIT
True      :BIT
False     :BIT
12        :INT
34l       :INT
-3        :INT
03        :INT
1.2       :FLOAT
-20.4     :FLOAT
1e66      :FLOAT
35.       :FLOAT
-   .2    :FLOAT
-.2e6     :FLOAT
10-1      :TEXT
def       :TEXT
10,2      :TEXT
[1,2]     :TEXT
35.9.6    :TEXT
35..      :TEXT
.         :TEXT
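
For Python 3, where long and the print statement no longer exist, the function above could be adapted roughly as follows. This is a sketch rather than the answerer's code: the name data_type is my own, and it assumes Python 3.7 or later. Two results differ from the Python 2 output above, because '34l' and '03' are no longer valid literals in Python 3 and therefore come back as TEXT.

```python
import ast

def data_type(s):
    """Classify a string as BLANK, BIT, INT, FLOAT or TEXT (Python 3 sketch)."""
    s = s.strip()
    if not s:
        return 'BLANK'
    try:
        t = ast.literal_eval(s)
    except (ValueError, SyntaxError):
        # Not a valid Python literal
        return 'TEXT'
    # bool is checked by type; 0 and 1 count as BIT only when they are ints
    if isinstance(t, bool) or (type(t) is int and t in (0, 1)):
        return 'BIT'
    if type(t) is int:
        return 'INT'
    if type(t) is float:
        return 'FLOAT'
    # Lists, tuples, quoted strings and so on
    return 'TEXT'

for t in ['   1  ', 'True', '12', '-3', '1.2', '1e66', '10-1', 'def', '[1,2]']:
    print("{:10}:{}".format(t, data_type(t)))
```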

And if you positively MUST have a regex solution, which produces the same results, here it is:

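def regDataType(s):
    # Regex counterpart of dataType(); needs "import re".
    s = s.strip()
    if len(s) == 0:
        return 'BLANK'

    # re.match anchors at the start; the trailing $ anchors the end
    if re.match(r'(True|False|0|1)$', s):
        return 'BIT'
    # optional sign, digits, optional Python 2 long suffix
    if re.match(r'([-+]\s*)?\d+[lL]?$', s):
        return 'INT'
    # floats that start with a non-zero digit, e.g. 35. or 1e66
    if re.match(r'([-+]\s*)?[1-9][0-9]*\.?[0-9]*([Ee][+-]?[0-9]+)?$', s):
        return 'FLOAT'
    # floats with at least one digit after the optional dot, e.g. .2 or 0.5
    if re.match(r'([-+]\s*)?[0-9]*\.?[0-9][0-9]*([Ee][+-]?[0-9]+)?$', s):
        return 'FLOAT'

    return 'TEXT'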

I cannot recommend the regex over the ast version, however; just let Python decide what these data types are rather than reimplementing its rules with a regex.

the wolf answered Oct 26 '22 23:10



You could also use json.

import json
converted_val = json.loads('32.45')
type(converted_val)

Output (Python 2; Python 3 shows <class 'float'>):

<type 'float'>
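
As a rough sketch of how the json approach might be wrapped into a classifier (the helper name json_type and the 'text' fallback are my own choices, not part of the answer): json.loads raises an exception derived from ValueError when the string is not valid JSON, so that case can be treated as text. Note that JSON spells its booleans true and false, so the Python-style 'True' is not recognized.

```python
import json

def json_type(s):
    """Return the name of the type json.loads produces, or 'text' on failure."""
    try:
        return type(json.loads(s)).__name__
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError
        return 'text'

for s in ['32.45', '12', 'true', 'True', '2010-00-10']:
    print("{:12}{}".format(s, json_type(s)))
```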

EDIT

To answer your question, however:

re.match() only requires the pattern to match at the beginning of the string, so it also succeeds when just a prefix of the string matches. Because you evaluate every pattern in turn, the sequence for "2010-00-10" goes like this:

if patternTEXT.match(str_obj): #don't use 'string' as a variable name.

it matches, so odp is set to "text"

then, your script does:

if patternFLOAT.match(str_obj):

no match, odp still equals "text"

if patternINT.match(str_obj):

partial match (on the leading "2010"), so odp is set to "INT"

Because match accepts partial matches, several of the if statements succeed, and the last one to succeed (here patternINT, since patternBIT does not match) determines the string returned in odp.
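
The partial-match behavior described above can be checked directly. The snippet below is a small illustration in Python 3 (the variable names are mine); it also shows re.fullmatch, available since Python 3.4, which only succeeds when the whole string matches.

```python
import re

pattern_int = re.compile('[0-9]+')

m = pattern_int.match("2010-00-10")
print(m.group())           # 2010  (only the leading digits matched)
print(m.end(), m.endpos)   # 4 10  (the match stops well before the end)

# fullmatch requires the entire string to match, so this prints None
print(pattern_int.fullmatch("2010-00-10"))
```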

You can do one of several things:

  1. Rearrange the order of your if statements so that the last one to match is the correct one.

  2. Use if and elif for the rest of your if statements so that only the first statement to match is evaluated.

  3. Check that the match object covers the entire string:

    ...
    match = patternINT.match(str_obj)
    if match:
        if match.end() == match.endpos:
            #do stuff
    ...
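
One way the second and third options could be combined is sketched below, for Python 3.4 or later. It swaps the match.end() == match.endpos check for re.fullmatch, which performs a similar whole-string test in one call. The function name data_type and the module-level pattern names are my own; the BIT pattern is extended with True and False as the question asks, and anything that fits none of the patterns falls through to TEXT.

```python
import re

# Hypothetical names; the patterns follow the rules stated in the question.
PATTERN_BIT = re.compile(r'[01]|True|False')
PATTERN_INT = re.compile(r'[0-9]+')
PATTERN_FLOAT = re.compile(r'[0-9]+\.[0-9]+')

def data_type(value):
    """Return BIT, INT, FLOAT or TEXT; each pattern must match the whole string."""
    if PATTERN_BIT.fullmatch(value):
        return "BIT"
    elif PATTERN_INT.fullmatch(value):
        return "INT"
    elif PATTERN_FLOAT.fullmatch(value):
        return "FLOAT"
    else:
        # Nothing more specific matched, so treat the value as text
        return "TEXT"

for value in ["1", "True", "2010", "20.90", "2010-00-10", "abc"]:
    print("{:12}{}".format(value, data_type(value)))
```

With whole-string matching and elif, the order of the checks only matters for values that more than one pattern accepts in full, such as "1", which is why BIT is tested before INT.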
    
Joel Cornett answered Oct 27 '22 00:10
