I'm trying to write a function in Python that determines what type of value a string holds. For example:
if the string is 1, 0, True or False, the value is BIT
if the string consists only of digits (0-9), the value is INT
if the string is digits, a dot, then more digits (0-9+.0-9+), the value is FLOAT
if the string is anything else (text, etc.), the value is TEXT
So far I have something like:
    import re

    def dataType(string):
        odp=''
        patternBIT=re.compile('[01]')
        patternINT=re.compile('[0-9]+')
        patternFLOAT=re.compile('[0-9]+\.[0-9]+')
        patternTEXT=re.compile('[a-zA-Z0-9]+')
        if patternTEXT.match(string):
            odp= "text"
        if patternFLOAT.match(string):
            odp= "FLOAT"
        if patternINT.match(string):
            odp= "INT"
        if patternBIT.match(string):
            odp= "BIT"
        return odp
But I'm not very skilled at using regexes in Python. Could you please tell me what I am doing wrong? For example, it doesn't work for 2010-00-10, which should be TEXT but comes back as INT, or for 20.90, which should be FLOAT but comes back as INT.
Before you go too far down the regex route, have you considered using ast.literal_eval?
Examples:
In [35]: ast.literal_eval('1')
Out[35]: 1
In [36]: type(ast.literal_eval('1'))
Out[36]: int
In [38]: type(ast.literal_eval('1.0'))
Out[38]: float
In [40]: type(ast.literal_eval('[1,2,3]'))
Out[40]: list
May as well use Python to parse it for you!
OK, here is a bigger example:
    # Python 2 code: it relies on long, the print statement,
    # and literals such as 34l and 03.
    import ast, re

    def dataType(str):
        str=str.strip()
        if len(str) == 0: return 'BLANK'
        try:
            t=ast.literal_eval(str)
        except ValueError:
            return 'TEXT'
        except SyntaxError:
            return 'TEXT'
        else:
            if type(t) in [int, long, float, bool]:
                # 1 == True and 0 == False, so 1 and 0 count as BIT too
                if t in set((True,False)):
                    return 'BIT'
                if type(t) is int or type(t) is long:
                    return 'INT'
                if type(t) is float:
                    return 'FLOAT'
            else:
                return 'TEXT'

    testSet=[' 1 ', ' 0 ', 'True', 'False',   #should all be BIT
             '12', '34l', '-3','03',              #should all be INT
             '1.2', '-20.4', '1e66', '35.','- .2','-.2e6', #should all be FLOAT
             '10-1', 'def', '10,2', '[1,2]','35.9.6','35..','.'] #should all be TEXT

    for t in testSet:
        print "{:10}:{}".format(t,dataType(t))
Output:
     1        :BIT
     0        :BIT
    True      :BIT
    False     :BIT
    12        :INT
    34l       :INT
    -3        :INT
    03        :INT
    1.2       :FLOAT
    -20.4     :FLOAT
    1e66      :FLOAT
    35.       :FLOAT
    - .2      :FLOAT
    -.2e6     :FLOAT
    10-1      :TEXT
    def       :TEXT
    10,2      :TEXT
    [1,2]     :TEXT
    35.9.6    :TEXT
    35..      :TEXT
    .         :TEXT
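The listing above targets Python 2. As a rough sketch only, a Python 3 port might look like the following; the name `data_type` is my own, and the BIT rule is written out explicitly rather than relying on `1 == True`. On Python 3, `34l` and `03` are no longer valid literals, so they come back as TEXT rather than INT.

```python
import ast

def data_type(text):
    """Classify a string as BLANK, BIT, INT, FLOAT or TEXT (Python 3 sketch)."""
    text = text.strip()
    if not text:
        return 'BLANK'
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        # Anything Python cannot read as a literal is treated as text.
        return 'TEXT'
    # bool is a subclass of int, so test for it before the int branch.
    if isinstance(value, bool) or (isinstance(value, int) and value in (0, 1)):
        return 'BIT'
    if isinstance(value, int):
        return 'INT'
    if isinstance(value, float):
        return 'FLOAT'
    # Lists, tuples, strings and other literals are not numeric.
    return 'TEXT'

for sample in [' 1 ', 'True', '12', '-3', '1.2', '1e66', 'def', '[1,2]', '03']:
    print("{:10}:{}".format(sample, data_type(sample)))
```

Unlike the Python 2 version, this sketch reports `1.0` as FLOAT rather than BIT, because only integers and booleans are compared against 0 and 1.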
And if you positively MUST have a regex solution, which produces the same results, here it is:
    def regDataType(str):
        str=str.strip()
        if len(str) == 0: return 'BLANK'
        if re.match(r'True$|^False$|^0$|^1$', str):
            return 'BIT'
        if re.match(r'([-+]\s*)?\d+[lL]?$', str):
            return 'INT'
        if re.match(r'([-+]\s*)?[1-9][0-9]*\.?[0-9]*([Ee][+-]?[0-9]+)?$', str):
            return 'FLOAT'
        if re.match(r'([-+]\s*)?[0-9]*\.?[0-9][0-9]*([Ee][+-]?[0-9]+)?$', str):
            return 'FLOAT'
        return 'TEXT'
I cannot recommend the regex over the ast version however; just let Python do the interpretation of what it thinks these data types are rather than interpret them with a regex...
You could also use json:
    import json
    converted_val = json.loads('32.45')
    type(converted_val)
Output:
    <type 'float'>
(On Python 3 this prints <class 'float'>.)
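As an illustration of how the json idea could be turned into a classifier, here is a minimal Python 3 sketch. The function name `json_data_type` is hypothetical. Keep in mind that JSON spells booleans `true` and `false`, so the Python-style `True` and `False` from the question are rejected by the parser and end up as TEXT unless you handle them separately.

```python
import json

def json_data_type(text):
    """Classify a string using the JSON parser (Python 3 sketch)."""
    text = text.strip()
    try:
        value = json.loads(text)
    except ValueError:
        # json.JSONDecodeError is a subclass of ValueError.
        return 'TEXT'
    # bool is a subclass of int, so test for it before the int branch.
    if isinstance(value, bool) or (isinstance(value, int) and value in (0, 1)):
        return 'BIT'
    if isinstance(value, int):
        return 'INT'
    if isinstance(value, float):
        return 'FLOAT'
    # JSON strings, arrays, objects and null are not numeric.
    return 'TEXT'

for sample in ['32.45', '12', '1', 'true', 'True', '2010-00-10']:
    print("{:12}:{}".format(sample, json_data_type(sample)))
```

JSON number syntax is stricter than Python's, so inputs such as `35.` or `03` are rejected and classified as TEXT here.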
EDIT
To answer your question, however: re.match() returns partial matches, starting from the beginning of the string.
Since you keep evaluating every pattern match, the sequence for "2010-00-10" goes like this:
    if patternTEXT.match(str_obj): #don't use 'string' as a variable name.
It matches, so odp is set to "text". Then your script does:
    if patternFLOAT.match(str_obj):
No match, so odp still equals "text".
    if patternINT.match(str_obj):
Partial match (the leading "2010"), so odp is set to "INT".
    if patternBIT.match(str_obj):
No match, because the string starts with "2", so odp stays "INT".
Because match returns partial matches and all four if statements are evaluated, the last one that matches determines which string is returned in odp. The same thing happens with 20.90: patternFLOAT matches and sets odp to "FLOAT", but patternINT then matches the leading "20" and overwrites it with "INT".
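To see the partial-match behaviour in isolation, here is a short sketch using the INT and FLOAT patterns from the question. It assumes Python 3.4 or later, where `re.fullmatch` is available; the variable names are my own.

```python
import re

pattern_int = re.compile(r'[0-9]+')
pattern_float = re.compile(r'[0-9]+\.[0-9]+')

# match() only anchors at the start, so a leading run of digits is enough.
date_match = pattern_int.match('2010-00-10')
print(date_match.group())   # the prefix that matched: 2010

float_as_int = pattern_int.match('20.90')
print(float_as_int.group())   # the prefix that matched: 20

# fullmatch() succeeds only when the whole string fits the pattern.
print(pattern_int.fullmatch('2010-00-10'))   # None
print(pattern_int.fullmatch('20.90'))        # None
print(pattern_float.fullmatch('20.90') is not None)   # True
```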
You can do one of several things:
1. Rearrange the order of your if statements so that the last one to match is the correct one.
2. Use if and elif for the rest of your if statements so that only the first statement to match is evaluated.
3. Check to make sure the match object is matching the entire string:
    ...
    match = patternINT.match(str_obj)
    if match:
        if match.end() == match.endpos:
            #do stuff
    ...
Note that reordering or elif alone will not fix "2010-00-10", because patternINT still matches its leading digits; the patterns also need to cover the whole string, as in option 3.
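Combining those suggestions, a corrected version of the function from the question might look like the sketch below. It assumes Python 3.4 or later for `re.fullmatch`. The name `data_type`, the upper-case 'TEXT' label, and the `True`/`False` alternatives in the BIT pattern are my additions, based on the rules stated in the question.

```python
import re

# Compile once at module level instead of on every call.
PATTERN_BIT = re.compile(r'[01]|True|False')
PATTERN_INT = re.compile(r'[0-9]+')
PATTERN_FLOAT = re.compile(r'[0-9]+\.[0-9]+')

def data_type(value):
    """Classify a string as BIT, INT, FLOAT or TEXT."""
    # fullmatch() requires the whole string to match, and the if/elif chain
    # stops at the first hit, so the most specific pattern is tested first.
    if PATTERN_BIT.fullmatch(value):
        return 'BIT'
    elif PATTERN_INT.fullmatch(value):
        return 'INT'
    elif PATTERN_FLOAT.fullmatch(value):
        return 'FLOAT'
    # Anything that is not a bit, an integer or a float is text.
    return 'TEXT'

for sample in ['1', 'True', '2010', '20.90', '2010-00-10', 'hello']:
    print("{:12}:{}".format(sample, data_type(sample)))
```

With whole-string matching, the separate TEXT pattern is no longer needed: text is simply the fallback when nothing else fits.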