Method for guessing the type of data currently represented as strings

I'm currently parsing CSV tables and need to discover the "data types" of the columns. I don't know the exact format of the values. Obviously, everything that the CSV parser outputs is a string. The data types I am currently interested in are:

  1. integer
  2. floating point
  3. date
  4. boolean
  5. string

My current thoughts are to test a sample of rows (maybe several hundred?) in order to determine the types of data present through pattern matching.

I am particularly concerned about the date data type. Is there a Python module for parsing common date idioms? (Obviously I will not be able to detect them all.)

What about integers and floats?

fmark asked Oct 14 '25 14:10


2 Answers

ast.literal_eval() can get the easy ones.
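
As a rough sketch of that idea (the helper name `guess_literal_type` is made up for illustration), you can let `ast.literal_eval()` evaluate each cell and report the type of the result, falling back to string when the cell is not a Python literal. Note that it only recognises Python's own literal syntax, so `True` counts as a boolean but `true` does not, dates are not detected at all, and a value like `1,5` comes back as a tuple.

```python
import ast


def guess_literal_type(text):
    """Return the name of the Python type literal_eval assigns to text.

    Falls back to "str" when the text is not a valid Python literal.
    """
    try:
        value = ast.literal_eval(text.strip())
    except (ValueError, SyntaxError, TypeError):
        # Not a Python literal (e.g. a bare word or a date), so treat it as text.
        return "str"
    return type(value).__name__


for cell in ["9", "2.5", "True", "dsa"]:
    print(cell, "->", guess_literal_type(cell))
```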

Ignacio Vazquez-Abrams answered Oct 17 '25 23:10


dateutil (the python-dateutil package) comes to mind for parsing dates.
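
A minimal sketch of how that might look, assuming python-dateutil is installed. The wrapper `parse_date` and the `FALLBACK_FORMATS` tuple are my own illustration, not part of dateutil; the fallback is only there so the snippet still runs without the package. Be aware that dateutil is lenient: as far as I know it will accept a bare number such as "9" as a day of the current month, so it is probably wise to test for integers and floats before testing for dates.

```python
from datetime import datetime

try:
    from dateutil import parser as date_parser  # third-party: python-dateutil
except ImportError:
    date_parser = None

# Hypothetical fallback formats, used only when dateutil is not available.
FALLBACK_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%d %B %Y", "%B %d, %Y")


def parse_date(text):
    """Return a datetime for text, or None if it does not look like a date."""
    text = text.strip()
    if date_parser is not None:
        try:
            return date_parser.parse(text)
        except (ValueError, OverflowError):
            return None
    # Without dateutil, try a handful of explicit formats in turn.
    for fmt in FALLBACK_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


print(parse_date("2010-05-14"))
print(parse_date("dsa"))
```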

For integers and floats you could always try a cast inside a try/except block:

>>> f = "2.5"
>>> i = "9"
>>> ci = int(i)
>>> ci
9
>>> cf = float(f)
>>> cf
2.5
>>> g = "dsa"
>>> cg = float(g)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: 'dsa'
>>> try:
...   cg = float(g)
... except ValueError:
...   print("g is not a float")
...
g is not a float
>>>
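
Extending the same try/except idea to a whole column, here is one possible sketch. All names (`infer_column_type`, `_casts`, `_is_bool`, `BOOL_TOKENS`) are hypothetical, and the set of boolean spellings is an assumption you would adjust to your data. It tries the strictest type first and returns the first one that every non-empty sampled value satisfies. A date check, such as one built on dateutil, can be passed in as `is_date`.

```python
# Assumed spellings for booleans; "0" and "1" are left to the integer check.
BOOL_TOKENS = {"true", "false", "yes", "no"}


def _casts(cast, text):
    """Return True if cast(text) succeeds without raising ValueError."""
    try:
        cast(text)
    except ValueError:
        return False
    return True


def _is_bool(text):
    return text.lower() in BOOL_TOKENS


def infer_column_type(values, is_date=None):
    """Guess a single type for a column from a sample of its string values."""
    candidates = [
        ("integer", lambda v: _casts(int, v)),
        ("floating point", lambda v: _casts(float, v)),
        ("boolean", _is_bool),
    ]
    if is_date is not None:
        candidates.append(("date", is_date))

    # Ignore empty cells so missing values do not force everything to string.
    sample = [v.strip() for v in values if v.strip()]
    if not sample:
        return "string"

    for name, check in candidates:
        if all(check(v) for v in sample):
            return name
    return "string"


print(infer_column_type(["1", "2", "3"]))
print(infer_column_type(["1", "2.5", ""]))
print(infer_column_type(["9", "dsa"]))
```

Keep in mind that `float()` also accepts strings such as "nan" and "inf", so you may want to exclude those explicitly if they can appear as ordinary text in your data.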
Vinko Vrsalovic answered Oct 18 '25 01:10