Find type of the content (number, date, time, string etc.) inside a string

I'm trying to parse a CSV file and automatically create a table for it using SQL commands. The first line of the CSV gives the column headers, but I need to infer the type of each column.

Is there any function in Ruby that would find the type of the content in each field? For example, the CSV line:

"12012", "Test", "1233.22", "12:21:22", "10/10/2009"

should produce types like

['integer', 'string', 'float', 'time', 'date']

Thanks!

Jasim asked Sep 12 '09 18:09



2 Answers

require 'time'

def to_something(str)
  if (num = Integer(str) rescue Float(str) rescue nil)
    num
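  elsif (tm = Time.parse(str)) == Time.now
    # Relies on older Ruby (1.8) behaviour: Time.parse does not raise an
    # error for invalid input and falls back to the current time instead,
    # so a result equal to Time.now is treated as "not a time".
    str
  else
    tm
  end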
end

%w{12012 1233.22 12:21:22 10/10/2009 Test}.each do |str|
  something = to_something(str)
  p [str, something, something.class]
end

Results in

["12012", 12012, Fixnum]
["1233.22", 1233.22, Float]
["12:21:22", Sat Sep 12 12:21:22 -0400 2009, Time]
["10/10/2009", Sat Oct 10 00:00:00 -0400 2009, Time]
["Test", "Test", String]
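
One caveat if the CSV holds zero-padded values such as IDs or postal codes: `Integer(str)` reads a leading `0` as an octal prefix unless a base is passed. The following is only a sketch and not part of the answer's code; `to_number` is a hypothetical helper name standing in for the numeric step of `to_something`.

```ruby
# Kernel#Integer honours radix prefixes unless an explicit base is passed.
p Integer('010')      # leading zero is read as octal => 8
p Integer('010', 10)  # base forced to 10 => 10

# Hypothetical variant of the numeric step in to_something: integers are
# parsed strictly as base 10, anything else falls back to Float, then nil.
def to_number(str)
  Integer(str, 10)
rescue ArgumentError
  begin
    Float(str)
  rescue ArgumentError
    nil
  end
end

p to_number('010')      # => 10
p to_number('1233.22')  # => 1233.22
p to_number('Test')     # => nil
```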

Update for Ruby 1.9.3: Time.parse in the standard library now raises an exception (ArgumentError) if it can't parse the string, so:

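require 'time'

def to_something(str)
  # Try each conversion in turn; a failed one raises and falls through to the next.
  duck = (Integer(str) rescue Float(str) rescue Time.parse(str) rescue nil)
  # Nothing matched: keep the original string.
  duck.nil? ? str : duck
end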
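
Note that `to_something` returns a `Time` for both "12:21:22" and "10/10/2009", so it cannot tell the question's 'time' and 'date' labels apart. A minimal sketch of one way to get those labels follows, assuming times look like `HH:MM` or `HH:MM:SS` and dates are written month/day/year. The name `infer_type` and the four patterns are assumptions for illustration, not part of the answer above.

```ruby
require 'date'

# Assumed field shapes; adjust the patterns to match the real data.
INTEGER_RE = /\A[+-]?\d+\z/
FLOAT_RE   = /\A[+-]?\d+\.\d+\z/
TIME_RE    = /\A(\d{1,2}):(\d{2})(?::(\d{2}))?\z/
DATE_RE    = %r{\A(\d{1,2})/(\d{1,2})/(\d{4})\z} # assumed month/day/year

# Hypothetical helper: classify one CSV field and return a type label.
def infer_type(field)
  s = field.strip
  return 'integer' if s.match?(INTEGER_RE)
  return 'float'   if s.match?(FLOAT_RE)

  if (m = TIME_RE.match(s))
    # A missing seconds group is nil, and nil.to_i is 0.
    hour, min, sec = m[1].to_i, m[2].to_i, m[3].to_i
    return 'time' if hour < 24 && min < 60 && sec < 60
  end

  if (m = DATE_RE.match(s))
    month, day, year = m[1].to_i, m[2].to_i, m[3].to_i
    return 'date' if Date.valid_date?(year, month, day)
  end

  'string'
end

fields = ['12012', 'Test', '1233.22', '12:21:22', '10/10/2009']
p fields.map { |f| infer_type(f) }
# => ["integer", "string", "float", "time", "date"]
```

Explicit patterns are stricter than `Time.parse`, which accepts many loosely formatted strings. A value such as "03/04/2009" stays ambiguous between month/day and day/month, so the assumed order has to match the source of the file.
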
glenn jackman answered Sep 29 '22 21:09



This might get you started

I don't have a complete solution, but this may help get you started. Once the fields of an example record have been converted to Ruby values, you can go from the values to their classes to the class names automatically, at least for some types, and then translate those names through a lookup hash:

$ irb
>> t = { "String" => "string", "Fixnum" => "integer", "Float" => "float" }
=> {"Float"=>"float", "Fixnum"=>"integer", "String"=>"string"}
>> ["xyz", 123, 123.455].map { |x| t[x.class.to_s] }
=> ["string", "integer", "float"]

Actually, you could skip the strings and use the classes themselves as hash keys:

$ irb
>> t = { String => "string", Fixnum => "integer", Float => "float" }
=> {String=>"string", Float=>"float", Fixnum=>"integer"}
>> ["xyz", 123, 123.455].map { |x| t[x.class] }
=> ["string", "integer", "float"]
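
On current Ruby versions the sessions above need a small change: `Fixnum` was folded into `Integer` in Ruby 2.4 and the constant was removed in Ruby 3.2. A sketch of the same lookup under that assumption follows. The `TYPE_NAMES` constant, the `type_name` helper, the extra `Time` entry, and the 'string' fallback are illustrative additions, not part of the original sessions.

```ruby
# Assumes Ruby 2.4 or later, where 123.class is Integer rather than Fixnum.
TYPE_NAMES = {
  String  => 'string',
  Integer => 'integer',
  Float   => 'float',
  Time    => 'time'
}.freeze

# Hypothetical helper: label an already-converted value, defaulting to
# 'string' for any class that is not in the table.
def type_name(value)
  TYPE_NAMES.fetch(value.class, 'string')
end

p ['xyz', 123, 123.455, Time.now].map { |x| type_name(x) }
# => ["string", "integer", "float", "time"]
```
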
DigitalRoss answered Sep 29 '22 21:09
