Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I tell DBD::CSV to use a comma as the decimal seperator?

Tags:

csv

perl

dbi

I'm trying to use a German-style CSV file with DBI and DBD::CSV. This, in turn, uses Text::CSV to parse the file. I want to query the data in that file using SQL.

Let's look at the file first. It is separated by semicolons (;), and the numbers in it look like this: 5,23, which is equivalent to the English 5.23.

Here's what I've got so far:

use strict; use warnings;
use DBI;

# create the database handle
my $dbh = DBI->connect(
  'dbi:CSV:',
  undef, undef,
  {
    f_dir => '.',
    f_schema => undef,
    f_ext => '.csv',
    f_encoding => 'latin-1',
    csv_eol => "\n",
    csv_sep_char => ';',
    csv_tables => {
      foo => {
        file => 'foo.csv',
        #skip_first_row => 0,
        col_names => [ map { "col$_" } (1..3)  ], # see annotation below
      },
    },
  },
) or croak $DBI::errstr;

my $sth = $dbh->prepare(
  'SELECT col3 FROM foo WHERE col3 > 80.50 ORDER BY col3 ASC'
);
$sth->execute;

while (my $res = $sth->fetchrow_hashref) {
  say $res->{col3};
}

Now, this looks quite nice. The problem is that the SQL (meaning SQL::Statement, which is somewhere down the line from DBI and DBD::CSV) does not regard the data in col3, which is a floating-point value with a comma in the middle, as a float. Instead, it treats the column as an integer, because it doesn't understand the comma.

Here's some example data:

foo;foo;81,90
bar;bar;80,50
baz;baz;80,70

So the above code with this data will result in one line of output: 81,90. Of course, that is wrong. It used the int() part of col3 with the comparison, which is right, but not what I want.

Question: How can I tell it to treat the numbers with the comma as float?

Things I've thought about:

  • I've not found any built-in way in Text::CSV to do this. I'm not sure where in Text::CSV I could hook this in, or if there is a mechanism in Text::CSV to put such things in at all.
  • I don't know if it poses a problem that DBD::CSV wants to use Text::CSV_XS if possible.
  • Maybe I can do it later, after the data has been read and is already stored away somewhere, but I'm not yet sure where the right access point is.
  • I understand that the stuff is stored in SQL::Statement. I don't yet know where. This could be handy somehow.

Changing the source CSV file to have dots instead of commas is not an option.

I'm open for all kinds of suggestions. Other approaches to the whole CSV via SQL thing are welcome, too. Thanks a lot.

like image 287
simbabque Avatar asked Dec 11 '22 20:12

simbabque


1 Answers

You need to write a user-defined function using SQL::Statement::Functions (already loaded as part of DBD::CSV).

This program does what you want. Adding 0.0 to the transformed string is strictly unnecessary, but it makes the point about the purpose of the subroutine. (Note also your typo in the f_encoding parameter to the connect call.)

use strict;
use warnings;

use DBI;

my $dbh = DBI->connect(
  'dbi:CSV:',
  undef, undef,
  {
    f_dir => '.',
    f_schema => undef,
    f_ext => '.csv',
    f_encoding => 'latin-1',
    csv_eol => "\n",
    csv_sep_char => ';',
    csv_tables => {
      foo => {
        file => 'test.csv',
        #skip_first_row => 0,
        col_names => [ map { "col$_" } (1..3)  ], # see annotation below
      },
    },
  },
) or croak $DBI::errstr;

$dbh->do('CREATE FUNCTION comma_float EXTERNAL');

sub comma_float {
  my ($self, $sth, $n) = @_;
  $n =~ tr/,/./;
  return $n + 0.0;
}

my $sth = $dbh->prepare(
  'SELECT col3 FROM foo WHERE comma_float(col3) > 80.50 ORDER BY col3 ASC'
);
$sth->execute;

while (my $res = $sth->fetchrow_hashref) {
  say $res->{col3};
}

output

80,70
81,90
like image 131
Borodin Avatar answered Mar 23 '23 06:03

Borodin