One of my colleagues recently interviewed some candidates for a job and one said they had very good Perl experience.
Since my colleague didn't know Perl, he asked me for a critique of some code written (off-site) by that potential hire, so I had a look and told him my concerns (the main one was that it originally had no comments and it's not like we gave them enough time).
However, the code works so I'm loath to say no-go without some more input. Another concern is that this code basically looks exactly how I'd code it in C. It's been a while since I did Perl (and I didn't do a lot, I'm more a Python bod for quick scripts) but I seem to recall that it was a much more expressive language than what this guy used.
I'm looking for input from real Perl coders, and suggestions for how it could be improved (and why a Perl coder should know that method of improvement).
You can also wax lyrical about whether people who write one language in a totally different language should (or shouldn't be hired). I'm interested in your arguments but this question is primarily for a critique of the code.
The spec was to successfully process a CSV file as follows and output the individual fields:
User ID,Name , Level,Numeric ID
pax, Pax Morgan ,admin,0
gt," Turner, George" rubbish,user,1
ms,"Mark \"X-Men\" Spencer","guest user",2
ab,, "user","3"
The output was to be something like this (the potential hire's code actually output this):
User ID,Name , Level,Numeric ID:
[User ID]
[Name]
[Level]
[Numeric ID]
pax, Pax Morgan ,admin,0:
[pax]
[Pax Morgan]
[admin]
[0]
gt," Turner, George " rubbish,user,1:
[gt]
[ Turner, George ]
[user]
[1]
ms,"Mark \"X-Men\" Spencer","guest user",2:
[ms]
[Mark "X-Men" Spencer]
[guest user]
[2]
ab,, "user","3":
[ab]
[]
[user]
[3]
Here is the code they submitted:
#!/usr/bin/perl
# Open file.
open (IN, "qq.in") || die "Cannot open qq.in";
# Process every line.
while (<IN>) {
chomp;
$line = $_;
print "$line:\n";
# Process every field in line.
while ($line ne "") {
# Skip spaces and start with empty field.
if (substr ($line,0,1) eq " ") {
$line = substr ($line,1);
next;
}
$field = "";
$minlen = 0;
# Detect quoted field or otherwise.
if (substr ($line,0,1) eq "\"") {
$line = substr ($line,1);
$pastquote = 0;
while ($line ne "") {
# Special handling for quotes (\\ and \").
if (length ($line) >= 2) {
if (substr ($line,0,2) eq "\\\"") {
$field = $field . "\"";
$line = substr ($line,2);
next;
}
if (substr ($line,0,2) eq "\\\\") {
$field = $field . "\\";
$line = substr ($line,2);
next;
}
}
# Detect closing quote.
if (($pastquote == 0) && (substr ($line,0,1) eq "\"")) {
$pastquote = 1;
$line = substr ($line,1);
$minlen = length ($field);
next;
}
# Only worry about comma if past closing quote.
if (($pastquote == 1) && (substr ($line,0,1) eq ",")) {
$line = substr ($line,1);
last;
}
$field = $field . substr ($line,0,1);
$line = substr ($line,1);
}
} else {
while ($line ne "") {
if (substr ($line,0,1) eq ",") {
$line = substr ($line,1);
last;
}
if ($pastquote == 0) {
$field = $field . substr ($line,0,1);
}
$line = substr ($line,1);
}
}
# Strip trailing space.
while ($field ne "") {
if (length ($field) == $minlen) {
last;
}
if (substr ($field,length ($field)-1,1) eq " ") {
$field = substr ($field,0, length ($field)-1);
next;
}
last;
}
print " [$field]\n";
}
}
close (IN);
His code is a little verbose. Perl is all about modules, and avoiding them makes your life hard. Here is an equivalent to what you posted that I wrote in about two minutes:
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;
my $parser = Text::CSV->new({
allow_whitespace => 1,
escape_char => '\\',
allow_loose_quotes => 1,
});
while(my $line = <>){
$parser->parse($line) or die "Parse error: ". $parser->error_diag;
my @row = $parser->fields;
print $line;
print "\t[$_]\n" for @row;
}
I advise people to never hire Perl programmers, or C programmers, or Java programmers, and so on. Just hire good people. The programmers who I've hired to write Perl were also skilled in various other languages. I hired them because they were good programmers, and good programmers can deal with multiple languages.
Now, that code does look a lot like C, but I think it's fine Perl too. If you're hiring a good programmer, with a little Perl practice under his belt he'll catch up just fine. People are complaining about the lack of regexes, which would make things simpler in ancillary areas, but I wouldn't wish on anyone a regex solution on parsing that dirty CSV data. I wouldn't want to read it or maintain it.
I often find that the reverse problem is more troublesome: hire a good programmer who writes good Perl code, but the rest of the team only knows the basics of Perl and can't keep up. This has nothing to do with poor formatting or bad structure, just a level of skill with advanced topics (e.g. closures).
Things are getting a bit heated in this debate, so I think I should explain more about how I deal with this sort of thing. I don't see this as a regex / no-regex problem. I wouldn't have written the code the way the candidate did, but that doesn't really matter.
I write quite a bit of crappy code too. On the first pass, I'm usually thinking more about structure and process than syntax. I go back later to tighten that up. That doesn't mean that the candidate's code is any good, but for a first pass done in an interview I don't judge it too harshly. I don't know how much time he had to write it and so on, so I don't judge it based on something I would have had a long time to work on. Interview questions are always weird because you can't do what you'd really do for real work. I'd probably fail a question about writing a CSV parser too if I had to start from scratch and do it in 15 minutes. Indeed, I wasted more than that today being a total bonehead with some code.
I went to look at the code for Text::CSV_PP, the Pure Perl cousin to Text::CSV_XS. It uses regular expressions, but a lot of regular expressions that handle special cases, and in structure isn't that different from the code presented here. It's a lot of code, and it's complicated code that I hope I never have to look at again.
What I tend to disfavor are interview answers that only address the given input. That's almost always the wrong thing to do in the real world where you have to handle cases that you may not have discovered yet and you need the flexibility to deal with future issues. I find that missing from a lot of answers on Stackoverflow too. The thought process of the solution is more telling to me. People become skilled at a language more easily than they change how they think about things. I can teach people how to write better Perl, but I can't change their wetware for the most part. That comes from scars and experience.
Since I wasn't there to see the candidate code the solution or ask him follow-up questions, I won't speculate on why he wrote it the way he did. For some of the other solutions I've seen here, I could be equally harsh in an interview.
A career is a journey. I don't expect everyone to be a guru or to have the same experiences. If I write-off people because they don't know some trick or idiom, I'm not giving them the chance to continue their journey. The candidate's code won't win any prizes, but apparently it was enough to get him into the final three for consideration for an offer. The guy got up there and tried, did much better than a lot of code I've seen in my life, and that's good enough for me.
I would argue writing C in Perl is a much better situation than writing Perl in C. As is often brought up on the SO podcast, understanding C is a virtue that not all developers (even some good ones) have nowadays. Hire them and buy a copy of Perl Best Practices for them and you will be set. After best practices a copy of Intermediate Perl and they could work out.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With