I am working on a program that takes user input for two file names. Unfortunately, the program can easily break if the user does not follow the specified input format. I want to write code that makes it more resilient against these kinds of errors. You'll understand when you see my code:
# Ask the user for the filename of the qseq file and barcode.txt file
print "Please enter the name of the qseq file and the barcode file separated by a comma:";
# user should enter filenames like this: sample1.qseq, barcode.txt
# remove the trailing newline from the input line
chomp ($filenames = <STDIN>);
# an empty array
my @filenames;
# remove the ',' and put the files into an array separated by spaces; indexes the files
push @filename, join(' ', split(',', $filenames))
# the qseq file
my $qseq_filename = shift @filenames;
# the barcode file.
my $barcode = shift @filenames;
Obviously this code can run into errors if the user enters the wrong type of file name (a .tab file instead of .txt, or .seq instead of .qseq). I want code that checks whether the user entered the appropriate file type.
Another error that could break the code is if the user enters too many spaces before the filenames. For example: sample1.qseq,(imagine 6 spaces here) barcode.txt (Notice the numerous spaces after the comma)
Another example: (imagine 6 spaces here) sample1.qseq,barcode.txt (This time notice the number of spaces before the first filename)
I also want lines of code that can remove extra spaces so that the program doesn't break. I think the user input has to be in the following kind of format: sample1.qseq, barcode.txt. The user input has to be in this format so that I can properly index the filenames into an array and shift them out later.
Thanks, any help or suggestions are greatly appreciated!
The most common regex patterns for matching whitespace are \s and \s+. The difference is that \s matches a single whitespace character (space, tab, newline and so on), while \s+ matches a run of one or more consecutive whitespace characters.
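To see the difference, here is a small sketch that replaces every match with an underscore. The input string (six spaces after the comma, as in the question) is made up for illustration.

```perl
use strict;
use warnings;

# Example input: six spaces between the comma and the second file name
my $text = "sample1.qseq," . (" " x 6) . "barcode.txt";

# \s matches ONE whitespace character, so /g replaces each of the six spaces separately
(my $single = $text) =~ s/\s/_/g;

# \s+ matches a RUN of whitespace, so the six spaces are a single match
(my $run = $text) =~ s/\s+/_/g;

print "$single\n";   # sample1.qseq,______barcode.txt
print "$run\n";      # sample1.qseq,_barcode.txt
```

For collapsing or stripping arbitrary amounts of whitespace, \s+ (or \s*) is usually what you want.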
The standard way to deal with this kind of problem is to use command-line options instead of gathering input from STDIN. Getopt::Long comes with Perl and is serviceable:
use strict; use warnings FATAL => 'all';
use Getopt::Long qw(GetOptions);
my %opt;
GetOptions(\%opt, 'qseq=s', 'barcode=s') or die;
die <<"USAGE" unless exists $opt{qseq} and $opt{qseq} =~ /^sample\d[.]qseq$/ and exists $opt{barcode} and $opt{barcode} =~ /^barcode.*\.txt$/;
Usage: $0 --qseq sample1.qseq --barcode barcode.txt
$0 -q sample1.qseq -b barcode.txt
USAGE
printf "q==<%s> b==<%s>\n", $opt{qseq}, $opt{barcode};
The shell will deal with any extraneous whitespace; try it and see. You still need to validate the file names yourself; the regexes in the example are just something I made up. Use Pod::Usage for a fancier way to output helpful documentation to users who are likely to get the invocation wrong.
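As a rough sketch of that validation step, the hypothetical helper check_input_file below checks the extension (case-insensitively, a design choice you may not want) and then whether the path is an existing, readable plain file. The name and messages are illustrative assumptions, not part of Getopt::Long.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an error message, or undef if $path looks usable.
# $ext is the required extension without the dot, e.g. 'qseq' or 'txt'.
sub check_input_file {
    my ($path, $ext) = @_;
    return "no file name given" unless defined $path && length $path;
    # \Q...\E quotes $ext so it is matched literally; \z anchors at the very end
    return "'$path' does not end in .$ext" unless $path =~ /\.\Q$ext\E\z/i;
    return "'$path' does not exist or is not a plain file" unless -f $path;
    # '_' reuses the stat result from the -f test above
    return "'$path' is not readable" unless -r _;
    return;    # undef in scalar context: no problem found
}
```

You would call it once per option, for example check_input_file($opt{qseq}, 'qseq'), and die with the returned message if it is defined.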
There are dozens of more advanced Getopt modules on CPAN.
First, put use strict; at the top of your code and declare your variables.
Second, this:
# remove the ',' and put the files into an array separated by spaces; indexes the files
push @filename, join(' ', split(',', $filenames))
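is not going to do what you want. split() takes a string and returns a list; join() takes a list and returns a string. So this line splits your input, immediately glues the pieces back together with a space, and pushes that single string onto the array (and note it pushes onto @filename, not @filenames). You just want to split: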
my @filenames = split(',', $filenames);
That will create an array like you expect.
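A quick way to convince yourself is to compare the two approaches on the question's example input. This is a minimal sketch; @joined and @split are illustrative names. Note that the second element from split still has a leading space, which is why trimming is needed as well.

```perl
use strict;
use warnings;

my $filenames = "sample1.qseq, barcode.txt";

# The original approach: split on ',', join the pieces back into ONE string
# with a space, and push that single string onto the array.
my @joined;
push @joined, join(' ', split(',', $filenames));

# What is actually wanted: one array element per file name.
my @split = split(',', $filenames);

print scalar(@joined), " element(s): <$joined[0]>\n";
# 1 element(s): <sample1.qseq  barcode.txt>
print scalar(@split), " element(s): <", join('> <', @split), ">\n";
# 2 element(s): <sample1.qseq> < barcode.txt>
```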
This function will safely trim white space from the beginning and end of a string:
sub trim {
my $string = shift;
$string =~ s/^\s+//;
$string =~ s/\s+$//;
return $string;
}
Access it like this:
my $file = trim(shift @filenames);
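Putting split and trim together, here is a sketch that handles both whitespace cases from the question. trim() is copied from above so the block runs on its own; the input strings are made up to match the question's examples. Mapping trim over every field means you do not have to remember to trim each value as you shift it off.

```perl
use strict;
use warnings;

# trim() as defined above: strip leading and trailing whitespace
sub trim {
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

# The two problem inputs from the question (six spaces each), plus the expected form
my @inputs = (
    "sample1.qseq," . (" " x 6) . "barcode.txt",
    (" " x 6) . "sample1.qseq,barcode.txt",
    "sample1.qseq, barcode.txt\n",
);

my @parsed;
for my $line (@inputs) {
    # Split on the comma, then trim every resulting field
    my ($qseq_filename, $barcode) = map { trim($_) } split /,/, $line;
    push @parsed, [$qseq_filename, $barcode];
    print "<$qseq_filename> <$barcode>\n";    # <sample1.qseq> <barcode.txt>
}
```

Splitting on /\s*,\s*/ instead of /,/ would absorb whitespace around the comma, but leading or trailing whitespace on the whole line would still need trimming.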
Depending on your script, it might be easier to pass the strings as command-line arguments. You can access them through the @ARGV array, but I prefer to use Getopt::Long:
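use strict;
use warnings;
use Getopt::Long;
Getopt::Long::Configure("bundling");
my ($qseq_filename, $barcode);
# -q/--qseq and -b/--bar each take a string argument;
# GetOptions returns false on unknown or malformed options
GetOptions (
'q|qseq=s' => \$qseq_filename,
'b|bar=s' => \$barcode,
) or die "Usage: $0 -q FILE.qseq -b FILE.txt\n";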
You can then call this as:
./script.pl -q sample1.qseq -b barcode.txt
And the variables will be properly populated without a need to worry about trimming white space.
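If you would rather use plain @ARGV, the shell has already split the command line on whitespace, so unquoted file names arrive without stray spaces. A minimal sketch, assuming the script takes exactly two positional arguments in a fixed order; files_from_args is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: expects exactly two positional arguments,
# e.g.  ./script.pl sample1.qseq barcode.txt
sub files_from_args {
    my @args = @_;
    die "Usage: $0 FILE.qseq FILE.txt\n" unless @args == 2;
    my ($qseq_filename, $barcode) = @args;
    die "First argument should be a .qseq file, got '$qseq_filename'\n"
        unless $qseq_filename =~ /\.qseq\z/;
    die "Second argument should be a .txt file, got '$barcode'\n"
        unless $barcode =~ /\.txt\z/;
    return ($qseq_filename, $barcode);
}

# In the real script you would call it with the command-line arguments:
# my ($qseq_filename, $barcode) = files_from_args(@ARGV);
```

The trade-off compared with Getopt::Long is that the order of the arguments matters and there are no named flags to make a mistake obvious.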