I'm new to Perl, but I needed it to get some text out of an awful HTML file. In my code so far, I have got to the point where I have extracted all the values I need (I verified this with Data::Dumper):
For every data record, i.e. each row of a 2D table, the values are stored in:
$org, $gene_name, $number, $motif_num, $pos, $strand, $seq
I have many data entries and each one would be a row, with the above values as the columns.
To do other stuff with them later, I want to make a 2D array structure, so I can loop through each entry (row) and pick out values I need and so on.
I thought the best way of doing this would be to use the loop and, for each data entry, after extracting the values with regexp matching, combine the values/columns into an array for the individual data record:
my @seidl_array_row = ($org, $gene_name, $number, $motif_num, $pos, $strand, $seq);
Then push this array onto the finished 2D array of arrays:
push @seidl_array, [ @seidl_array_row ];
(@seidl_array was declared with my before the loop.)
So in effect I get a 2D data table, where each element of @seidl_array is a reference to an array containing the values $org, $gene_name, $number, $motif_num, $pos, $strand, and $seq.
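A minimal, self-contained sketch of the approach described above, including how the rows can be read back later. It assumes, purely for illustration, that each record has already been reduced to a tab-separated line; the sample data and the split-based parsing are hypothetical stand-ins for the regex extraction from the HTML.

```perl
use strict;
use warnings;

# Hypothetical sample records standing in for the values extracted from the HTML.
my @lines = (
    "org1\tgeneA\t1\t3\t120\t+\tACGT",
    "org2\tgeneB\t2\t5\t45\t-\tTTGA",
);

my @seidl_array;    # declared once, before the loop

for my $line (@lines) {
    # In the real script these values would come from regex captures.
    my ($org, $gene_name, $number, $motif_num, $pos, $strand, $seq) = split /\t/, $line;

    my @seidl_array_row = ($org, $gene_name, $number, $motif_num, $pos, $strand, $seq);
    push @seidl_array, [ @seidl_array_row ];    # store a reference to a copy of the row
}

# Loop over the rows and unpack each one back into named variables.
for my $row (@seidl_array) {
    my ($org, $gene_name, $number, $motif_num, $pos, $strand, $seq) = @$row;
    print "$gene_name ($org): motif $motif_num at $pos on strand $strand\n";
}

# A single cell is reached with two subscripts: row first, then column.
print "Gene of second row: $seidl_array[1][1]\n";    # geneB
```

Each element of @seidl_array is an array reference, so @$row dereferences a whole row and $seidl_array[$i][$j] picks out one value.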
Since I'm new to Perl, I don't know if this was the right way to do it, and I'm having issues when it comes to doing stuff with this data later. I wondered whether the problem was in how I constructed the array of arrays in the first place. The examples in my book build it statically with simple data sets, but this is a much larger genomic GTF file, so doing it statically is not really feasible.
As far as I can see, there is nothing wrong with your approach. Using a reference to the array instead of copying it, as choroba suggested, has the benefit that the data isn't copied unnecessarily (but remember: that only works if you declare @seidl_array_row inside the loop; otherwise you would just push several references to the same array).
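The caveat about where the row array is declared is easy to demonstrate. This sketch uses hypothetical two-column records; the only difference between the two loops is whether the row array is declared outside or inside the loop.

```perl
use strict;
use warnings;

my @records = ( [ 'org1', 'geneA' ], [ 'org2', 'geneB' ] );    # hypothetical data

# Wrong: one array declared outside the loop, so every pushed reference points to it
# and each iteration overwrites the same storage.
my @shared_row;
my @wrong;
for my $rec (@records) {
    @shared_row = @$rec;
    push @wrong, \@shared_row;
}

# Right: "my" inside the loop creates a fresh array on every iteration.
my @right;
for my $rec (@records) {
    my @row = @$rec;
    push @right, \@row;
}

print "wrong: $_->[1]\n" for @wrong;    # geneB, twice
print "right: $_->[1]\n" for @right;    # geneA, then geneB
```

In the first loop both entries of @wrong are the same reference, so they both show the values from the last record.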
You can get the same advantage by skipping the row array completely, like so:
push @seidl_array, [ $org, $gene_name, $number, $motif_num, $pos, $strand, $seq ];
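Once the rows are stored as anonymous arrays, later processing can work with column indexes directly. A sketch with hypothetical data (the column order is assumed to match the push above) that sorts rows numerically by position, index 4, and keeps only the rows on the + strand, index 5:

```perl
use strict;
use warnings;

my @seidl_array;
# Hypothetical rows in the order: org, gene_name, number, motif_num, pos, strand, seq
for my $rec ( [ 'org1', 'geneA', 1, 3, 120, '+', 'ACGT' ],
              [ 'org2', 'geneB', 2, 5,  45, '-', 'TTGA' ],
              [ 'org3', 'geneC', 3, 1,  80, '+', 'GGCA' ] ) {
    my ($org, $gene_name, $number, $motif_num, $pos, $strand, $seq) = @$rec;
    push @seidl_array, [ $org, $gene_name, $number, $motif_num, $pos, $strand, $seq ];
}

# Sort rows numerically by position (column index 4).
my @by_pos = sort { $a->[4] <=> $b->[4] } @seidl_array;

# Keep only rows on the + strand (column index 5).
my @plus = grep { $_->[5] eq '+' } @seidl_array;

print join(', ', map { $_->[1] } @by_pos), "\n";    # geneB, geneC, geneA
print scalar(@plus), " rows on + strand\n";         # 2 rows on + strand
```

sort and grep return lists of the same row references, so no row data is copied.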
For some extra convenience in accessing the data, I often use arrays of hashes like so:
push @seidl_array, {
org => $org,
name => $gene_name,
number => $number,
motif => $motif_num,
pos => $pos,
strand => $strand,
seq => $seq,
};
This has the advantage that you don't have to remember the positions of the respective values in the array, but can access them by name.
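A short sketch of reading such an array of hashes back. The field values are hypothetical placeholders for the extracted data; the key names are the ones used in the push above.

```perl
use strict;
use warnings;

my @seidl_array;
# Hypothetical values standing in for the fields extracted from the HTML.
my ($org, $gene_name, $number, $motif_num, $pos, $strand, $seq)
    = ('org1', 'geneA', 1, 3, 120, '+', 'ACGT');

push @seidl_array, {
    org    => $org,
    name   => $gene_name,
    number => $number,
    motif  => $motif_num,
    pos    => $pos,
    strand => $strand,
    seq    => $seq,
};

# Fields are looked up by key instead of by column position.
for my $rec (@seidl_array) {
    print "$rec->{name}: $rec->{seq} at $rec->{pos} ($rec->{strand})\n";
}

# Direct access to one field of one record.
my $first_seq = $seidl_array[0]{seq};    # 'ACGT'
```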
Your solution seems correct to me. Using [ @seidl_array_row ] creates a copy of the list. If you are correctly declaring the row with my inside the loop, you can store a reference to it directly to avoid the unnecessary copying:
push @seidl_array, \@seidl_array_row;