
Array of arrays in Perl

I'm new to Perl, but I needed it to get some text out of an awful HTML file. So far, my code extracts all the values I need (I verified this with Data::Dumper).

For every data record (i.e. every row of a 2D table), the values are stored in these variables:

$org, $gene_name, $number, $motif_num, $pos, $strand, $seq

I have many data entries and each one would be a row, with the above values as the columns.

To do other stuff with them later, I want to make a 2D array structure, so I can loop through each entry (row) and pick out values I need and so on.

I thought the best way of doing this would be to use the loop: for each data entry, after extracting the values with regexp matching, combine the values (columns) into an array for the individual data record:

my @seidl_array_row = ($org, $gene_name, $number, $motif_num, $pos, $strand, $seq);

Then push this array onto the finished 2D array of arrays:

push @seidl_array, [ @seidl_array_row ];

(@seidl_array was defined with my before the loop.)

So in effect I get a 2D data table, where each element of the array @seidl_array is an array containing the values $org, $gene_name, $number, $motif_num, $pos, $strand, and $seq.

I'm new to Perl, so I don't know if this is the right way to do it programmatically, since I'm having issues when it comes to working with this data later. I wondered if the issue was with how I constructed the array of arrays in the first place. The examples in my book build the structure statically from small data sets, whereas my data come from a much larger genomic GTF file, so doing it statically is not really feasible.

Ward9250 asked Dec 21 '22 07:12



2 Answers

As far as I can see, there is nothing wrong with your approach. Storing a reference to the array instead of copying it, as choroba suggested, has the benefit that the data isn't copied unnecessarily. But remember: that only works if you declare @seidl_array_row with my inside the loop; otherwise every element would be a reference to the same array.

You can have that same advantage by skipping the row array completely like so:

push @seidl_array, [ $org, $gene_name, $number, $motif_num, $pos, $strand, $seq ];
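To show how this fits into a loop and how the rows can be read back later, here is a minimal sketch. The @records list and its tab-separated sample lines are made up for illustration and only stand in for the values your regexp matching produces.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the values extracted by regexp.
my @records = (
    "human\tBRCA1\t1\t3\t120\t+\tACGT",
    "mouse\tTrp53\t2\t1\t45\t-\tTTGA",
);

my @seidl_array;
for my $line (@records) {
    my ($org, $gene_name, $number, $motif_num, $pos, $strand, $seq) = split /\t/, $line;
    # The [ ... ] constructor creates a fresh anonymous array on every iteration.
    push @seidl_array, [ $org, $gene_name, $number, $motif_num, $pos, $strand, $seq ];
}

# Each element is an array reference, so columns are reached through the reference.
for my $row (@seidl_array) {
    my ($org, $pos) = @{$row}[0, 4];    # slice: columns 0 and 4
    print "$org at position $pos\n";
}

# A single cell: second row, second column (the gene name).
print $seidl_array[1][1], "\n";
```

Note that the columns are addressed by number here, so you have to keep track of which index holds which value.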

For some extra convenience in accessing the data, I often use arrays of hashes like so:

push @seidl_array, {
    org    => $org,
    name   => $gene_name,
    number => $number,
    motif  => $motif_num,
    pos    => $pos,
    strand => $strand,
    seq    => $seq,
};

This has the advantage that you don't have to remember the positions of the respective values in the array, but can access them by name.
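As a rough sketch of what access by name looks like in practice (the two sample records are invented for illustration, not taken from your data):

```perl
use strict;
use warnings;

# Hypothetical sample records standing in for the parsed values.
my @seidl_array = (
    { org => 'human', name => 'BRCA1', number => 1, motif => 3, pos => 120, strand => '+', seq => 'ACGT' },
    { org => 'mouse', name => 'Trp53', number => 2, motif => 1, pos => 45,  strand => '-', seq => 'TTGA' },
);

# Fields are read by key instead of by numeric index.
for my $rec (@seidl_array) {
    print "$rec->{name} ($rec->{org}): $rec->{seq}\n";
}

# Filtering on a named field, here the records on the minus strand.
my @minus = grep { $_->{strand} eq '-' } @seidl_array;
print scalar(@minus), " record(s) on the minus strand\n";
```

The trade-off is that each row stores its keys as well as its values, so an array of hashes uses more memory than an array of arrays.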

tauli answered Jan 09 '23 12:01



Your solution seems correct to me. Using [ @seidl_array_row ] creates a copy of the list. If you are correctly declaring the row with my inside the loop, you can store a reference to it directly and avoid the unnecessary copying:

push @seidl_array, \@seidl_array_row;
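
The reason the declaration has to be inside the loop can be seen in a small sketch. The names @values, @shared, @separate, @row_outside and @row_inside are made up for this illustration:

```perl
use strict;
use warnings;

my @values = (1, 2, 3);

# Row declared OUTSIDE the loop: every pushed reference points to the same array.
my @shared;
my @row_outside;
for my $v (@values) {
    @row_outside = ($v, $v * 10);
    push @shared, \@row_outside;
}

# Row declared with my INSIDE the loop: each iteration gets a new array.
my @separate;
for my $v (@values) {
    my @row_inside = ($v, $v * 10);
    push @separate, \@row_inside;
}

print join(',', map { $_->[0] } @shared),   "\n";   # prints 3,3,3
print join(',', map { $_->[0] } @separate), "\n";   # prints 1,2,3
```

In the first loop all three elements end up showing the values from the last iteration, which is the kind of problem that only surfaces later when the data is used.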
choroba answered Jan 09 '23 10:01
