Multiple filehandles opening the same file - is it a good practice?

I have a grades.tsv file with three tab-separated columns that show students' names, subjects and grades:

Liam    Mathematics 5
Liam    History 6
Liam    Geography   8
Liam    English 8
Aria    Mathematics 8
Aria    History 7
Aria    Geography   6
Isabella    Mathematics 9
Isabella    History 4
Isabella    Geography   7
Isabella    English 5
Isabella    Music   8

I wanted to calculate the average grade for each student and add it to a separate column. For this I used two filehandles DATA and OUT opening the same file:

use strict;
use warnings;

# Open file with grades for calculation of average grade for each student
open (DATA,"grades.tsv") or die "Cannot open file\n";

my %grade_sums;
my %num_of_subjects;

# Calculate sum of grades and number of subjects for each student
while( <DATA> ) {

   chomp;
   my ($name, $subject, $grade) = split /\t/;

   $grade_sums{$name} += $grade;
   $num_of_subjects{$name} += 1;
}

close DATA;


# Open file with grades again but this time for a purpose of adding a separate column with average grade and printing a result
open (OUT,"grades.tsv") or die "Cannot open file\n";

while ( <OUT> ) {
   chomp;
   my ($name, $subject, $grade) = split /\t/;

   # Calculate average grade
   my $average_grade = $grade_sums{$name} / $num_of_subjects{$name};
   my $outline = join("\t", $name, $subject, $grade, $average_grade);

   # Print a file content with new column
   print "$outline\n";
}

close OUT;

The code works, but I am not sure whether this is a proper way to do the task. Is it good practice, or are there better ways that should be preferred?

asked Jan 30 '20 17:01 by zubenel


3 Answers

Reopening the file is fine. One alternative would be to seek to the start of the file.

use Fcntl qw( SEEK_SET );

seek(DATA, 0, SEEK_SET);

Seeking is more efficient because the file doesn't have to be looked up again and its permissions rechecked. It also guarantees that you read the same file (but not that no one has changed it in the meantime).
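As a rough sketch of the seek approach, the two passes from the question could share one lexical handle. The scratch file, its three sample rows and the names $qfn, $fh and @rows_with_average are assumptions made only so the example runs on its own; they are not from the question.

```perl
use strict;
use warnings;
use Fcntl qw( SEEK_SET );
use File::Temp qw( tempfile );

# Hypothetical sample data in a scratch file, so the sketch runs on its own.
my ($setup_fh, $qfn) = tempfile(UNLINK => 1);
print {$setup_fh} "Liam\tMathematics\t5\nLiam\tHistory\t6\nAria\tMathematics\t8\n";
close($setup_fh)
   or die("Can't close \"$qfn\": $!\n");

open(my $fh, '<', $qfn)
   or die("Can't open file \"$qfn\": $!\n");

# First pass: sum of grades and number of subjects per student.
my (%grade_sums, %num_of_subjects);
while (my $line = <$fh>) {
   chomp($line);
   my ($name, $subject, $grade) = split(/\t/, $line);
   $grade_sums{$name} += $grade;
   $num_of_subjects{$name}++;
}

# Rewind the same handle instead of opening the file a second time.
seek($fh, 0, SEEK_SET)
   or die("Can't seek in \"$qfn\": $!\n");

# Second pass: append the student's average to every row.
my @rows_with_average;
while (my $line = <$fh>) {
   chomp($line);
   my ($name) = split(/\t/, $line);
   push @rows_with_average,
      join("\t", $line, $grade_sums{$name} / $num_of_subjects{$name});
}
close($fh);

print("$_\n") for @rows_with_average;
```

With a real grades.tsv, the setup lines would go away and $qfn would simply hold the path to that file.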

The other alternative would be to load the entire file into memory. That's what I'd usually do.
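A minimal sketch of the in-memory variant might look like the following. To stay self-contained it reads from an in-memory filehandle over a hypothetical $data string; with a real file, the open would take the path to grades.tsv instead. The names @records and @report are also just illustrative.

```perl
use strict;
use warnings;

# Hypothetical sample data; an in-memory filehandle stands in for grades.tsv.
my $data = "Liam\tMathematics\t5\nLiam\tHistory\t6\nAria\tMathematics\t8\n";
open(my $fh, '<', \$data)
   or die("Can't open in-memory file: $!\n");

# Read the whole file once and keep every row as [name, subject, grade].
chomp(my @lines = <$fh>);
close($fh);
my @records = map { [ split(/\t/, $_) ] } @lines;

# Pass 1 (in memory): totals and counts per student.
my (%sum, %count);
for my $rec (@records) {
   my ($name, undef, $grade) = @$rec;
   $sum{$name} += $grade;
   $count{$name}++;
}

# Pass 2 (in memory): original columns plus the student's average.
my @report = map {
   join("\t", @$_, $sum{ $_->[0] } / $count{ $_->[0] })
} @records;

print("$_\n") for @report;
```

This trades memory for simplicity, which is usually a fine trade for files of this size.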


Note that

open(FH, $qfn) or die "Cannot open file\n";

is better written as

open(my $FH, '<', $qfn)
   or die("Can't open file \"$qfn\": $!\n");
  • Three-arg open avoids some problems, such as special characters in the file name being interpreted as part of the open mode.
  • Including the error reason in the error message is beneficial.
  • Including the path in the error message is beneficial.
  • One should avoid DATA as Perl creates a handle by that name automatically when the source file has a __DATA__ section.
  • One should avoid using global variables (e.g. FH) in favour of lexical variables (my $FH).
answered Nov 09 '22 14:11 by ikegami


There's another thing to consider in this sort of operation. What do you do if you mess up when writing the new data? How are you going to tolerate a program that truncates the original file but fails to completely write the new data?

Instead of opening the write filehandle on the same filename, use a temporary file. File::Temp is part of the Standard Library:

use File::Temp qw( tempfile );   # tempfile is not exported by default
my( $temp_fh, $tempfile ) = tempfile();

Now, write everything to $temp_fh until you are satisfied that you were able to complete the output. After that, use rename to move the completed file into place:

rename $tempfile => $original;

Shawn also correctly points out that this will change the inode, thus breaking hard links. You can copy the new file into the old if that matters to you, but I've rarely seen a situation where the technology was that advanced :)

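If you mess up, the original data are still there and you can try again. Note: this assumes that the two files are on the same filesystem, since rename cannot move a file across filesystems. tempfile accepts a DIR option if you need the temporary file created next to the original.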
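Put together, the temporary-file approach might be sketched as follows. The scratch directory, the two sample rows and the names $dir, $original and @lines are assumptions for illustration only. The temporary file is created in the original's directory so that rename stays on one filesystem.

```perl
use strict;
use warnings;
use File::Basename qw( dirname );
use File::Temp qw( tempdir tempfile );

# Hypothetical setup: a scratch directory with a small "original" file.
my $dir      = tempdir(CLEANUP => 1);
my $original = "$dir/grades.tsv";
open(my $setup_fh, '>', $original)
   or die("Can't create \"$original\": $!\n");
print {$setup_fh} "Liam\tMathematics\t5\nLiam\tHistory\t6\n";
close($setup_fh)
   or die("Can't close \"$original\": $!\n");

# Read the original and work out each student's average.
open(my $in_fh, '<', $original)
   or die("Can't open file \"$original\": $!\n");
chomp(my @lines = <$in_fh>);
close($in_fh);

my (%sum, %count);
for my $line (@lines) {
   my ($name, undef, $grade) = split(/\t/, $line);
   $sum{$name} += $grade;
   $count{$name}++;
}

# Create the temporary file beside the original so that rename() never
# has to cross a filesystem boundary.
my ($temp_fh, $tempfile) = tempfile(DIR => dirname($original));
for my $line (@lines) {
   my ($name) = split(/\t/, $line);
   print {$temp_fh} join("\t", $line, $sum{$name} / $count{$name}), "\n"
      or die("Can't write to \"$tempfile\": $!\n");
}
close($temp_fh)
   or die("Can't finish writing \"$tempfile\": $!\n");

# The original is replaced only after the new content is complete.
rename($tempfile, $original)
   or die("Can't rename \"$tempfile\" to \"$original\": $!\n");
```

Checking the result of close matters here: a write error that only surfaces when the buffer is flushed would otherwise go unnoticed right before the rename.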

Although this might not matter in your case, you also have to consider what other consumers do in the time it takes you to write the new file. If another program wants to read the original file immediately after you've truncated it but haven't written the data (or incompletely written it), what happens? Typically, you want to ensure the file is complete before it's available to other programs.

If you don't like the temp file, there are other ways to handle the problem. Move the original file to a backup name then read that and write to the original name. Or, write to a different filename and move it into place. See, for example, Perl's adjustments to the -i command-line switch for just this problem.
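The first of those alternatives, moving the original to a backup name and writing under the original name, could look roughly like this. The ".bak" suffix, the scratch directory and the sample rows are hypothetical choices for the sketch.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );

# Hypothetical setup: a scratch directory with a small "original" file.
my $dir      = tempdir(CLEANUP => 1);
my $original = "$dir/grades.tsv";
my $backup   = "$original.bak";   # hypothetical backup name
open(my $setup_fh, '>', $original)
   or die("Can't create \"$original\": $!\n");
print {$setup_fh} "Aria\tMathematics\t8\nAria\tHistory\t7\nAria\tGeography\t6\n";
close($setup_fh)
   or die("Can't close \"$original\": $!\n");

# Move the original aside first; from here on the old data lives in $backup.
rename($original, $backup)
   or die("Can't rename \"$original\" to \"$backup\": $!\n");

# Read the backup and compute each student's average.
open(my $in_fh, '<', $backup)
   or die("Can't open file \"$backup\": $!\n");
chomp(my @lines = <$in_fh>);
close($in_fh);

my (%sum, %count);
for my $line (@lines) {
   my ($name, undef, $grade) = split(/\t/, $line);
   $sum{$name} += $grade;
   $count{$name}++;
}

# Write the new content under the original name.
open(my $out_fh, '>', $original)
   or die("Can't open file \"$original\" for writing: $!\n");
for my $line (@lines) {
   my ($name) = split(/\t/, $line);
   print {$out_fh} join("\t", $line, $sum{$name} / $count{$name}), "\n"
      or die("Can't write to \"$original\": $!\n");
}
close($out_fh)
   or die("Can't finish writing \"$original\": $!\n");
```

Unlike the temporary-file version, the original name is missing or incomplete while the new file is being written, so other readers can still see a partial file; the backup only protects the old data.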

answered Nov 09 '22 16:11 by brian d foy


Sample code for student's report card

#!/usr/bin/perl
#
# USAGE:
#   prog.pl
#
# Description:
#   Demonstration code for StackOverflow Q59991322
#
# StackOverflow: 
#   Question 59991322
#
# Author:
#   Polar Bear    https://stackoverflow.com/users/12313309/polar-bear
#
# Date: Tue Jan 30 13:37:00 PST 2020
#

use strict;
use warnings;
use feature 'say';

use Data::Dumper;

my $debug = 0;      # debug flag
my %hash;
my $student;
my ($subject,$mark);

# Read the tab-separated records and accumulate per-student totals
while ( my $line = <DATA> ) {
    chomp $line;
    my($name,$subject,$mark) = split /\t/, $line;
    $hash{$name}{subjects}{$subject} = $mark;
    $hash{$name}{compute}{Total} += $mark;
    $hash{$name}{compute}{Num_subjects}++;
}

say Dumper(\%hash) if $debug;

foreach $student ( sort keys %hash ) {
    $hash{$student}{compute}{GPA} = $hash{$student}{compute}{Total}/$hash{$student}{compute}{Num_subjects};
    $~ = 'STDOUT_REPORT';
    write;
    print_marks($student);
    $~ = 'STDOUT_REPORT_END';
    write;
}

sub print_marks {
    my $student = shift;

    $~ = 'STDOUT_MARKS';

    while( ($subject,$mark) = each %{$hash{$student}{subjects}} ) {
        write;
    }
}

format STDOUT_REPORT = 
+----------------------------+
| Student: @<<<<<<<<<<       |
$student
+----------------------------+
.

format STDOUT_REPORT_END =
+----------------------------+
| Subjects taken:     @<<    |
$hash{$student}{compute}{Num_subjects}
| Grade average:      @<<    |
$hash{$student}{compute}{GPA}
+----------------------------+

.

format STDOUT_MARKS =
| @<<<<<<<<<<<<<<     @<<    |
$subject, $mark
.

__DATA__
Liam    Mathematics 5
Liam    History 6
Liam    Geography   8
Liam    English 8
Aria    Mathematics 8
Aria    History 7
Aria    Geography   6
Isabella    Mathematics 9
Isabella    History 4
Isabella    Geography   7
Isabella    English 5
Isabella    Music   8

Output

+----------------------------+
| Student: Aria              |
+----------------------------+
| Mathematics         8      |
| History             7      |
| Geography           6      |
+----------------------------+
| Subjects taken:     3      |
| Grade average:      7      |
+----------------------------+

+----------------------------+
| Student: Isabella          |
+----------------------------+
| Music               8      |
| Mathematics         9      |
| History             4      |
| English             5      |
| Geography           7      |
+----------------------------+
| Subjects taken:     5      |
| Grade average:      6.6    |
+----------------------------+

+----------------------------+
| Student: Liam              |
+----------------------------+
| Geography           8      |
| English             8      |
| History             6      |
| Mathematics         5      |
+----------------------------+
| Subjects taken:     4      |
| Grade average:      6.7    |
+----------------------------+
answered Nov 09 '22 14:11 by Polar Bear