I have a grades.tsv file with three tab-separated columns holding students' names, subjects and grades:
Liam Mathematics 5
Liam History 6
Liam Geography 8
Liam English 8
Aria Mathematics 8
Aria History 7
Aria Geography 6
Isabella Mathematics 9
Isabella History 4
Isabella Geography 7
Isabella English 5
Isabella Music 8
I wanted to calculate the average grade for each student and add it as a separate column. For this I used two filehandles, DATA and OUT, that open the same file:
use strict;
use warnings;
# Open file with grades for calculation of average grade for each student
open (DATA,"grades.tsv") or die "Cannot open file\n";
my %grade_sums;
my %num_of_subjects;
# Calculate sum of grades and number of subjects for each student
while( <DATA> ) {
    chomp;
    my ($name, $subject, $grade) = split /\t/;
    $grade_sums{$name} += $grade;
    $num_of_subjects{$name} += 1;
}
close DATA;
# Open the grades file again, this time to add a column with the average grade and print the result
open (OUT,"grades.tsv") or die "Cannot open file\n";
while ( <OUT> ) {
    chomp;
    my ($name, $subject, $grade) = split /\t/;
    # Calculate average grade
    my $average_grade = $grade_sums{$name} / $num_of_subjects{$name};
    my $outline = join("\t", $name, $subject, $grade, $average_grade);
    # Print the line with the new column
    print "$outline\n";
}
close OUT;
The code works, but I am not sure it is the proper way to do this. Is it good practice, or are there better approaches that should be preferred?
Reopening the file is fine. One alternative would be to seek to the start of the file.
use Fcntl qw( SEEK_SET );
seek(DATA, 0, SEEK_SET);
Seeking is more efficient because it doesn't have to check permissions, etc. It also guarantees that you get the same file (but not that no one has changed it).
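Putting the seek suggestion together, here is a minimal sketch of the question's script that opens the file once with a lexical handle and rewinds it between the two passes. The helper name add_average_column and the in-memory demo data (Liam's rows from the question) are illustrative assumptions; for real use, open grades.tsv as shown in the comment.

```perl
use strict;
use warnings;
use Fcntl qw( SEEK_SET );

# Hypothetical helper: reads $fh twice, rewinding with seek() in between,
# and returns each line with the student's average appended as a 4th column.
sub add_average_column {
    my ($fh) = @_;

    # First pass: total and number of subjects per student
    my ( %sum, %count );
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $name, undef, $grade ) = split /\t/, $line;
        $sum{$name}   += $grade;
        $count{$name} += 1;
    }

    # Rewind the same handle instead of opening the file a second time
    seek( $fh, 0, SEEK_SET ) or die "Can't seek: $!\n";

    # Second pass: append the average to each line
    my @out;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ($name) = split /\t/, $line;
        push @out, join( "\t", $line, $sum{$name} / $count{$name} );
    }
    return @out;
}

# Demo on Liam's rows held in memory; for the real file use
# open( my $fh, '<', 'grades.tsv' ) or die( "Can't open grades.tsv: $!\n" );
my $sample = "Liam\tMathematics\t5\nLiam\tHistory\t6\nLiam\tGeography\t8\nLiam\tEnglish\t8\n";
open( my $fh, '<', \$sample ) or die "Can't open in-memory file: $!\n";
print "$_\n" for add_average_column($fh);
```

Taking the handle as an argument keeps the logic independent of where the data comes from, which is also what makes the in-memory demo possible.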
The other alternative would be to load the entire file into memory. That's what I'd usually do.
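A minimal sketch of the load-into-memory alternative, assuming the file comfortably fits in RAM: read all lines into an array once, then make both passes over the array. The helper name average_rows_in_memory and the in-memory sample (Aria's rows from the question) are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: loads the whole file into an array, then makes
# both passes over the array instead of over the file.
sub average_rows_in_memory {
    my ($fh) = @_;
    chomp( my @lines = <$fh> );    # every line in memory, newline removed

    # First pass over the array: total and count per student
    my ( %sum, %count );
    for my $line (@lines) {
        my ( $name, undef, $grade ) = split /\t/, $line;
        $sum{$name}   += $grade;
        $count{$name} += 1;
    }

    # Second pass over the same array; the file is not read again
    return map {
        my ($name) = split /\t/;
        join "\t", $_, $sum{$name} / $count{$name};
    } @lines;
}

# Demo with Aria's rows held in memory; for the real file use
# open( my $fh, '<', 'grades.tsv' ) or die( "Can't open grades.tsv: $!\n" );
my $sample = "Aria\tMathematics\t8\nAria\tHistory\t7\nAria\tGeography\t6\n";
open( my $fh, '<', \$sample ) or die "Can't open in-memory file: $!\n";
print "$_\n" for average_rows_in_memory($fh);
```

Compared with seeking, the cost is holding the whole file at once, which is usually fine for something the size of a grade sheet.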
Note that
open(FH, $qfn) or die "Cannot open file\n";
is better written as
open(my $FH, '<', $qfn)
   or die("Can't open file \"$qfn\": $!\n");
Using the three-argument form of open avoids some problems.
Including the file name and the error ($!) in the message tells you which file failed and why.
Avoid using DATA as a handle name, as Perl automatically creates a handle by that name for a script's __DATA__ (or __END__) section.
Avoid global filehandles (such as FH) in favour of lexical variables (my $FH).
There's another thing to consider in this sort of operation. What do you do if you mess up when writing the new data? How are you going to tolerate a program that truncates the original file but fails to completely write the new data?
Instead of opening the write filehandle on the same filename, use a temporary file. File::Temp is part of the Standard Library:
use File::Temp qw( tempfile );   # tempfile() is not exported by default
my( $temp_fh, $tempfile ) = tempfile();
Now, write everything to $temp_fh until you are satisfied that you were able to complete the output. After that, use rename to move the completed file into place:
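close $temp_fh or die "Can't close $tempfile: $!";    # flush buffered output first
rename $tempfile => $original or die "Can't rename $tempfile: $!";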
Shawn also correctly points out that this will change the inode, thus breaking hard links. You can copy the new file into the old if that matters to you, but I've rarely seen a situation where the technology was that advanced :)
If you mess up, the original data are still there and you can try again. Note: this assumes that the two files are on the same partition, since that's a requirement of rename.
Although this might not matter in your case, you also have to consider what other consumers do in the time it takes you to write the new file. If another program wants to read the original file immediately after you've truncated it but haven't written the data (or incompletely written it), what happens? Typically, you want to ensure the file is complete before it's available to other programs.
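Applied to the question's task, the temp-file approach might look like the sketch below. It is an illustration under assumptions, not a drop-in tool: the sub name rewrite_with_averages is hypothetical, and DIR => dirname($original) is used so the temp file is created next to the original (and so normally on the same filesystem), which rename needs.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );
use File::Basename qw( dirname );

# Hypothetical helper: replaces $original with a copy that has each
# student's average grade appended as a fourth column.
sub rewrite_with_averages {
    my ($original) = @_;

    # Read everything first; the original stays untouched until the rename
    open( my $in, '<', $original ) or die "Can't open \"$original\": $!\n";
    my ( @rows, %sum, %count );
    while ( my $line = <$in> ) {
        chomp $line;
        my ( $name, $subject, $grade ) = split /\t/, $line;
        push @rows, [ $name, $subject, $grade ];
        $sum{$name}   += $grade;
        $count{$name} += 1;
    }
    close $in;

    # Temp file in the same directory, so rename() stays on one filesystem
    my ( $temp_fh, $tempfile ) = tempfile( DIR => dirname($original) );
    for my $row (@rows) {
        my $average = $sum{ $row->[0] } / $count{ $row->[0] };
        print {$temp_fh} join( "\t", @$row, $average ), "\n"
            or die "Can't write \"$tempfile\": $!\n";
    }
    close $temp_fh or die "Can't close \"$tempfile\": $!\n";

    # Only now swap the new file in; if anything above died, the original is intact
    rename $tempfile => $original
        or die "Can't rename \"$tempfile\" to \"$original\": $!\n";
}

# Usage: perl rewrite.pl grades.tsv
rewrite_with_averages( $ARGV[0] ) if @ARGV;
```

One side effect worth knowing: File::Temp creates its files with mode 0600, so after the rename grades.tsv has those permissions (and, as noted above, a new inode).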
If you don't like the temp file, there are other ways to handle the problem. Move the original file to a backup name, then read that and write to the original name. Or, write to a different filename and move it into place. See, for example, Perl's adjustments to the -i command-line switch for just this problem.
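A minimal sketch of the -i route from inside a script: localizing $^I (the variable behind -i) and @ARGV makes <> read the original while print writes the replacement under the original name, and the untouched original is kept with a .bak suffix. Computing the averages in a separate first pass, and the sub name add_averages_in_place, are assumptions for this task.

```perl
use strict;
use warnings;

# Hypothetical helper: appends each student's average using Perl's
# in-place editing, the same mechanism as `perl -i.bak`.
sub add_averages_in_place {
    my ($file) = @_;

    # Pass 1: ordinary read to compute sums and counts
    my ( %sum, %count );
    open( my $in, '<', $file ) or die "Can't open \"$file\": $!\n";
    while ( my $line = <$in> ) {
        chomp $line;
        my ( $name, undef, $grade ) = split /\t/, $line;
        $sum{$name}   += $grade;
        $count{$name} += 1;
    }
    close $in;

    # Pass 2: with $^I set, <> reads $file and plain print writes the new
    # contents under the same name; the original is kept as "$file.bak"
    local ( $^I, @ARGV ) = ( '.bak', $file );
    while ( my $line = <> ) {
        chomp $line;
        my ($name) = split /\t/, $line;
        print join( "\t", $line, $sum{$name} / $count{$name} ), "\n";
    }
}

# Usage: perl add_avg.pl grades.tsv
add_averages_in_place( $ARGV[0] ) if @ARGV;
```

Like the temp-file version, this replaces the file rather than rewriting it in place, so grades.tsv ends up with a new inode.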
Sample code for student's report card
#!/usr/bin/perl
#
# USAGE:
# prog.pl
#
# Description:
# Demonstration code for StackOverflow Q59991322
#
# StackOverflow:
# Question 59991322
#
# Author:
# Polar Bear https://stackoverflow.com/users/12313309/polar-bear
#
# Date: Tue Jan 30 13:37:00 PST 2020
#
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
my $debug = 0; # debug flag
my %hash;
my $student;
my ($subject,$mark);
while ( <DATA> ) {
    chomp;
    my($name,$subject,$mark) = split "\t",$_;
    $hash{$name}{subjects}{$subject} = $mark;
    $hash{$name}{compute}{Total} += $mark;
    $hash{$name}{compute}{Num_subjects}++;
}
say Dumper(\%hash) if $debug;
foreach $student ( sort keys %hash ) {
    $hash{$student}{compute}{GPA} = $hash{$student}{compute}{Total}/$hash{$student}{compute}{Num_subjects};
    $~ = 'STDOUT_REPORT';
    write;
    print_marks($student);
    $~ = 'STDOUT_REPORT_END';
    write;
}
sub print_marks {
    my $student = shift;
    $~ = 'STDOUT_MARKS';
    while( ($subject,$mark) = each %{$hash{$student}{subjects}} ) {
        write;
    }
}
format STDOUT_REPORT =
+----------------------------+
| Student: @<<<<<<<<<< |
$student
+----------------------------+
.
format STDOUT_REPORT_END =
+----------------------------+
| Subjects taken: @<< |
$hash{$student}{compute}{Num_subjects}
| Grade average: @<< |
$hash{$student}{compute}{GPA}
+----------------------------+
.
format STDOUT_MARKS =
| @<<<<<<<<<<<<<< @<< |
$subject, $mark
.
__DATA__
Liam Mathematics 5
Liam History 6
Liam Geography 8
Liam English 8
Aria Mathematics 8
Aria History 7
Aria Geography 6
Isabella Mathematics 9
Isabella History 4
Isabella Geography 7
Isabella English 5
Isabella Music 8
Output
+----------------------------+
| Student: Aria |
+----------------------------+
| Mathematics 8 |
| History 7 |
| Geography 6 |
+----------------------------+
| Subjects taken: 3 |
| Grade average: 7 |
+----------------------------+
+----------------------------+
| Student: Isabella |
+----------------------------+
| Music 8 |
| Mathematics 9 |
| History 4 |
| English 5 |
| Geography 7 |
+----------------------------+
| Subjects taken: 5 |
| Grade average: 6.6 |
+----------------------------+
+----------------------------+
| Student: Liam |
+----------------------------+
| Geography 8 |
| English 8 |
| History 6 |
| Mathematics 5 |
+----------------------------+
| Subjects taken: 4 |
| Grade average: 6.7 |
+----------------------------+