I have a large text file made up of thousands of articles, and I am trying to split it into individual files, one per article, which I'd like to save as article_1, article_2, etc. Each article begins with a line containing the word /DOCUMENTS/. I am totally new to Perl, and any insight would be great (even advice on good documentation websites). Thanks a lot. So far, what I have tried looks like this:
#!/usr/bin/perl
use warnings;
use strict;
my $id = 0;
my $source = "2010_FTOL_GRbis.txt";
my $destination = "file$id.txt";
open IN, $source or die "can t read $source: $!\n";
while (<IN>)
{
{
open OUT, ">$destination" or die "can t write $destination: $!\n";
if (/DOCUMENTS/)
{
close OUT ;
$id++;
}
}
}
close IN;
Let's say that /DOCUMENTS/ appears by itself on a line. Thus you can make that the record separator.
use strict;
use warnings;
use English qw<$RS>;
use File::Slurp qw<write_file>;
my $id = 0;
my $source = "2010_FTOL_GRbis.txt";
{   local $RS = "\n/DOCUMENTS/\n";
    # three-argument open keeps the file name from being parsed as a mode
    open my $in, '<', $source or die "can't read $source: $!\n";
    while ( <$in> ) {
        chomp; # removes the trailing "\n/DOCUMENTS/\n" separator
        write_file( 'file' . ( ++$id ) . '.txt', $_ );
    }
    # $in is lexical and scoped by the surrounding braces (my "local block"),
    close $in; # so an explicit close is not strictly necessary
}
NOTES:
use English declares the global variable $RS. The "messy name" for it is $/. See perldoc perlvar.
Because I assumed '/DOCUMENTS/' sits all by itself on a line, I specified newline + '/DOCUMENTS/' + newline. If it is instead part of a path that occurs somewhere on the line, then that particular value will not work for the record separator.
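If the marker does turn out to be embedded in a longer line, one alternative is to read the file line by line and start a new output file whenever a line contains the marker. The following is only a sketch under that assumption, not the method from the answer above: the sub name split_articles, its parameters, and the 'article_' prefix are hypothetical names chosen for illustration, and it uses only core Perl (no File::Slurp). Unlike the record-separator version, it keeps the marker line as the first line of each article and skips anything that precedes the first marker.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original answer): copy $source line
# by line, opening a new output file each time a line contains $marker.
sub split_articles {
    my ( $source, $prefix, $marker ) = @_;
    $marker = '/DOCUMENTS/' unless defined $marker;

    open my $in, '<', $source or die "can't read $source: $!\n";

    my $id = 0;
    my $out;    # current output handle; undefined until the first marker
    while ( my $line = <$in> ) {
        # index() does a plain substring search, so the slashes in the
        # marker need no regex escaping.
        if ( index( $line, $marker ) >= 0 ) {
            close $out if $out;
            my $destination = $prefix . ( ++$id ) . '.txt';
            open $out, '>', $destination
                or die "can't write $destination: $!\n";
        }
        # Lines before the first marker belong to no article; skip them.
        print {$out} $line if $out;
    }
    close $out if $out;
    close $in;
    return $id;    # number of article files written
}

# Split the file named on the command line, if one was given.
split_articles( $ARGV[0], 'article_' ) if @ARGV;
```

Saved under a name of your choosing (say split_articles.pl), it would be run as perl split_articles.pl 2010_FTOL_GRbis.txt and would write article_1.txt, article_2.txt, and so on into the current directory.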