I have a lot of text files with fixed-width fields:
<c>     <c>       <c>
Dave    Thomas    123 Main
Dan     Anderson  456 Center
Wilma   Rainbow   789 Street
The rest of the files are in a similar format, where <c> marks the beginning of each column, but the column and space widths vary from file to file and are not known in advance. What's the best way to parse these files?
I tried using Text::CSV, but since there's no delimiter it's hard to get a consistent result (unless I'm using the module wrong):
use Text::CSV;

my $csv = Text::CSV->new();
$csv->sep_char(' ');
while (<FILE>) {
    if ($csv->parse($_)) {
        my @columns = $csv->fields();
        print $columns[1] . "\n";
    }
}
Data in a fixed-width text file is arranged in rows and columns, with one entry per row. Each column has a fixed width, specified in characters, which determines the maximum amount of data it can contain.
You can convert a fixed-width file to CSV with Python's pandas by reading the file into a DataFrame df with pd.read_fwf('my_file.fwf') and then writing that DataFrame out with df.to_csv().
Fixed-width is a file format where data is arranged in columns, but instead of those columns being delimited by a certain character (as they are in CSV) every row is the exact same length. The application reading the file must know how long each column is.
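When the widths are already known, Perl's built-in unpack can split a row directly. Here is a minimal sketch, assuming two columns of 8 and 10 characters followed by a free-length last column; the widths and the sample row are illustrative assumptions, not something taken from a real file:

```perl
use strict;
use warnings;

# Assumed layout: 8-char column, 10-char column, then the rest of the line.
my $template = 'A8 A10 A*';

# Sample row padded with sprintf so it matches the assumed widths.
my $row = sprintf '%-8s%-10s%s', 'Dave', 'Thomas', '123 Main';

# 'A' fields strip trailing spaces, so the values come back unpadded.
my @fields = unpack $template, $row;

print join('|', @fields), "\n";
```

Spaces inside the template are ignored by unpack, so they are only there for readability.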
As user604939 mentions, unpack is the tool to use for fixed-width fields. However, unpack needs to be passed a template to work with. Since you say your fields can change width, the solution is to build this template from the first line of your file:
use strict;
use warnings;

my @template = map { 'A' . length }    # convert each segment to 'A##'
    <DATA> =~ /(\S+\s*)/g;             # split first line into segments

$template[-1] = 'A*';                  # set the last segment to be slurpy

my $template = "@template";
print "template: $template\n";

my @data;
while (<DATA>) {
    push @data, [unpack $template, $_];
}

use Data::Dumper;
print Dumper \@data;
__DATA__
<c>     <c>       <c>
Dave    Thomas    123 Main
Dan     Anderson  456 Center
Wilma   Rainbow   789 Street
which prints:
template: A8 A10 A*
$VAR1 = [
          [
            'Dave',
            'Thomas',
            '123 Main'
          ],
          [
            'Dan',
            'Anderson',
            '456 Center'
          ],
          [
            'Wilma',
            'Rainbow',
            '789 Street'
          ]
        ];
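Since you have a lot of files, it may help to wrap the same idea in a sub that takes a filehandle. This is only a sketch: the name parse_fixed_width is hypothetical, it assumes every file starts with a <c> header line, and it uses an in-memory filehandle in place of a real file:

```perl
use strict;
use warnings;

# Hypothetical helper: reads a header line of <c> markers from $fh,
# derives an unpack template from it, and returns one array ref per row.
sub parse_fixed_width {
    my ($fh) = @_;

    my $header = <$fh>;
    return [] unless defined $header;
    chomp $header;

    # Each segment is a marker plus the padding that follows it.
    my @template = map { 'A' . length $_ } $header =~ /(\S+\s*)/g;
    return [] unless @template;
    $template[-1] = 'A*';    # last column takes the rest of the line
    my $template = join ' ', @template;

    my @rows;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless length $line;    # skip blank lines
        push @rows, [ unpack $template, $line ];
    }
    return \@rows;
}

# Sample input built with sprintf so the column widths (8 and 10) are exact.
my $fmt  = "%-8s%-10s%s\n";
my $text = join '',
    sprintf( $fmt, '<c>',  '<c>',      '<c>' ),
    sprintf( $fmt, 'Dave', 'Thomas',   '123 Main' ),
    sprintf( $fmt, 'Dan',  'Anderson', '456 Center' );

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $rows = parse_fixed_width($fh);
close $fh;

print join( '|', @$_ ), "\n" for @$rows;
```

For real files you would open each path in turn and pass the handle in. Because the template is rebuilt from each file's own header, the widths do not have to match between files.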