
How can I use awk or Perl to increment a number in a large XML file?

Tags: awk, perl

I have an XML file with the following line:

            <VALUE DECIMAL_VALUE="0.2725" UNIT_TYPE="percent"/>

I would like to increment this value by .04 and keep the format of the XML in place. I know this is possible with a Perl or awk script, but I am having difficulty with the expressions to isolate the number.

DC. asked Jan 15 '09 20:01


3 Answers

If you're on a box with the xsltproc command available, I would suggest you use XSLT for this.

For a Perl solution I'd go for using the DOM. The article "DOM Processing with Perl" is a good starting point.

That said, if your XML file is produced in a predictable way, something naïve like the following could work:

perl -pe 's#(<VALUE DECIMAL_VALUE=")([0-9.]+)(" UNIT_TYPE="percent"/>)#$1 . ($2 + 0.04) . $3#e'
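
Since the question also asks about awk, here is a rough awk equivalent of the same naïve approach. It is only a sketch: it assumes the attribute is written exactly as `DECIMAL_VALUE="..."` with double quotes, appears at most once per line, and holds a plain decimal number. The function name `bump_awk` is made up for illustration.

```sh
# Sketch: bump DECIMAL_VALUE by 0.04 using only POSIX awk features.
# bump_awk is a hypothetical helper name, not something from the question.
bump_awk() {
  awk -v inc=0.04 '
    match($0, /DECIMAL_VALUE="[0-9.]+"/) {
      # The prefix DECIMAL_VALUE=" is 15 characters; the match ends with the closing quote.
      val = substr($0, RSTART + 15, RLENGTH - 16)
      # Rebuild the line around the new number; leading whitespace is left alone.
      $0 = substr($0, 1, RSTART + 14) sprintf("%.15g", val + inc) substr($0, RSTART + RLENGTH - 1)
    }
    { print }
  '
}

# Try it on the sample line from the question.
printf '%s\n' '            <VALUE DECIMAL_VALUE="0.2725" UNIT_TYPE="percent"/>' | bump_awk
```

The explicit `sprintf("%.15g", ...)` is there because awk's default number-to-string conversion keeps only six significant digits, which could silently shorten longer values.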
PEZ answered Sep 22 '22 22:09


If you are absolutely sure that the format of your XML will never change, that the order of the attributes is fixed, that you can indeed get the regexp for the number right... then go for the non-parser based solution.

Personally I would use XML::Twig (maybe because I wrote it ;--). It will process the XML as XML, while still respecting the original format of the file, and won't load it all in memory before starting to work.

Untested code below:

#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;

XML::Twig->new( # call the sub for each VALUE element with a DECIMAL_VALUE attribute
                twig_roots => { 'VALUE[@DECIMAL_VALUE]' => \&upd_decimal },
                # print anything else as is
                twig_print_outside_roots => 1,
              )
         ->parsefile_inplace( 'foo.xml');

sub upd_decimal
  { my( $twig, $value)= @_; # twig is the XML::Twig object, $value the element
    my $decimal_value= $value->att( 'DECIMAL_VALUE');
    $decimal_value += 0.04;
    $value->set_att( DECIMAL_VALUE => $decimal_value);
    $value->print;
  }
mirod answered Sep 23 '22 22:09


This takes input on stdin, outputs to stdout:

use strict;
use warnings;

while (<>) {
    # [^"]* stops at the closing quote, so later attributes are not swallowed
    if (/^(.*DECIMAL_VALUE=")([^"]*)(".*)$/) {
        my $newVal = $2 + 0.04;
        print "$1$newVal$3\n";
    } else {
        print;
    }
}
Paul answered Sep 22 '22 22:09