
Merge line with the next line if last character is a semicolon using batch file

Tags:

batch-file

I have a file with the following 4 lines.

A;1;abc;<xml/>;
;2;def;<xml
>hello world</xml>;
;3;ghi;<xml/>;

Using a batch file, I need to combine lines such that if a line doesn't end with a semicolon (;), the next line is appended to the current line.

So the desired output should be

A;1;abc;<xml/>;
;2;def;<xml>hello world</xml>;
;3;ghi;<xml/>;

I am not very familiar with batch scripts. I tried using for /F, but no luck so far.

As I understand, the logic should be to check the last character for each line, if it is not a semicolon, read the next line into current line.

Further to this, I managed to get the last character of the line, but my script only reads a line if it doesn't begin with ; . Any ideas?

@echo off
for /f "tokens=*" %%i in (myfile.txt) do (
  set var=%%i
  echo %%i
  if "%var:~-1%"==";" (
    echo test
  )
)

Note: the above script only reads lines 1 and 3.

Junaid asked Jan 11 '13 11:01


1 Answer

You have a number of problems with your code :)

1) As you have stated, your code ignores lines that begin with ; . This is due to the default FOR /F EOL option, which treats ; as the end-of-line (comment) character. Your code also strips leading spaces from each line because of "TOKENS=*". You need to set both EOL and DELIMS to nothing. The syntax is weird, but it works:

for /f delims^=^ eol^= %%i ...

2) You attempt to set and expand var within a parenthesized block of code. This cannot work because %var% expansion occurs when the line is parsed, and the entire block of code is parsed at once. So the value of %var% is the value that existed before the loop started executing, which is not what you want. The solution is to use delayed expansion. Type SET /? from a command prompt for more information about delayed expansion (about half way down the help listing).

3) FOR variable content containing ! will be corrupted if it is expanded while delayed expansion is enabled. The solution is to toggle delayed expansion on and off as needed within the loop. But that causes a complication, because the value of the growing line must be preserved across the ENDLOCAL barrier. I use a FOR /F to transport the value across the barrier.

Here is a complete batch script that should do the job. It is limited in that it cannot process lines that are greater than the max length of ~8191 bytes.

This code has been re-written to fix a significant bug

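@echo off
setlocal disableDelayedExpansion
rem ln holds the partial line accumulated so far
set "ln="
for /f delims^=^ eol^= %%i in (myfile.txt) do (
  set "var=%%i"
  setlocal enableDelayedExpansion
  rem Carry the combined value across ENDLOCAL in the FOR variable A
  for /f delims^=^ eol^= %%A in ("!ln!!var!") do (
    if "!var:~-1!"==";" (
      rem Complete line, so print it and reset the accumulator
      endlocal
      echo %%A
      set "ln="
    ) else (
      rem Incomplete line, so keep accumulating
      endlocal
      set "ln=%%A"
    )
  )
)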
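
For readers less at home in batch, the accumulate-and-flush logic of the script above can be sketched in JavaScript. This is only an illustration and not part of the original answer: the function name mergeByAccumulating and the array-based input are assumptions made for the example, and empty lines are skipped to mimic what FOR /F does.

```javascript
// Sketch of the batch loop's logic; "pending" plays the role of ln.
function mergeByAccumulating(lines) {
  var out = [];
  var pending = "";
  for (var i = 0; i < lines.length; i++) {
    var line = lines[i];
    if (line === "") continue;            // FOR /F skips empty lines
    var combined = pending + line;
    if (line.charAt(line.length - 1) === ";") {
      out.push(combined);                 // complete line: emit and reset
      pending = "";
    } else {
      pending = combined;                 // incomplete line: keep growing
    }
  }
  return out;                             // like the batch loop, a trailing partial line is not emitted
}

var sample = [
  "A;1;abc;<xml/>;",
  ";2;def;<xml",
  ">hello world</xml>;",
  ";3;ghi;<xml/>;"
];
console.log(mergeByAccumulating(sample).join("\n"));
```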

SET /P solution

There is a much simpler solution that prints each line immediately so that you don't have to worry about transporting a variable across ENDLOCAL. Lines that don't end with ; are printed without newlines using SET /P.

This solution has the following limitations:

1) Lines printed via SET /P will have leading spaces stripped. This limitation is only for Vista and newer versions of Windows. It is not a problem on XP.

2) Thanks to David Ruhmann, I now know that SET /P will fail if the line begins with =. Very unfortunate :(

@echo off
setlocal disableDelayedExpansion
set "ln="
for /f delims^=^ eol^= %%i in (myfile.txt) do (
  set "var=%%i"
  setlocal enableDelayedExpansion
  if "!var:~-1!"==";" (echo !var!) else (<nul set /p ="!var!")
  endlocal
)

Hybrid batch/JScript regex solution (bullet proof?)

I've written a hybrid batch/JScript REPL.BAT utility that allows for easy regex search and replace on file contents. It makes the job really easy.

The following command should work on any input, without limitations. It has been updated to support both Windows and Unix style line endings. And it is much faster than a pure batch solution.

findstr "^." myfile.txt|repl "([^;\r])\r?\n" "$1" m >"outFile.txt"
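
To see what the regex in that command does, here is a minimal sketch that applies the same pattern and replacement string directly in JavaScript, outside of REPL.BAT. The function name mergeWithRegex and the sample strings are assumptions made for the example; the flags "gm" are what REPL.BAT ends up passing to RegExp when the M option is given.

```javascript
// ([^;\r]) captures a last character that is neither ; nor CR,
// \r?\n then matches the line break that follows it (CRLF or LF).
// Replacing the match with $1 keeps the character and drops the break.
var joinBroken = new RegExp("([^;\\r])\\r?\\n", "gm");

function mergeWithRegex(text) {
  return text.replace(joinBroken, "$1");
}

var crlf = "A;1;abc;<xml/>;\r\n;2;def;<xml\r\n>hello world</xml>;\r\n;3;ghi;<xml/>;\r\n";
var lf = crlf.replace(/\r\n/g, "\n");

// JSON.stringify makes the surviving line breaks visible.
console.log(JSON.stringify(mergeWithRegex(crlf)));
console.log(JSON.stringify(mergeWithRegex(lf)));
```

Lines that already end in ; are left alone because neither the ; nor the CR before the LF can satisfy the capture group.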

Here is the REPL.BAT utility. Full documentation is embedded within the script.

@if (@X)==(@Y) @end /* Harmless hybrid line that begins a JScript comment

::************ Documentation ***********
:::
:::REPL  Search  Replace  [Options  [SourceVar]]
:::REPL  /?
:::
:::  Performs a global search and replace operation on each line of input from
:::  stdin and prints the result to stdout.
:::
:::  Each parameter may be optionally enclosed by double quotes. The double
:::  quotes are not considered part of the argument. The quotes are required
:::  if the parameter contains a batch token delimiter like space, tab, comma,
:::  semicolon. The quotes should also be used if the argument contains a
:::  batch special character like &, |, etc. so that the special character
:::  does not need to be escaped with ^.
:::
:::  If called with a single argument of /? then prints help documentation
:::  to stdout.
:::
:::  Search  - By default this is a case sensitive JScript (ECMA) regular
:::            expression expressed as a string.
:::
:::            JScript syntax documentation is available at
:::            http://msdn.microsoft.com/en-us/library/ae5bf541(v=vs.80).aspx
:::
:::  Replace - By default this is the string to be used as a replacement for
:::            each found search expression. Full support is provided for
:::            substitution patterns available to the JScript replace method.
:::            A $ literal can be escaped as $$. An empty replacement string
:::            must be represented as "".
:::
:::            Replace substitution pattern syntax is documented at
:::            http://msdn.microsoft.com/en-US/library/efy6s3e6(v=vs.80).aspx
:::
:::  Options - An optional string of characters used to alter the behavior
:::            of REPL. The option characters are case insensitive, and may
:::            appear in any order.
:::
:::            I - Makes the search case-insensitive.
:::
:::            L - The Search is treated as a string literal instead of a
:::                regular expression. Also, all $ found in Replace are
:::                treated as $ literals.
:::
:::            E - Search and Replace represent the name of environment
:::                variables that contain the respective values. An undefined
:::                variable is treated as an empty string.
:::
:::            M - Multi-line mode. The entire contents of stdin is read and
:::                processed in one pass instead of line by line. ^ anchors
:::                the beginning of a line and $ anchors the end of a line.
:::
:::            X - Enables extended substitution pattern syntax with support
:::                for the following escape sequences:
:::
:::                \\     -  Backslash
:::                \b     -  Backspace
:::                \f     -  Formfeed
:::                \n     -  Newline
:::                \r     -  Carriage Return
:::                \t     -  Horizontal Tab
:::                \v     -  Vertical Tab
:::                \xnn   -  Ascii (Latin 1) character expressed as 2 hex digits
:::                \unnnn -  Unicode character expressed as 4 hex digits
:::
:::                Escape sequences are supported even when the L option is used.
:::
:::            S - The source is read from an environment variable instead of
:::                from stdin. The name of the source environment variable is
:::                specified in the next argument after the option string.
:::

::************ Batch portion ***********
@echo off
if .%2 equ . (
  if "%~1" equ "/?" (
    findstr "^:::" "%~f0" | cscript //E:JScript //nologo "%~f0" "^:::" ""
    exit /b 0
  ) else (
    call :err "Insufficient arguments"
    exit /b 1
  )
)
echo(%~3|findstr /i "[^SMILEX]" >nul && (
  call :err "Invalid option(s)"
  exit /b 1
)
cscript //E:JScript //nologo "%~f0" %*
exit /b 0

:err
>&2 echo ERROR: %~1. Use REPL /? to get help.
exit /b

************* JScript portion **********/
var env=WScript.CreateObject("WScript.Shell").Environment("Process");
var args=WScript.Arguments;
var search=args.Item(0);
var replace=args.Item(1);
var options="g";
if (args.length>2) {
  options+=args.Item(2).toLowerCase();
}
var multi=(options.indexOf("m")>=0);
var srcVar=(options.indexOf("s")>=0);
if (srcVar) {
  options=options.replace(/s/g,"");
}
if (options.indexOf("e")>=0) {
  options=options.replace(/e/g,"");
  search=env(search);
  replace=env(replace);
}
if (options.indexOf("l")>=0) {
  options=options.replace(/l/g,"");
  search=search.replace(/([.^$*+?()[{\\|])/g,"\\$1");
  replace=replace.replace(/\$/g,"$$$$");
}
if (options.indexOf("x")>=0) {
  options=options.replace(/x/g,"");
  replace=replace.replace(/\\\\/g,"\\B");
  replace=replace.replace(/\\b/g,"\b");
  replace=replace.replace(/\\f/g,"\f");
  replace=replace.replace(/\\n/g,"\n");
  replace=replace.replace(/\\r/g,"\r");
  replace=replace.replace(/\\t/g,"\t");
  replace=replace.replace(/\\v/g,"\v");
  replace=replace.replace(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g,
    function($0,$1,$2){
      return String.fromCharCode(parseInt("0x"+$0.substring(2)));
    }
  );
  replace=replace.replace(/\\B/g,"\\");
}
var search=new RegExp(search,options);

if (srcVar) {
  WScript.Stdout.Write(env(args.Item(3)).replace(search,replace));
} else {
  while (!WScript.StdIn.AtEndOfStream) {
    if (multi) {
      WScript.Stdout.Write(WScript.StdIn.ReadAll().replace(search,replace));
    } else {
      WScript.Stdout.WriteLine(WScript.StdIn.ReadLine().replace(search,replace));
    }
  }
}
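
As a small illustration of what the L option does in the JScript portion above, the sketch below applies the same two escaping steps in isolation. The helper name literalReplace and the sample strings are hypothetical; the two escaping replace calls are copied from the script.

```javascript
// Sketch of the L (literal) option: neutralize regex metacharacters in the
// search string and $ substitution patterns in the replacement string.
function literalReplace(source, search, replacement) {
  // Prefix each regex metacharacter with a backslash.
  var pattern = search.replace(/([.^$*+?()[{\\|])/g, "\\$1");
  // Turn every $ into $$, which the replace method emits as a literal $.
  var safe = replacement.replace(/\$/g, "$$$$");
  return source.replace(new RegExp(pattern, "g"), safe);
}

// The dot only matches a real dot, and $1 is inserted as plain text.
console.log(literalReplace("cost: 1.50 or 1x50", "1.50", "$1"));
```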
dbenham answered Sep 30 '22 05:09