
Find string with special character in text file and add line break before each occurrence

I have a text file that is one long string like this:

ISA*00*GARBAGE~ST*TEST*TEST~CLP*TEST~ST*TEST*TEST~CLP*TEST~ST*TEST*TEST~CLP*TEST~GE*GARBAGE*~   

And I need it to look like this:

~ST*TEST*TEST~CLP*TEST
~ST*TEST*TEST~CLP*TEST
~ST*TEST*TEST~CLP*TEST

I first tried to add a line break before every ~ST to split the string up, but I can't for the life of me make this happen. I have tried various scripts, and I thought a find/replace script would work best.

@echo off
setlocal enabledelayedexpansion
set INTEXTFILE=test.txt
set OUTTEXTFILE=test_out.txt
set SEARCHTEXT=~ST
set REPLACETEXT=~ST

for /f "tokens=1,* delims=~" %%A in ( '"type %INTEXTFILE%"') do (
    SET string=%%A
    SET modified=!string:%SEARCHTEXT%=%REPLACETEXT%!

    echo !modified! >> %OUTTEXTFILE%
)
del %INTEXTFILE%
rename %OUTTEXTFILE% %INTEXTFILE%

Found here: "How to replace substrings in windows batch file"

But I'm stuck because (1) the special character ~ makes the code not work at all. It gives me this result:

string:~ST=~ST

The code does nothing at all if using quotes around "~ST". And (2) I can't figure out how to add a line break before ~ST.

The final task for this would be to delete the ISA*00*blahblahblah and ~GE*blahblahblah lines after all splits have been performed. But I am stuck on the splitting at ~ST part.

Any suggestions?

AnA asked Dec 07 '15 08:12




1 Answer

@echo off
setlocal EnableDelayedExpansion

rem Set next variable to the number of "~" chars that delimit the wanted fields, or more
set "maxTokens=7"
rem Define the delimiters that start a new field
set "delims=/ST/GE/"

for /F "delims=" %%a in (test.txt) do (
   set "line=%%a"
   set "field="
   rem Process up to maxTokens per line;
   rem this is a trick to avoid a call to a subroutine that has a goto loop
   for /L %%i in (0,1,%maxTokens%) do if defined line (
      for /F "tokens=1* delims=~" %%b in ("!line!") do (
         rem Get the first token in the line separated by "~" delimiter
         set "token=%%b"
         rem ... and update the rest of the line
         set "line=%%c"
         rem Get the first two chars of the token, like "ST", "CL" or "GE";
         rem if they are "ST" or "GE":
         for %%d in ("!token:~0,2!") do if "!delims:/%%~d/=!" neq "%delims%" (
            rem Start a new field: show previous one, if any
            if defined field echo !field!
            if "%%~d" equ "ST" (
               set "field=~%%b"
            ) else (
               rem It is "GE": cancel rest of line
               set "line="
            )
         ) else (
            rem It is "CL" token: join it to current field, if any
            if defined field set "field=!field!~%%b"
         )
      )
   )
)
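
The comparison "!delims:/%%~d/=!" neq "%delims%" in the script above acts as a membership check: removing /XX/ from the list changes the string only when XX is one of its entries. A minimal sketch of that idiom in isolation, assuming the same /ST/GE/ list; the sample prefixes in the loop are made up for the demonstration:

```batch
@echo off
setlocal EnableDelayedExpansion

rem Same list format as in the answer: every entry is wrapped in slashes
set "delims=/ST/GE/"

rem Hypothetical sample prefixes, chosen only for this demonstration
for %%d in (ST CL GE IS) do (
   rem Removing "/%%d/" changes the list only if that entry is present
   if "!delims:/%%d/=!" neq "%delims%" (
      echo %%d starts a new field
   ) else (
      echo %%d does not start a new field
   )
)
```

Run as a batch file, this should report ST and GE as starting a new field, and CL and IS as not. Wrapping each entry in slashes keeps a prefix from matching only part of another entry.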

Input:

ISA*00*GARBAGE~ST*TEST1*TEST1~CLP*TEST1~ST*TEST2*TEST2~CLP*TEST2~ST*TEST3*TEST3~CLP*TEST3~GE*GARBAGE*~CLP~TESTX

Output:

~ST*TEST1*TEST1~CLP*TEST1
~ST*TEST2*TEST2~CLP*TEST2
~ST*TEST3*TEST3~CLP*TEST3
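
A note on why this approach works where the substitution in the question did not: in !string:~ST=...! the ~ directly after the colon is read as the start of substring syntax (!var:~start,length!), so a search string cannot begin with ~. Splitting with for /F "delims=~" never needs such a substitution. Below is a stripped-down sketch of the token-peeling loop, assuming a shortened sample line and using a top-level goto instead of the for /L trick (the label name "next" is arbitrary):

```batch
@echo off
setlocal EnableDelayedExpansion

rem Shortened sample line, for illustration only
set "line=ISA*00*GARBAGE~ST*TEST*TEST~CLP*TEST"

:next
rem %%b receives the first "~"-separated token, %%c the untouched remainder
for /F "tokens=1* delims=~" %%b in ("!line!") do (
   echo %%b
   rem An empty remainder undefines the variable, which ends the loop
   set "line=%%c"
)
if defined line goto next
```

This should print each ~-separated token on its own line. The goto is safe here only because it sits outside any parenthesized block; inside the outer for loop of the full script it would break the loop, which is why the answer uses for /L instead.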
Aacini answered Sep 22 '22 14:09