
Batch script to merge files without Hex char 1A at the end

Tags:

batch-file

I'm merging two ASCII files via a simple batch script like this:

COPY a.txt+b.txt c.txt /y /a

The problem is that the very last character in c.txt gets set to 1A, the hex code for the SUB control character. c.txt is fed into another executable, which does not like the 1A at the end.

After c.txt is generated, if I open it up in Notepad++ and remove the last character, the file works fine.

How can I merge a.txt and b.txt without 1A getting appended to the end of c.txt?

xbonez asked Mar 14 '12 10:03


5 Answers

The placement of the /a and /b switches is critical. They behave differently depending on whether they are placed after the source filename(s) or after the target filename.

When used with a target filename, /a causes the end-of-file marker (ASCII 26) to be added. You are actually specifying this!

When used with the source filename:

- /a specifies that the file is ASCII, and it is copied up to but not including the first ASCII 26 end-of-file mark. That character and anything after it is ignored.
- /b causes the entire file to be copied, including any end-of-file markers and anything after them.

When used with the destination filename:

- /a causes ASCII 26 to be added as the last character.
- /b does not add ASCII 26 as the last character.
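The four cases above can be modelled in a few lines of Python. This is only a sketch of the behaviour as described in this answer, not the actual COPY implementation; the names `read_source` and `merge` are made up for illustration.

```python
SUB = b"\x1a"  # ASCII 26, Ctrl+Z


def read_source(data, ascii_mode):
    """Return the bytes a source file contributes to the merge."""
    if ascii_mode:
        # /a on a source: stop at the first SUB, dropping it and the rest.
        return data.split(SUB, 1)[0]
    # /b on a source: take every byte, SUB included.
    return data


def merge(sources, dest_ascii):
    """Concatenate sources, given as a list of (data, ascii_mode) pairs."""
    out = b"".join(read_source(data, mode) for data, mode in sources)
    # /a on the destination appends SUB; /b appends nothing.
    return out + SUB if dest_ascii else out


a, b = b"first\r\n", b"second\r\n"

# Models: COPY a.txt+b.txt c.txt /y /a   (everything ASCII)
print(merge([(a, True), (b, True)], dest_ascii=True))
# b'first\r\nsecond\r\n\x1a'

# Models: COPY a.txt+b.txt /a c.txt /b /y   (ASCII sources, binary destination)
print(merge([(a, True), (b, True)], dest_ascii=False))
# b'first\r\nsecond\r\n'
```

In this model the trailing 1A comes only from the destination being treated as ASCII, which is why moving /b onto the destination is enough for files that contain no embedded 1A.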

Your solution

...although I haven't tested it, is probably to use

COPY a.txt+b.txt /a c.txt /b /y

Andrew Leach answered Nov 02 '22 10:11


https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/copy

If /a precedes or follows a list of files on the command line, it applies to all files listed until copy encounters /b. In this case, /b applies to the file preceding /b.

The effect of /a depends on its position in the command-line string:

- If /a follows source, the copy command treats the file as an ASCII file and copies data that precedes the first end-of-file character (CTRL+Z).
- If /a follows destination, the copy command adds an end-of-file character (CTRL+Z) as the last character of the file.

/b directs the command interpreter to read the number of bytes specified by the file size in the directory. /b is the default value for copy, unless copy combines files.

If /b precedes or follows a list of files on the command line, it applies to all listed files until copy encounters /a. In this case, /a applies to the file preceding /a.

This is a very long-winded way of saying the following: the default when combining files is /a. This means that the /a option is redundant in your code snippet; it would have applied regardless of where the /a was placed.

The solution is to use /b, which tells copy to ignore the #1A [DOS end of file] character when reading and not to write it on output.

Unlike /a, the position of /b matters if a source file contains the #1A character. If /b appears only at the end of the command, each source file is still read in ASCII mode, so it is truncated at the first #1A (the #1A itself is not copied).

Any of the following will correct this behaviour:

COPY a.txt+b.txt c.txt /y /b
COPY a.txt+b.txt /b c.txt /y
COPY /b a.txt+b.txt c.txt /y

But only the following will work when a source file contains #1A as ordinary data rather than as the DOS end-of-file marker:

COPY a.txt /b + b.txt c.txt /y
COPY /b a.txt + b.txt c.txt /y

Note: To confuse things further, adding /b after a source file will apply /b to every source file after it until there is a /a.

In normal operation, this behaviour may seem at best bizarre. As DOS file systems have always recorded the file size, an End of File character should be redundant.

https://en.wikipedia.org/wiki/End-of-file

This was done for two reasons:

Backward compatibility with CP/M. The CP/M file system only recorded the lengths of files in multiples of 128-byte "records", so by convention a Control-Z character was used to mark the end of meaningful data if it ended in the middle of a record. The MS-DOS filesystem has always recorded the exact byte-length of files, so this was never necessary on MS-DOS.

It allows programs to use the same code to read input from both a terminal and a text file.

The upshot is that copy can take input from a device (e.g. a COM port), or send output to a device, while still being able to tell where one file ends and the next begins.

https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/copy

You can substitute a device name for one or more occurrences of source or destination.
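To make the CP/M reason quoted above concrete, here is a small Python sketch of why a Control-Z marker was needed when only whole 128-byte records were stored. The helper names `pad_to_records` and `meaningful_data` are invented for this illustration, and the sketch models the convention only, not any real CP/M code.

```python
RECORD = 128   # CP/M tracked file length in 128-byte records
SUB = b"\x1a"  # Control-Z


def pad_to_records(text):
    """Pad text with Control-Z up to the next multiple of 128 bytes."""
    remainder = len(text) % RECORD
    if remainder == 0:
        return text
    return text + SUB * (RECORD - remainder)


def meaningful_data(stored):
    """Recover the text by cutting at the first Control-Z."""
    return stored.split(SUB, 1)[0]


stored = pad_to_records(b"hello\r\n")
print(len(stored))              # 128
print(meaningful_data(stored))  # b'hello\r\n'
```

Because DOS records the exact byte length, the padding step is unnecessary there, and the marker survives only as a convention that COPY still honours in ASCII mode.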

Haggunenom answered Nov 02 '22 09:11


You could change the switch /a (ASCII text) to /b (binary).
See also copy /?

So the resulting command is:

COPY a.txt+b.txt c.txt /y /b
jeb answered Nov 02 '22 10:11


Switch from copy to type:

type a.txt>c.txt
type b.txt>>c.txt
Endoro answered Nov 02 '22 11:11


I thought @Andrew's sample was wrong, but it's actually more correct than mine was.

The thing is, the [/A | /B] specifier works in both directions, which is confusing. copy /? shows [/A | /B] before the first source file, but also after every source and after the destination:

COPY ... [/A | /B ] source [/A | /B] [+ source [/A | /B] ...] [destination [/A | /B]]

A specifier applies to the file before it and to all files after it, including the destination, but only until the opposite specifier appears on the command line. From there, the later specifier applies to all files after it and also to the one file before it.

When combining files, copy defaults to ASCII.
For example:

copy aa + bb + cc dd
ASCII ASCII ASCII ASCII

To copy all files as binary, use any of the next three samples; they all have the same effect:

copy /b aa + bb + cc dd
bin bin bin bin

copy aa /b + bb + cc dd
bin bin bin bin

copy aa + /b bb + cc dd
bin bin bin bin

and some more testing:

copy aa + bb /a + cc dd
ASCII ASCII ASCII ASCII

copy aa + bb /b + cc dd
ASCII BIN BIN BIN

copy /b aa + bb + cc dd /a
bin bin bin ascii

copy aa /a + bb + cc dd /b
ASCII ASCII ASCII bin

copy aa + bb + cc dd /b
ASCII ASCII ASCII bin

copy aa + bb + cc /a dd /b
ASCII ASCII ASCII bin

But if a source is reused as the destination, the destination's type overrides the source type for that file:

copy aa + bb + cc aa /b
BIN ASCII ASCII BIN

copy aa + bb + cc /b aa
BIN ASCII BIN BIN

That means my original sample was actually copying all files as binary, and the /A at the beginning was overridden. Now it does the same thing, but it looks better.

@Andrew's sample does what he promised; it's just that the /A is redundant there.
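The rule and the test results above can be condensed into a small Python model. This is a sketch that reproduces the samples listed in this answer, not a description of how cmd.exe really parses the line; the function name `resolve_modes` is hypothetical, and quoted file names containing spaces or "+" are not handled.

```python
def resolve_modes(command):
    """Return 'ASCII' or 'BIN' for each file in a combining COPY command."""
    tokens = command.replace("+", " ").split()[1:]  # drop the leading "copy"
    mode = "ASCII"  # default when COPY combines files
    files, modes = [], []
    for tok in tokens:
        low = tok.lower()
        if low in ("/a", "/b"):
            mode = "ASCII" if low == "/a" else "BIN"
            if modes:
                modes[-1] = mode  # a switch also covers the file before it
        elif not tok.startswith("/"):
            files.append(low)
            modes.append(mode)
        # any other switch (/y, /v, ...) does not affect the file type
    if files:
        # A source reused as the destination takes the destination's type.
        dest, dest_mode = files[-1], modes[-1]
        for i, name in enumerate(files[:-1]):
            if name == dest:
                modes[i] = dest_mode
    return modes


print(resolve_modes("copy aa + bb /b + cc dd"))
# ['ASCII', 'BIN', 'BIN', 'BIN']
print(resolve_modes("copy aa + bb + cc /b aa"))
# ['BIN', 'ASCII', 'BIN', 'BIN']
```

Applied to `COPY a.txt+b.txt /a c.txt /b /y`, the model gives ASCII for both sources and BIN for the destination, which matches the observation that the /A there changes nothing.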


If you want to append one file to another, you don't need a third file. Just use the first file again as the destination. It can't be the second, or you'd overwrite it before it is read.

This is a script I use to join all the text files named in a list:

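@echo off
rem Join every file listed in filelist.txt (one name per line) into one output file.

set "concatenated=final.js"

rem Work from the folder this script lives in.
pushd "%~dp0"

set "error="
rem Start from an empty output file.
copy nul "%concatenated%"
if errorlevel 1 set "error=true"

rem "delims=" keeps file names that contain spaces in one piece.
for /f "delims=" %%a in (filelist.txt) do (
    echo. && echo.
    echo.   *** %%a
    rem /B copies every byte and appends no 1A; /V verifies the write.
    copy /B /V "%concatenated%" + "%%a" "%concatenated%"
    if errorlevel 1 set "error=true"
)

popd
echo.
if defined error (echo.   !!!!!!!!!  THERE WERE ERRORS  !!!!!!!!!!
) else echo.   ***  ALL DONE  ***
echo.
pause
exit /b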
papo answered Nov 02 '22 09:11