
Windows batch script to recursively extract specific files using 7zip

I have a directory with multiple .tar files, each containing multiple .zip files. The tree structure looks roughly like this:

testDirectory
    -tarArchive.tar
        -directory1
            -zipArchive11.zip
                abc.XML
                1.TIF
                2.TIF
                ...
            -zipArchive12.zip
                xyz.XML
                a.TIF
                b.CDX
                ...
            .
            .
            .
        -directory2
            -zipArchive21.zip
                ...
            -zipArchive22.zip
                ...
            .
            .
            .
        .
        .

I need a single batch script to recursively extract only the .XML files from each .zip while maintaining the tree structure (first all .zip files are extracted from the main .tar, then only the .XML files from each .zip). Additionally, the processed archives should be deleted afterwards.

I am able to achieve most of it with this code:

for /R "C:\Users\frozenfyr\Desktop\test" %%I in ("*.zip", "*.tar") do (
  "C:\Program Files\7-Zip\7z.exe" x -y -o"%%~dpI" "%%~fI" && del "%%~fI"
)

except for two things:

  1. I'm unable to extract only the .XML files.

    This command will not extract XML files:

        "C:\Program Files\7-Zip\7z.exe" x -y -o"%%~dpI" "%%~fI""*.zip" -r

    and this one doesn't (and shouldn't) work, because nothing in the .tar file will match:

        "C:\Program Files\7-Zip\7z.exe" x -y -o"%%~dpI" "%%~fI""*.XML" -r

    Is there a way to do something like this?

        "C:\Program Files\7-Zip\7z.exe" x -y -o"%%~dpI" "%%~fI""*.zip" "*.XML" -r

  2. The main .tar file is not deleted after processing. I tried this script on .zip files and it works (they are deleted after processing).

I already tried a PowerShell script but I'm not very satisfied with it; I find a batch script easier to handle. I even found a post on SU, but it doesn't cover multiple specific formats/files. I read the Command Line Version User's Guide and considered the -x switch, but it didn't seem useful.

I can't quote all the references here, but I did go through a lot of SO and SU posts, and Google was probably not my friend this time. I'm not sure whether SO or SU is the right place to ask this question; I found good references on both communities. I use SO more often, so I'm here.

Please help me, this is driving me crazy.

Asked by Fr0zenFyr, Sep 30 '22 12:09



1 Answer

I have been able to solve my first problem. The second one looks a little more complicated. The del command works on smaller archives. Because my .tar is huge (>1 GB), I think the next loop iteration starts before the command can delete the archive. An answer to that part would still be useful, to me if not to anyone else, as a guide for the future.
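
One way to test that guess is to look at what 7-Zip returns for the .tar. In a batch file, && runs del only when the preceding command exits with code 0, and 7-Zip's documented exit codes include 1 (warning, for example a locked file) and 2 (fatal error). The following is a minimal sketch, not a confirmed fix: it replaces && with explicit errorlevel checks so that a skipped deletion is at least reported. SEVENZIP and ROOT are variable names introduced here for illustration; the paths are the ones from the question.

```batch
@echo off
setlocal
rem Illustrative variable names; the paths are taken from the question.
set "SEVENZIP=C:\Program Files\7-Zip\7z.exe"
set "ROOT=C:\Users\frozenfyr\Desktop\test"

for /R "%ROOT%" %%I in (*.tar) do (
  "%SEVENZIP%" x -y -o"%%~dpI" "%%~fI"
  rem "if errorlevel N" is true when the exit code is N or higher.
  if errorlevel 2 (
    echo ERROR from 7-Zip, archive kept: "%%~fI"
  ) else if errorlevel 1 (
    echo WARNING from 7-Zip, archive kept: "%%~fI"
  ) else (
    del "%%~fI"
  )
)
```

If the WARNING line shows up for the large .tar, the archive was extracted but 7-Zip did not exit with 0, which would explain why del never ran.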

Anyway, for the first problem, I found that I need the recursive form of the -x switch, i.e. -xr. Because I wanted to skip multiple extensions, I created ignore.txt and listed all the exclusions there. My complete script looked like this:

for /R "C:\Users\frozenfyr\Desktop\test" %%I in ("*.zip", "*.tar") do (
  "C:\Program Files\7-Zip\7z.exe" x -y -xr@"C:\Users\frozenfyr\Desktop\ignore.txt" -o"%%~dpI" "%%~fI" && del "%%~fI"
)
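
The script assumes ignore.txt already exists. 7-Zip list files hold one file name or wildcard per line, so for the file types shown in the question's tree the list could be created as sketched here. The two patterns are only examples taken from that tree; the real list would contain every extension to skip.

```batch
@echo off
rem Sketch: write the exclusion list used by -xr@ in the script above.
rem One wildcard per line; TIF and CDX are the non-XML types from the question.
> "C:\Users\frozenfyr\Desktop\ignore.txt" (
  echo *.TIF
  echo *.CDX
)
```

Note that *.zip has to stay out of the list, otherwise the .zip files would be excluded when the .tar is extracted.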

If I had just one extension to ignore, I'd use -xr!"*.extension" instead of -xr@"C:\Users\frozenfyr\Desktop\ignore.txt".

By the way, -xr does not seem to stop 7-Zip from processing the excluded files; it probably deletes them after extracting, because I saw "extracting abc.TIF" in the command prompt.
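
An alternative that avoids listing every unwanted extension is to treat the two archive types separately: extract each .tar completely, then extract only *.XML from each .zip by passing the wildcard as a file argument together with -r. This is a different technique from the -xr approach above, offered as a sketch rather than a verified solution; SEVENZIP and ROOT are illustrative variable names, and the paths are the ones from the question.

```batch
@echo off
setlocal
rem Illustrative variable names; the paths are taken from the question.
set "SEVENZIP=C:\Program Files\7-Zip\7z.exe"
set "ROOT=C:\Users\frozenfyr\Desktop\test"

rem Pass 1: unpack every .tar in full so the .zip files land on disk.
for /R "%ROOT%" %%I in (*.tar) do (
  "%SEVENZIP%" x -y -o"%%~dpI" "%%~fI" && del "%%~fI"
)

rem Pass 2: from every .zip, extract only entries matching *.XML.
for /R "%ROOT%" %%I in (*.zip) do (
  "%SEVENZIP%" x -y -o"%%~dpI" "%%~fI" "*.XML" -r && del "%%~fI"
)
```

Pass 1 keeps the && del from the original script, so the .tar deletion problem described earlier may still apply there.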

Answered by Fr0zenFyr, Oct 03 '22 07:10
