I am trying to extract the contents with in the body tag in my html file using command prompt and findstr command. My html is as below
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>AdminWeb</title>
<base href="/wwwroot/admin-web/">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="icon" type="image/x-icon" href="favicon.ico">
</head>
<body><app-root></app-root><script src="/wwwroot/admin-web/runtime.js"></script><script src="/wwwroot/admin-web/file1.js" nomodule></script><script src="/wwwroot/admin-web/file2.js"></script><script src="/wwwroot/admin-web/styles.js"></script><script src="/wwwroot/admin-web/vendor.js"></script><script src="/wwwroot/admin-web/main.js"></script></body>
</html>
The out put I want is
<app-root></app-root>
<script src="/wwwroot/admin-web/runtime.js"></script><script src="/wwwroot/admin-web/file1.js" nomodule></script><script src="/wwwroot/admin-web/file2.js"></script><script src="/wwwroot/admin-web/styles.js"></script><script src="/wwwroot/admin-web/vendor.js"></script><script src="/wwwroot/admin-web/main.js"></script>
I am trying to achieve using regular expression.
findstr /R (?<=<body>)(.*)(?=</body>) test.html
But this is now working in command prompt. But this regex is working in js.
Thanks in advance.
First of all, findstr does not support all cool regex features that modern regex engines offer. Especially the latest JavaScript ECMAScript2018+ compliant engines like in Chrome, Node.js, etc. So, saying that "this regex is working in js" does not mean the same pattern will work anywhere else. It won't certainly work in findstr.
You may take the hard way and go on to study how to write a batch script for this. However, there is a much simpler way with other built-in Windows apps.
I strongly suggest Powershell as it offers you a lot of features .NET provides.
Here, open PowerShell console and use
$pathToFile = 'c:\...\...\you_file.txt'
$output_file = 'c:\...\...\you_file_out.txt'
$rx = '(?s)(?<=<body>).*?(?=</body>)'
Get-Content $pathToFile -Raw | Select-String $rx -AllMatches | % { $_.Matches } | % { $_.Value } > $output_file
NOTE: It is best to use an IE automation to handle HTML.
$output_file = 'c:\...\...\you_file_out.txt'
$url = 'http://your_site_here.tld/...'
$ie = New-Object -comobject "InternetExplorer.Application"
$ie.visible = $true
$ie.navigate($url)
while ($ie.Busy -eq $true -Or $ie.ReadyState -ne 4) {Start-Sleep 2}
$doc = $ie.Document
$tags = $doc.getElementsByTagName("body")
$tags[0].innerHTML > $output_file
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With