cat file.txt
[...]
<td><script> document.write('89.223.92.30')</script></td>
<td><script> document.write('9027')</script></td>
<td><script> document.write('185.204.3.105')</script></td>
<td><script> document.write('1081')</script></td>
<td><script> document.write('91.238.137.108')</script></td>
<td><script> document.write('8975')</script></td>
[...]
I want to extract each IP and its port; here is what I tried:
egrep -oP '([0-9]{1,3}\.){3}[0-9]{1,3} | [0-9]{2,5}' file.txt
But it does not work: among other problems, the second pattern also matches parts of the IP addresses.
Each pattern only works on its own:
egrep -oP '([0-9]{1,3}\.){3}[0-9]{1,3}' file.txt
grep -oP "'[0-9]{2,5}'" file.txt
-> This works, but I can't get rid of the ' at the beginning and at the end; if I remove the quotes from the pattern, it matches the IPs as well, which is not what I want.
I also tried:
sed 's/ \<td\>\<script\> document\.write\(\'//g' file.txt | sed 's/\'\)\<\/script\>\<\/td\>'//g'
The idea here is to trim everything before and after the IP and port.
Result needed:
ip0 port0 (I will store the results in an array that will be used for SSH connections later on).
ip1 port1
ip2 port2 ...
You could try something like this:
$ cat ipport.txt | sed 's/.*write('"'"'//g' | sed 's/'"'"').*//g' | while read -r ip && read -r port; do echo "$ip $port"; done
89.223.92.30 9027
185.204.3.105 1081
91.238.137.108 8975
Note, however, that this is fragile: it assumes that IP lines and port lines strictly alternate. If that order changes anywhere in the file, every pair after that point will be wrong.
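One way to catch that kind of breakage is to sanity-check each pair after joining. The following is only a sketch: the `extracted` variable is a hypothetical stand-in for the output of the sed stage, the third pair is swapped on purpose, and the patterns only check the shape of each field (they do not verify that octets are at most 255 or that the port is at most 65535).

```shell
# Hypothetical extracted values, one per line; the last two are deliberately swapped.
extracted='89.223.92.30
9027
185.204.3.105
1081
8975
91.238.137.108'

# Join every two lines into one, then keep only lines shaped like "dotted-quad port".
checked=$(printf '%s\n' "$extracted" | paste -d' ' - - |
  grep -E '^([0-9]{1,3}\.){3}[0-9]{1,3} [0-9]{1,5}$')

echo "$checked"
```

Pairs that do not match are dropped silently here; if a line goes missing upstream, all later pairs end up in "port ip" order and are filtered out instead of being passed on to ssh.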
For parsing HTML in general, a language better suited to the job is a safer choice, for example Python with the BeautifulSoup library.
Simpler version, without single quote escaping:
cat ipport.txt | sed "s/.*write('//g" | sed "s/').*//g" | while read -r ip && read -r port; do echo "$ip $port"; done
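If GNU grep with PCRE support (`-P`) is available, as the question's own attempts suggest, the extraction can probably be done in one grep call by using a lookbehind instead of stripping the surrounding text with sed. This is a sketch under that assumption; the temporary file only stands in for the real input file.

```shell
# Sample input mirroring the question's file, written to a temporary file.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
<td><script> document.write('89.223.92.30')</script></td>
<td><script> document.write('9027')</script></td>
<td><script> document.write('185.204.3.105')</script></td>
<td><script> document.write('1081')</script></td>
EOF

# (?<=write\(') requires the match to be preceded by write(' without including it;
# [^']+ then takes everything up to the closing quote.
# paste joins every two output lines into a single "ip port" line.
pairs=$(grep -oP "(?<=write\(')[^']+" "$tmpfile" | paste -d' ' - -)

echo "$pairs"
rm -f "$tmpfile"
```

Because the lookbehind anchors on `write('`, the quotes never become part of the match, which avoids the problem of ports matching inside IP addresses. The same caveat about strictly alternating lines still applies.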