I essentially want to spider my local site and create a list of all the titles and URLs as in:
http://localhost/mySite/Default.aspx        My Home Page
http://localhost/mySite/Preferences.aspx    My Preferences
http://localhost/mySite/Messages.aspx       Messages
I'm running Windows. I'm open to anything that works--a C# console app, PowerShell, some existing tool, etc. We can assume that the <title> tag does exist in the document.
Note: I need to actually spider the files since the title may be set in code rather than markup.
A quick and dirty Cygwin Bash script which does the job:
#!/bin/bash
# For every .aspx file under $WWWROOT, print "path<TAB>title".
find "$WWWROOT" -iname '*.aspx' | while read -r file; do
    printf '%s\t' "$file"
    # Flatten newlines so a multi-line <title> element ends up on one line, then extract its text
    tr '\n' ' ' < "$file" | sed 's/.*<title>\([^<]*\)<\/title>.*/\1/'
    echo
done
Explanation: this finds every .aspx file under the root directory $WWWROOT, replaces all newlines with spaces so that there are no newlines between the <title> and </title> tags, and then grabs out the text between those tags.
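Since the question notes that the title may be set in code-behind rather than in markup, reading the raw .aspx files may not be enough. A minimal sketch of an HTTP-based variant, assuming the site is reachable at http://localhost/mySite, that curl is available in your Cygwin install, and that file paths under $WWWROOT map directly onto URLs (that mapping is an assumption you may need to adjust):

#!/bin/bash
# Sketch: fetch each page over HTTP so titles set in code-behind are captured.
# Assumes the web root $WWWROOT is served at $BASEURL (adjust for your setup).
BASEURL="http://localhost/mySite"

find "$WWWROOT" -iname '*.aspx' | while read -r file; do
    # Turn the file path into a URL relative to the web root (assumed 1:1 mapping)
    rel="${file#"$WWWROOT"/}"
    url="$BASEURL/$rel"
    printf '%s\t' "$url"
    # Fetch the rendered page and extract the <title> text, as in the script above
    curl -s "$url" | tr '\n' ' ' | sed 's/.*<title>\([^<]*\)<\/title>.*/\1/'
    echo
done

This trades disk reads for HTTP round trips, but it reports whatever title the page actually renders at request time.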