
Quickest way to get a list of <title> values from all pages on a localhost website

I essentially want to spider my local site and build a list of all the page titles and URLs, like this:

http://localhost/mySite/Default.aspx      My Home Page
http://localhost/mySite/Preferences.aspx  My Preferences
http://localhost/mySite/Messages.aspx     Messages

I'm running Windows. I'm open to anything that works: a C# console app, PowerShell, some existing tool, etc. We can assume that every page has a <title> tag.

Note: I need to actually spider the site and read the rendered pages, since the title may be set in code rather than in markup.

Larsenal asked Mar 01 '23 03:03



1 Answer

A quick and dirty Cygwin Bash script which does the job:

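#!/bin/bash
# List each .aspx file under $WWWROOT with its <title> text, tab-separated.
find "$WWWROOT" -iname '*.aspx' -print0 |
while IFS= read -r -d '' file; do
  printf '%s\t' "$file"
  # Turn CR/LF into spaces so <title> and </title> share one line, then keep only the text between them
  tr '\r\n' '  ' < "$file" | sed 's/.*<title>\([^<]*\)<\/title>.*/\1/'
  echo
done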

Explanation: this finds every .aspx file under the root directory $WWWROOT, replaces all newlines with spaces so that there are no newlines between the <title> and </title>, and then grabs out the text between those tags.
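The script above reads the .aspx source, so a title assigned in code-behind (which the question's note warns about) won't show up. Here is a rough variant, sketched under my own assumptions: it still enumerates .aspx files rather than following links (so it is not a true link-following spider), but it maps each file to a URL and fetches the rendered page with curl before pulling out the title. The `extract_title` and `list_titles` function names, the two command-line arguments, and the assumption that the file tree under the root maps one-to-one onto URLs under the base URL are all hypothetical, not something from the question.

```shell
#!/bin/bash
# Sketch: fetch each page over HTTP so titles set in code-behind are included.
# Assumption (hypothetical): files under ROOT map one-to-one onto URLs under BASE_URL.

# Read HTML on stdin and print the text of its <title> element, trimmed.
extract_title() {
  # Turn CR/LF into spaces so a title split across lines is still matched
  tr '\r\n' '  ' |
    sed -n 's/.*<[Tt][Ii][Tt][Ll][Ee][^>]*>\([^<]*\)<\/[Tt][Ii][Tt][Ll][Ee]>.*/\1/p' |
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Print "URL<TAB>title" for every .aspx file under $1, requested via base URL $2.
list_titles() {
  local root="${1%/}" base="${2%/}"
  find "$root" -iname '*.aspx' -print0 |
    while IFS= read -r -d '' file; do
      rel="${file#"$root"/}"                      # path relative to the site root
      url="$base/$rel"                            # hypothetical file-to-URL mapping
      title=$(curl -sfL "$url" | extract_title)   # empty if the request fails
      printf '%s\t%s\n' "$url" "$title"
    done
}

# Run only when executed directly with both arguments.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -eq 2 ]]; then
  list_titles "$1" "$2"
fi
```

You would run it as something like `./list_titles.sh /cygdrive/c/inetpub/wwwroot/mySite http://localhost/mySite` (script name and path are made up for illustration). It doesn't decode HTML entities or percent-encode file names containing spaces, and a page that redirects to a login form will report the login page's title unless you give curl whatever credentials your site needs.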

Adam Rosenfield answered Apr 28 '23 03:04
