Is there a built-in DLL that will give me a list of links from a string? I want to pass in a string of valid HTML and have it parse out all the links. I seem to remember there being something built into either .NET or an unmanaged library.
I found a couple of open-source projects that looked promising, but I thought there was a built-in module. If not, I may have to use one of those. I just didn't want an external dependency at this point if it wasn't necessary.
I'm not aware of anything built in, and your question is a little ambiguous about what exactly you're looking for. Do you want the entire anchor tag, or just the URL from the href attribute?
If you have well-formed XHTML, you might be able to get away with using an XmlReader and an XPath query to find all the anchor tags (<a>) and then read the href attribute for the address. Since well-formed input is unlikely, you're probably better off using a regular expression to pull out what you want.
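For the well-formed case, here is a minimal sketch of the XPath idea. It loads the markup into an XmlDocument rather than driving an XmlReader directly, and the class and method names (XhtmlLinks, FindHrefs) are made up for illustration. It assumes the input is a single-rooted, well-formed document without a DOCTYPE or HTML-only entities such as &nbsp;, otherwise LoadXml will throw.

```csharp
using System.Collections.Generic;
using System.Xml;

static class XhtmlLinks
{
    // Hypothetical helper: returns the raw href values of all <a> elements.
    public static List<string> FindHrefs(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml); // throws XmlException if the markup is not well-formed

        var hrefs = new List<string>();
        // local-name() ignores the XHTML namespace, so this works whether or
        // not the root element declares xmlns="http://www.w3.org/1999/xhtml".
        foreach (XmlNode attr in doc.SelectNodes("//*[local-name()='a']/@href"))
        {
            hrefs.Add(attr.Value);
        }
        return hrefs;
    }
}
```

Anchors without an href (for example named anchors) are skipped because the query selects the attribute itself.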
Using RegEx, you could do something like:
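using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

List<Uri> findUris(string message)
{
    // Captures the href value in the "href" group and the link text in "fileName".
    string anchorPattern = "<a[\\s]+[^>]*?href[\\s]?=[\\s\\\"\']+(?<href>.*?)[\\\"\\']+.*?>(?<fileName>[^<]+|.*?)?<\\/a>";
    MatchCollection matches = Regex.Matches(message, anchorPattern, RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Compiled);
    List<Uri> uris = new List<Uri>();
    foreach (Match m in matches)
    {
        // Read the value captured by the (?<href>...) group.
        string url = m.Groups["href"].Value;
        Uri testUri = null;
        // Keep only values that parse as a relative or absolute Uri.
        if (Uri.TryCreate(url, UriKind.RelativeOrAbsolute, out testUri))
        {
            uris.Add(testUri);
        }
    }
    // Return an empty list, not null, when there are no matches.
    return uris;
}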
Note that I'd want to check the href to make sure that the address actually makes sense as a valid Uri. You can eliminate that if you aren't actually going to be pursuing the link anywhere.
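If you do intend to follow the links, relative hrefs need to be resolved against the address of the page they came from. A rough sketch of that step, assuming you know the page's URI; LinkResolver and ResolveWebLinks are illustrative names, and restricting the result to http/https is my assumption about what "pursuing the link" means:

```csharp
using System;
using System.Collections.Generic;

static class LinkResolver
{
    // Hypothetical helper: resolves raw href values against the page address
    // and keeps only http/https links.
    public static List<Uri> ResolveWebLinks(Uri pageUri, IEnumerable<string> hrefs)
    {
        var result = new List<Uri>();
        foreach (string href in hrefs)
        {
            // Uri.TryCreate(Uri, string, out Uri) combines a base URI with a
            // relative reference; an href that is already absolute is kept as is.
            if (Uri.TryCreate(pageUri, href, out var absolute) &&
                (absolute.Scheme == Uri.UriSchemeHttp || absolute.Scheme == Uri.UriSchemeHttps))
            {
                result.Add(absolute);
            }
        }
        return result;
    }
}
```

Links with other schemes, such as mailto:, parse fine as URIs but are dropped by the scheme check.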
I don't think there is a built-in library, but the Html Agility Pack is popular for what you want to do.
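For reference, a minimal sketch of what that looks like with the Html Agility Pack. This assumes the HtmlAgilityPack NuGet package is referenced (it is an external dependency, not part of .NET), and AgilityPackLinks / FindHrefs are names I made up for the example:

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // external NuGet package, not part of .NET

static class AgilityPackLinks
{
    // Hypothetical helper: returns the href values of all <a> elements.
    public static List<string> FindHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant of markup that is not well-formed XML

        var hrefs = new List<string>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors != null)
        {
            foreach (HtmlNode a in anchors)
            {
                hrefs.Add(a.GetAttributeValue("href", string.Empty));
            }
        }
        return hrefs;
    }
}
```

The null check matters: iterating the result of SelectNodes directly fails on input that contains no links.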
The way to do this with the raw .NET Framework and no external dependencies would be to use a regular expression to find all the 'a' tags in the string. You would need to handle quite a few edge cases, e.g. href = "http://url" vs. href=http://url.
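As a sketch of how those quoting variations might be handled, the pattern below accepts double-quoted, single-quoted, and unquoted href values, with optional whitespace around the equals sign. The names HrefRegex and FindHrefs are illustrative. It is still only a pattern match, not a parser: it will also match anchors inside HTML comments and will miss a tag where a ">" appears inside an earlier attribute value.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class HrefRegex
{
    // Matches an <a ...> start tag and captures the href value whether it is
    // double-quoted, single-quoted, or unquoted. .NET allows the same group
    // name in several alternatives, so all three forms land in "href".
    private static readonly Regex AnchorHref = new Regex(
        @"<a\s+(?:[^>]*?\s)?href\s*=\s*(?:""(?<href>[^""]*)""|'(?<href>[^']*)'|(?<href>[^\s>]+))",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the raw href values in document order.
    public static List<string> FindHrefs(string html)
    {
        var hrefs = new List<string>();
        foreach (Match m in AnchorHref.Matches(html))
        {
            hrefs.Add(m.Groups["href"].Value);
        }
        return hrefs;
    }
}
```

Requiring whitespace immediately before href keeps attributes such as data-href from being picked up by mistake.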