
Replace relative URLs with absolute ones

I have the HTML source of a page as a string:

<html>
    <head>
          <link rel="stylesheet" type="text/css" href="/css/all.css" /> 
    </head>
    <body>
        <a href="/test.aspx">Test</a>
        <a href="http://mysite.com">Test</a>
        <img src="/images/test.jpg"/>
        <img src="http://mysite.com/images/test.jpg"/>
    </body>
</html>

I want to convert all the relative paths to absolute ones. The output should be:

<html>
    <head>
          <link rel="stylesheet" type="text/css" href="http://mysite.com/css/all.css" /> 
    </head>
    <body>
        <a href="http://mysite.com/test.aspx">Test</a>
        <a href="http://mysite.com">Test</a>
        <img src="http://mysite.com/images/test.jpg"/>
        <img src="http://mysite.com/images/test.jpg"/>
    </body>
</html>

Note: Only the relative paths in the string should be converted. URLs that are already absolute must be left untouched. Can this be done with a regex or by other means?

Rocky Singh asked Mar 18 '26 14:03


2 Answers

Don't try to parse HTML with regex, as explained here: https://stackoverflow.com/a/1732454/932418 and https://stackoverflow.com/a/1758162/932418

Use an HTML parser like HtmlAgilityPack instead:

using System;
using System.IO;

string html =
@"<html>
    <head>
            <link rel=""stylesheet"" type=""text/css"" href=""/css/all.css"" /> 
    </head>
    <body>
        <a href=""/test.aspx"">Test</a>
        <a href=""http://example.com"">Test</a>
        <img src=""/images/test.jpg""/>
        <img src=""http://example.com/images/test.jpg""/>
    </body>
</html>";

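string baseUrl = "http://example.com";
Uri baseUri = new Uri(baseUrl);
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

// Resolve each URL attribute against the base. Elements that lack the attribute
// (for example <a name="top">) are skipped to avoid a NullReferenceException.
foreach (var img in doc.DocumentNode.Descendants("img"))
{
    var src = img.Attributes["src"];
    if (src != null) src.Value = new Uri(baseUri, src.Value).AbsoluteUri;
}

foreach (var a in doc.DocumentNode.Descendants("a"))
{
    var href = a.Attributes["href"];
    if (href != null) href.Value = new Uri(baseUri, href.Value).AbsoluteUri;
}

// The expected output in the question also rewrites <link href="...">.
foreach (var link in doc.DocumentNode.Descendants("link"))
{
    var href = link.Attributes["href"];
    if (href != null) href.Value = new Uri(baseUri, href.Value).AbsoluteUri;
}

// Serialize the modified document back to a string.
StringWriter writer = new StringWriter();
doc.Save(writer);

string newHtml = writer.ToString();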
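
One caveat with the loops above: every value goes through new Uri(...).AbsoluteUri, which also normalizes URLs that are already absolute (for instance, "http://example.com" comes back as "http://example.com/"). If the absolute ones really must stay byte-for-byte untouched, a small guard can be placed in front of the resolution. The following is only a sketch: UrlHelper and ToAbsolute are hypothetical names, not part of HtmlAgilityPack, and skipping fragment-only links such as "#top" is an assumption about what is wanted.

```csharp
using System;
using System.Text.RegularExpressions;

static class UrlHelper
{
    // RFC 3986 scheme syntax: a letter, then letters, digits, '+', '.' or '-', then ':'.
    private static readonly Regex HasScheme =
        new Regex(@"^[A-Za-z][A-Za-z0-9+.\-]*:");

    // Hypothetical helper: returns the value unchanged when it is empty,
    // a fragment-only link, or already carries a scheme (http:, https:, mailto:, ...).
    // Anything else is resolved against baseUri.
    public static string ToAbsolute(Uri baseUri, string value)
    {
        if (string.IsNullOrWhiteSpace(value)
            || value.StartsWith("#")
            || HasScheme.IsMatch(value))
        {
            return value;
        }
        return new Uri(baseUri, value).AbsoluteUri;
    }
}
```

With this in place, each loop body would assign UrlHelper.ToAbsolute(new Uri(baseUrl), attribute.Value) instead of calling new Uri(...) directly. For example, ToAbsolute(new Uri("http://example.com"), "/css/all.css") returns "http://example.com/css/all.css", while "http://example.com" is returned as is.
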

L.B answered Mar 20 '26 03:03


Add

<base href="http://mysite.com/images/" />

to the head of the page.
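
Keep in mind that a base element does not rewrite the string itself; it only changes how a browser resolves relative URLs when it renders the page. As a rough illustration of that resolution, here is a sketch that uses System.Uri as a stand-in for the browser. BaseHrefDemo and Resolve are made-up names for this example.

```csharp
using System;

static class BaseHrefDemo
{
    // Hypothetical helper: resolves url against baseHref, roughly the way
    // a browser applies <base href="..."> to a relative URL.
    public static string Resolve(string baseHref, string url)
    {
        return new Uri(new Uri(baseHref), url).AbsoluteUri;
    }
}
```

With the base from this answer, Resolve("http://mysite.com/images/", "/test.aspx") gives "http://mysite.com/test.aspx", because a root-relative path only takes the scheme and host from the base. A path-relative value such as "test.jpg" would resolve under the /images/ directory instead, giving "http://mysite.com/images/test.jpg".
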

mplungjan answered Mar 20 '26 04:03