
Search Web Content with C#

Tags:

c#

How do you search a website's source code with C#? It's hard to explain, so here's the source for doing it in Python:

import urllib2, re
word = "How to ask"
source = urllib2.urlopen("http://stackoverflow.com").read()
if re.search(word, source):
    print "Found it " + word
localhost asked Dec 31 '22 05:12



2 Answers

If you want to access the raw HTML of a web page, you need to do the following:

  1. Use an HttpWebRequest to request the page
  2. Get the response and read the response stream into a string
  3. Search the string for your content

The code looks something like this:

using System.IO;
using System.Net;

string pageContent;
HttpWebRequest myReq = (HttpWebRequest)WebRequest.Create("http://example.com/page.html");

// Dispose the response and the reader once the body has been read
using (HttpWebResponse myRes = (HttpWebResponse)myReq.GetResponse())
using (StreamReader sr = new StreamReader(myRes.GetResponseStream()))
{
    // Read the whole response body into a string
    pageContent = sr.ReadToEnd();
}

if (pageContent.Contains("YourSearchWord"))
{
    // Found it
}
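On newer .NET versions HttpWebRequest is marked obsolete in favor of HttpClient, so the same three steps could be sketched roughly as below. This is only a sketch under assumptions: the class name PageSearch and the helpers FetchAsync, ContainsText and PageContainsAsync are made up for illustration and are not part of any library; only HttpClient.GetStringAsync and string.Contains are real framework calls.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper class, named for illustration only.
static class PageSearch
{
    // A single shared HttpClient instance for the whole process.
    private static readonly HttpClient Client = new HttpClient();

    // Steps 1 and 2: request the page and read the body into a string.
    public static Task<string> FetchAsync(string url) => Client.GetStringAsync(url);

    // Step 3: plain, case-sensitive substring search.
    // Kept separate so it can be checked without a network call.
    public static bool ContainsText(string pageContent, string searchWord) =>
        pageContent != null && pageContent.Contains(searchWord);

    // All three steps combined.
    public static async Task<bool> PageContainsAsync(string url, string searchWord)
    {
        string pageContent = await FetchAsync(url);
        return ContainsText(pageContent, searchWord);
    }
}
```

From an async method it could be called as: bool found = await PageSearch.PageContainsAsync("http://example.com/page.html", "YourSearchWord");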
Wolfwyrd answered Jan 12 '23 01:01



I guess this is as close as you'll get in C# to your Python code.

using System;
using System.Net;

class Program
{
    static void Main()
    {
        string word = "How to ask";
        string source = (new WebClient()).DownloadString("http://stackoverflow.com/");
        if(source.Contains(word))
            Console.WriteLine("Found it " + word);
    }
}

Note that re.search(pattern, string) is case sensitive by default, and so is Contains. If you want a case-insensitive search, you could use...

if(source.IndexOf(word, StringComparison.InvariantCultureIgnoreCase) > -1)

instead.
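One more difference worth knowing: re.search treats its first argument as a regular expression, while Contains and IndexOf look for literal text. If you need the regex behavior, something along these lines might work. It is a sketch only: the SourceSearch class and its Search and SearchLiteral methods are hypothetical names made up for this example; Regex.IsMatch, Regex.Escape and RegexOptions.IgnoreCase are the real framework pieces.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, named for illustration only.
static class SourceSearch
{
    // Treats 'pattern' as a regular expression, similar to Python's re.search(pattern, source).
    public static bool Search(string source, string pattern, bool ignoreCase = false)
    {
        RegexOptions options = ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;
        return Regex.IsMatch(source, pattern, options);
    }

    // Treats 'word' as literal text by escaping regex metacharacters first.
    public static bool SearchLiteral(string source, string word, bool ignoreCase = false)
    {
        return Search(source, Regex.Escape(word), ignoreCase);
    }
}
```

For example, SourceSearch.Search(source, "how to ask", ignoreCase: true) matches regardless of case, and SourceSearch.SearchLiteral(source, "a.b") only matches a literal "a.b" rather than "a", any character, "b".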

JohannesH answered Jan 12 '23 01:01
