Using a web-proxy service to get the HTML content of a target URL?

In C# or VB.NET, I need to access a webpage through a web-proxy service in order to scrape the target URL I'm interested in.

Take any web-proxy service as an example (it really doesn't matter which one, and I'm open to suggestions), such as the one below, which doesn't complicate things with hashes in the query string the way others do (I don't know how to handle those):

http://proxyanonimo.es/browse.php?u=http%3a%2f%2furl.com

When I perform an HttpWebRequest to that URL, I expect the response to contain the target URL's HTML content, but instead I get this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
 
<html>
<head>
<title>Proxy Anonimo :: Spanish Web Proxy</title>
<meta name="keywords" content="proxy, webproxy, proxy online, spanish proxy" />
<meta name="description" content="Usa nuestro WebProxy An&#65533;nimo para comprobar como se ve una web desde otro sitio que no sea el ordenador en el que est&#65533;s sentado. Es un acceso remoto desde nuestro servidor." />
 
<style type="text/css">
    html, body {
       text-align: center;
    }
    #wrapper {
       width: 740px;
       margin: 0 auto 0 auto;
       text-align: left;
       padding: 10px;
       background: #eee;
       border: 4px outset #ccc;
    }
    #footer {
       margin: 10px 0 0 0; 
       font-size: 80%;
       color: #ccc;
    }
    #error {
       border: 1px solid red;
       padding: 2px;
       margin: 5px 0 15px 0;
       background: #eee;
    }
    .center { text-align: center; }
 
    /* TOOLTIP HOVER EFFECT */
    #tooltip{ 
       width:20em; background: #fff;
    }
</style>
    <script type="text/javascript">ginf={url:'http://proxyanonimo.es',script:'browse.php',target:{h:'http://myurl.com',p:'/',b:'',u:'http://myurl.com'},enc:{u:'iawpK1Q337kKRtEraNzZubjsx46C64Qd4aqEZ6vR2GrHZTZXxmNPoU7JM4aGYQJROYjBUFiKbxiYh5LEhmjt4g3G83dVHKClyLMhgTRfgX1nSBPYLYhG38a11bMwMcF8',e:'',x:'',p:''},b:'12'}</script>
    <script type="text/javascript" src="http://proxyanonimo.es/includes/main.js?1.4.1"></script></head>
<body>
<div id="wrapper">
 
    <h1 class="center"><a href="index.php">Proxy Anonimo</a></h1>
    <h2 class="center">IPv6 Ready!</h2> 
    <div id="error">Hotlinking directly to proxied pages is not permitted.</div><p style="text-align:right">[<a href="http://proxyanonimo.es/browse.php?u=http%3a%2f%2fmyurl.com&amp;b=12&amp;f=norefer">Reload http://myurl.com</a>]</p>
 
    <h2>Proxy</h2>
 
       Usa nuestro WebProxy An&#65533;nimo para comprobar como se ve una web desde otro sitio que no sea el ordenador en el que est&#65533;s sentado. Es un acceso remoto desde nuestro servidor. Si tu conexi&#65533;n tiene alguna restricci&#65533;n, con nuestro Proxy An&#65533;nimo no tendr&#65533;as que tener problema o por lo menos, asegurarte de si la web es accesible o no. 
 
    <h2>URL</h2>
 
    <form action="includes/process.php?action=update" method="post" onsubmit="return updateLocation(this);">
        <input type="text" name="u" id="input" size="60">
 
 
 
        <!--<input type="submit" value="Go">-->
 
        <h3>Options</h3>
        <ul id="options">
            <li><input type="checkbox" name="encodeURL" id="encodeURL"><label for="encodeURL" class="tooltip" onmouseover="tooltip('Encrypts the URL of the page you are viewing so that it does not contain the target site in plaintext.')" onmouseout="exit();">Encrypt URL</label></li><li><input type="checkbox" name="encodePage" id="encodePage"><label for="encodePage" class="tooltip" onmouseover="tooltip('Helps avoid filters by encrypting the page before sending it and decrypting it with javascript once received.')" onmouseout="exit();">Encrypt Page</label></li><li><input type="checkbox" name="allowCookies" id="allowCookies" checked="checked"><label for="allowCookies" class="tooltip" onmouseover="tooltip('Cookies may be required on interactive websites (especially where you need to log in) but advertisers also use cookies to track your browsing habits.')" onmouseout="exit();">Allow Cookies</label></li><li><input type="checkbox" name="tempCookies" id="tempCookies" checked="checked"><label for="tempCookies" class="tooltip" onmouseover="tooltip('This option overrides the expiry date for all cookies and sets it to at the end of the session only - all cookies will be deleted when you shut your browser. (Recommended)')" onmouseout="exit();">Force Temporary Cookies</label></li><li><input type="checkbox" name="stripTitle" id="stripTitle"><label for="stripTitle" class="tooltip" onmouseover="tooltip('Removes titles from proxied pages.')" onmouseout="exit();">Remove Page Titles</label></li><li><input type="checkbox" name="stripJS" id="stripJS"><label for="stripJS" class="tooltip" onmouseover="tooltip('Remove scripts to protect your anonymity and speed up page loads. However, not all sites will provide an HTML-only alternative. 
(Recommended)')" onmouseout="exit();">Remove Scripts</label></li><li><input type="checkbox" name="stripObjects" id="stripObjects"><label for="stripObjects" class="tooltip" onmouseover="tooltip('You can increase page load times by removing unnecessary Flash, Java and other objects. If not removed, these may also compromise your anonymity.')" onmouseout="exit();">Remove Objects</label></li>      </ul>
    </form>
 
    <br>
 
    <br><br><br>
 
    <p><a href="http://s07.flagcounter.com/more/xu5M"><img src="http://s07.flagcounter.com/count/xu5M/bg=FFFFFF/txt=000000/border=CCCCCC/columns=8/maxflags=248/viewers=De+donde+nos+visitan/labels=1/pageviews=1/" alt="free counters" border="0"></a></p>
 
 
    <div id="eXTReMe"><a href="http://extremetracking.com/open?login=proxyes">
<img src="http://t1.extreme-dm.com/i.gif" style="border: 0;"
height="38" width="41" id="EXim" alt="eXTReMe Tracker" /></a>
<script type="text/javascript"><!--
EXref="";top.document.referrer?EXref=top.document.referrer:EXref=document.referrer;//-->
</script><script type="text/javascript"><!--
var EXlogin='proxyes' // Login
var EXvsrv='s10' // VServer
EXs=screen;EXw=EXs.width;navigator.appName!="Netscape"?
EXb=EXs.colorDepth:EXb=EXs.pixelDepth;EXsrc="src";
navigator.javaEnabled()==1?EXjv="y":EXjv="n";
EXd=document;EXw?"":EXw="na";EXb?"":EXb="na";
EXref?EXref=EXref:EXref=EXd.referrer;
EXd.write("<img "+EXsrc+"=http://e1.extreme-dm.com",
"/"+EXvsrv+".g?login="+EXlogin+"&amp;",
"jv="+EXjv+"&amp;j=y&amp;srw="+EXw+"&amp;srb="+EXb+"&amp;",
"l="+escape(EXref)+" height=1 width=1>");//-->
</script><noscript><div id="neXTReMe"><img height="1" width="1" alt=""
src="http://e1.extreme-dm.com/s10.g?login=proxyes&amp;j=n&amp;jv=n" />
</div></noscript></div>
 
<p class="center">Powered by <a href="http://www.glype.com/">Glype</a>&reg; v1.4.1.</p> 
</div>
 
<script type="text/javascript">
var infolinks_pid = 1993344;
var infolinks_wsid = 0;
</script>
<script type="text/javascript" src="http://resources.infolinks.com/js/infolinks_main.js"></script>
 
</body>
</html>

So, is this possible to do?

What am I missing?

Is the web-proxy service I'm trying restricting me in some way? Would another web-proxy service suit my needs better?

ElektroStudios asked Jul 23 '15 13:07




1 Answer

I would suggest using a direct proxy (IP:port), for example 115.238.225.26:80, instead of a web-proxy page. You can then handle the problem easily with the following code:

using System;
using System.IO;
using System.Net;

// Route the request through a direct HTTP proxy instead of a web-proxy page.
var req = (HttpWebRequest)WebRequest.Create(new Uri("http://example.com"));
req.Method = "GET";
req.Proxy = new WebProxy("115.238.225.26", 80) { BypassProxyOnLocal = false };

var result = "";
// Dispose the response and the reader so the connection is released.
using (var response = (HttpWebResponse)req.GetResponse())
using (var respStream = response.GetResponseStream())
{
    if (respStream != null)
    {
        using (var strReader = new StreamReader(respStream))
        {
            result = strReader.ReadToEnd();
        }
    }
}

The result variable then holds the content of the page, or an empty string if something went wrong (respStream == null). You may also need to add exception handling to this code, for example catching WebException, in case connection problems occur.
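The exception handling mentioned above could look roughly like the following. This is only a minimal sketch: the ProxyFetcher class, the TryFetch method and the 10-second timeout are hypothetical names and values chosen for illustration, not part of the original code. It wraps the same HttpWebRequest/WebProxy calls in a try/catch and reports failure instead of throwing.

```csharp
using System;
using System.IO;
using System.Net;

public static class ProxyFetcher
{
    // Hypothetical helper (not from the original answer): returns true and the
    // page content on success, or false and an empty string if the request fails.
    public static bool TryFetch(string url, string proxyHost, int proxyPort, out string content)
    {
        content = "";
        try
        {
            var req = (HttpWebRequest)WebRequest.Create(new Uri(url));
            req.Method = "GET";
            req.Proxy = new WebProxy(proxyHost, proxyPort) { BypassProxyOnLocal = false };
            req.Timeout = 10000; // milliseconds; an assumed value, tune as needed

            using (var response = (HttpWebResponse)req.GetResponse())
            using (var stream = response.GetResponseStream())
            {
                if (stream == null)
                {
                    return true; // no body: content stays empty
                }
                using (var reader = new StreamReader(stream))
                {
                    content = reader.ReadToEnd();
                }
                return true;
            }
        }
        catch (WebException ex)
        {
            // Raised for connection failures, timeouts and non-success HTTP status codes.
            Console.Error.WriteLine("Request failed: " + ex.Status + " (" + ex.Message + ")");
            return false;
        }
    }
}
```

A call such as ProxyFetcher.TryFetch("http://example.com", "115.238.225.26", 80, out var html) would then leave the page content in html when it returns true. Public proxies like this one go offline often, so a false return should be expected and handled, for instance by trying another proxy.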

Volodymyr answered Sep 19 '22 20:09
