I'm trying to load captchas faster than by rendering them in a WebBrowser control, copying the image, and pasting it into a PictureBox.
Why not just download the picture straight into the PictureBox? That uses less CPU and memory. This approach already works for another, more advanced captcha service called Solve Media (with Solve Media, once you have viewed the image URL, the next time you request it you get a fake error captcha image).
But now I also need to support the ReCaptcha system, so that my bot can run faster than it does when it refreshes a web page and waits for it to render.
So I'll just post my code here. As far as I understand, I'm only missing one of the properties of the HTTP request. I already fake the User-Agent as a real Internet Explorer 8. I think the problem is the cookie: one seems to be generated somewhere I can't figure out, although I also get one cookie that I think comes from downloading the JavaScript file.
Either way, Google ReCaptcha tries to trick you with a fake captcha that you cannot read, to rub it in your face that you are not doing something right. I figured out that when you see the two black circles, it's obviously a fake.
Here is an example of a bad captcha and a good captcha:
At one point I remember ReCaptcha had another security feature that somehow knew whether you loaded the captcha image from the actual domain where it's placed. I don't know how that worked, since I download everything locally, right? They seem to have removed that feature anyway. (Actually, it still exists on some websites but seems to be disabled by default. It is easy to trick, because it relies on the Referer header.)
I'm not trying to cheat anything here. I will still be typing these captchas in manually, by hand, but I want to type them in faster than rendering the page normally allows.
I want the captchas to be either those street numbers, or at least two words without those black circles.
Anyway, here is my current code.
Dim newCaptcha = New Captcha
Dim myUserAgent As String = ""
Dim myReferer As String = "http://www.google.com/recaptcha/demo/"
Dim outputSite As String = HTTP.HTTPGET("http://www.google.com/recaptcha/demo/", "", "", "", myUserAgent, myReferer)
Dim recaptchaChallengeKey = GetBetween(outputSite, "http://www.google.com/recaptcha/api/challenge?k=", """")

'Google ReCaptcha Captcha
outputSite = HTTP.HTTPGET("http://www.google.com/recaptcha/api/challenge?k=" & recaptchaChallengeKey, "", "", "", myUserAgent, myReferer)
'outputSite = outputSite.Replace("var RecaptchaState = {", "{""RecaptchaState"": {")
'outputSite = outputSite.Replace("};", "}}")
'Dim jsonDictionary As Dictionary(Of String, Object) = New JavaScriptSerializer().Deserialize(Of Dictionary(Of String, Object))(outputSite)
Dim recaptchaChallenge = GetBetween(outputSite, "challenge : '", "',")

'This page looks useless, but the JavaScript seems to load it anyway. Maybe this is why I get bad captchas?
outputSite = HTTP.HTTPGET("http://www.google.com/recaptcha/api/js/recaptcha.js", "", "", "", myUserAgent, myReferer)

If HTTP.LoadWebImageToPictureBox(newCaptcha.picCaptcha, "http://www.google.com/recaptcha/api/image?c=" & recaptchaChallenge, myUserAgent, myReferer) = False Then
    MessageBox.Show("Recaptcha Image loading failed!")
Else
    Dim newWork As New Work
    newWork.CaptchaForm = newCaptcha
    newWork.AccountId = 1234 'ID of Accounts.
    newWork.CaptchaHash = "recaptcha_challenge_field=" & recaptchaChallenge
    newWork.CaptchaType = "ReCaptcha"
    Works.Add(newWork)
    newCaptcha.Show()
End If
Here is the HTTP class I use.
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text
Imports System.Net
Imports System.IO
Imports System.Drawing
Imports System.Windows.Forms

Public Class HTTP
    'Shared so the class can be called as HTTP.HTTPGET(...) and every request reuses the same cookies.
    Public Shared StoredCookies As New CookieContainer

    Private Const DefaultUserAgent As String = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET4.0C; .NET4.0E; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"

    Public Shared Function HTTPGET(ByVal url As String, ByVal proxyname As String, ByVal proxylogin As String, ByVal proxypassword As String, ByVal userAgent As String, ByVal referer As String) As String
        Dim req As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        If userAgent = "" Then
            userAgent = DefaultUserAgent
        End If
        req.UserAgent = userAgent
        req.Referer = referer
        req.AllowAutoRedirect = True
        req.ReadWriteTimeout = 5000
        req.CookieContainer = StoredCookies
        req.Headers.Set("Accept-Language", "en-us")
        req.KeepAlive = True
        req.Method = "GET"

        If proxyname <> "" Then
            'proxyname is expected in the form host:port
            Dim proxyParts As String() = proxyname.Split(":"c)
            Dim proxy As New WebProxy(proxyParts(0), CInt(proxyParts(1)))
            'if proxylogin is an empty string then don't use proxy credentials (open proxy)
            If proxylogin <> "" Then
                proxy.Credentials = New NetworkCredential(proxylogin, proxypassword)
            End If
            req.Proxy = proxy
        End If

        Dim response As String = ""
        Try
            'Response cookies end up in StoredCookies automatically because it is the request's CookieContainer.
            Using resp As HttpWebResponse = DirectCast(req.GetResponse(), HttpWebResponse)
                Using stream_in As New StreamReader(resp.GetResponseStream())
                    response = stream_in.ReadToEnd()
                End Using
            End Using
        Catch ex As Exception
            'Still return an empty string on failure, but leave a trace instead of failing silently.
            System.Diagnostics.Debug.WriteLine("HTTPGET failed: " & ex.Message)
        End Try
        Return response
    End Function

    Public Shared Function LoadWebImageToPictureBox(ByVal pb As PictureBox, ByVal ImageURL As String, ByVal userAgent As String, ByVal referer As String) As Boolean
        Dim bAns As Boolean = False
        Try
            Dim sURL As String = Trim(ImageURL)
            'Only prepend a scheme when none is given, so https:// URLs are left alone.
            If Not sURL.ToLower().StartsWith("http://") AndAlso Not sURL.ToLower().StartsWith("https://") Then sURL = "http://" & sURL
            Dim req As HttpWebRequest = DirectCast(WebRequest.Create(sURL), HttpWebRequest)
            If userAgent = "" Then
                userAgent = DefaultUserAgent
            End If
            req.UserAgent = userAgent
            req.Referer = referer
            req.AllowAutoRedirect = True
            req.ReadWriteTimeout = 5000
            req.CookieContainer = StoredCookies
            req.Headers.Set("Accept-Language", "en-us")
            req.KeepAlive = True
            req.Method = "GET"

            Using resp As WebResponse = req.GetResponse()
                Using remoteStream As Stream = resp.GetResponseStream()
                    Using objImage As New MemoryStream
                        'Copy the response into memory in 1024-byte chunks.
                        Dim myBuffer(1023) As Byte
                        Dim bytesRead As Integer = remoteStream.Read(myBuffer, 0, myBuffer.Length)
                        Do While bytesRead > 0
                            objImage.Write(myBuffer, 0, bytesRead)
                            bytesRead = remoteStream.Read(myBuffer, 0, myBuffer.Length)
                        Loop
                        'Rewind so that decoding starts at the first byte.
                        objImage.Position = 0
                        'Image.FromStream needs its stream kept open for the lifetime of the image,
                        'so copy the decoded image into a Bitmap before the stream is closed.
                        Using decoded As Image = Image.FromStream(objImage)
                            pb.Image = New Bitmap(decoded)
                        End Using
                    End Using
                End Using
            End Using
            bAns = True
        Catch ex As Exception
            bAns = False
        End Try
        Return bAns
    End Function
End Class
EDIT: I figured out the problem. It's this obfuscated client-side JavaScript encryption system from Google at
http://www.google.com/js/th/1lOyLe_nzkTfeM2GpTkE65M1Lr8y0MC8hybXoEd-x1s.js
I still want to be able to defeat it without using a heavy web browser. Maybe some lightweight, fast JavaScript evaluation control would do? There is no point in deobfuscating it and porting it to VB.NET, because as soon as I do they might change a few variables, or the encryption entirely, and all that work would be for nothing. So I want something more intelligent. At this point I don't even know how the URL is generated. It seems static for now, and it's probably a real file rather than one generated just in time.
It turns out that the _challenge page, which gives the challenge for the image, only returns a decoy challenge. That challenge then gets replaced (encrypted, perhaps?) on the client side using the variables t1, t2 and t3. This encryption does not seem to be used every time: if you pass it once, you get access to do what I am trying to do. So my code pretty much works, but it stops working at very random intervals. I want something more solid that I can leave unattended for weeks.
I had the same problem and found a solution. It will not give you the easiest captchas, but at least the images are a lot easier: the result is one readable word and one obscured word.
I found that requesting "recaptcha/api/reload" is important to achieve that. Adding the "cachestop" parameter, and maybe the referer, may also make a difference.
import random
import re

# Fragment from inside a function. UrlMgr, textextract and solveCaptcha are helpers that are not shown here;
# id is the value of the "k" parameter and referer is the page that embeds the captcha.
data = UrlMgr("http://www.google.com/recaptcha/api/challenge?k=%s&cachestop=%.17f" % (id, random.random()), referer=referer, nocache=True).data
challenge = re.search("challenge : '(.*?)',", data).group(1)
server = re.search("server : '(.*?)',", data).group(1)
# this step is super important to get readable captchas - normally we could take the "c" from above and already retrieve a captcha but
# this one would be barely readable
reloadParams = {}
reloadParams["c"] = challenge
reloadParams["k"] = id
reloadParams["lang"] = "de"
reloadParams["reason"] = "i"
reloadParams["type"] = "image"
data = UrlMgr("http://www.google.com/recaptcha/api/reload", params=reloadParams, referer=referer, nocache=True).data
challenge = textextract(data, "Recaptcha.finish_reload('", "',")
return challenge, solveCaptcha(UrlMgr("%simage" % (server), params={"c": challenge}, referer=referer))
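The `textextract` helper used above is not shown in the answer. A minimal sketch of what such a helper might look like follows, assuming it returns the text between the first occurrence of a start marker and the next occurrence of an end marker. Only the name comes from the answer; the behaviour, the `None` fallback and the sample string are assumptions made for illustration.

```python
def textextract(data, start, end):
    """Return the text between `start` and the next `end`, or None if a marker is missing.

    Hypothetical stand-in for the answer's helper of the same name.
    """
    begin = data.find(start)
    if begin == -1:
        return None
    begin += len(start)
    stop = data.find(end, begin)
    if stop == -1:
        return None
    return data[begin:stop]


# Made-up response body in the shape the answer's code expects.
sample = "Recaptcha.finish_reload('03AHJ_example', 'image');"
token = textextract(sample, "Recaptcha.finish_reload('", "',")
print(token)
```

The `GetBetween` calls in the question's VB.NET code follow the same start-marker/end-marker pattern, so a helper of this shape would probably cover both.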
As for further improvements, my guess is that the "th" parameter is used to detect bots. It is generated by some complicated JavaScript that I didn't debug myself.