 

printing 'response' from scrapy request

I am trying to learn Scrapy and, while following a tutorial, I am trying to make some minor adjustments.

I simply want to get the response content from a request; I will then pass the response into the tutorial code. But I am unable to make a request and get the content of the response. Advice would be nice.

from scrapy.http import Response

url = "https://www.myUrl.com"
response = Response(url=url)
print(response)  # <200 myurl.com>

# but I want the content! And I can't find the method
asked Feb 15 '17 by Bill Hanery

3 Answers

Scrapy is a fairly complicated framework. You can't just create requests and responses by hand the way you are trying to here.
Scrapy is split into several parts, such as the Downloader, which downloads the requests that the Scheduler part schedules. In short, you would need to start all of those parts in your own code just to make a single request like that.

You can see an illustration and description of the whole architecture in the Scrapy documentation's architecture overview.

What you can do, though, is simply use the scrapy shell command, which downloads the URL's content and lets you interact with it:

$ scrapy shell "http://stackoverflow.com"
....
[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x7f14d9fef5f8>
[s]   item       {}
[s]   request    <GET http://stackoverflow.com>
[s]   response   <200 http://stackoverflow.com>
[s]   settings   <scrapy.settings.Settings object at 0x7f14d8d0f9e8>
[s]   spider     <DefaultSpider 'default' at 0x7f14d8af4f28>
[s] Useful shortcuts:
[s]   fetch(url[, redirect=True]) Fetch URL and update local objects (by default, redirects are followed)
[s]   fetch(req)                  Fetch a scrapy.Request and update local objects 
[s]   shelp()           Shell help (print this help)
[s]   view(response)    View response in a browser
In [1]: len(response.body)
Out[1]: 244649

Another alternative is to write a spider and inject inspect_response() into your parse function:

import scrapy 
from scrapy.shell import inspect_response

class MySpider(scrapy.Spider):
    name = 'myspider'
    start_urls = ['http://stackoverflow.com',]

    def parse(self, response):
        inspect_response(response, self)
        # shell will open up here just like from the first example
answered Oct 20 '22 by Granitosaurus


If you just want to print all of the content:

print(response.text)
answered Oct 20 '22 by Sogeking


I agree with the points Granitosaurus made: summing Scrapy up is a tough bet, even covering just the framework itself. You will understand it better as you work through your tutorials. The best learning resources you have are your own logic, dedication, and Google. From your code snippet I can tell you are coming from bs4 (BeautifulSoup), which is great, and you can use it inside a Scrapy spider. I can also tell that you have only just started learning, since you are not defining or naming a spider class, and there is nothing wrong with that!

As for your question about getting the content: again, this is covered in any Scrapy tutorial. Data mining with Scrapy is 99.9% just this: selecting what data, and how?

Using the page's CSS elements in your spider, for which you define an item: with the page's response (or your mutated, i.e. newly changed, version of it) you can then export the data out with yield or return. Printing is usually done more for logging purposes. The element might be a link, just text, or even a file.

Using XPath works the same way as CSS, but the two are structured differently.

Using regular expressions will become an almost certain must, but let's take baby steps.

... the entirety of data mining IS extracting your content. I feel as if I would be robbing you of your own moment, so I'll tell you what: do the tutorial from the official Scrapy docs, referred to as the quotes tutorial. And if you STILL have any questions about what happened in that tutorial, I'll share my class course material on this intro step (which I get paid for, but for you, free). It's basically a little knowledge of CSS and how to use a web browser's inspect tools, or old-school it and just view the page source. I really wish I could help more, my nerd senses are tingling, but I can't take your moment of epiphany from you. You gain nothing that way, right?

PS:

As to your first question about getting the content: like, all of it? The entire HTML? The body, all the links, or just the links that contain X? Let's say we are talking about a simple blog page: it has the article title, date, links, and images inside. I'm sure you know this; it's just that when you say "page content" you are referring to the entirety of the page. The data you mine will only be as valuable as the format you can express it in and, more importantly, use against other data to create an analysis, a conclusion based on the data. If you want just the entire HTML source, then, like our friend Granitosaurus said: response.body

answered Oct 20 '22 by scriptso