I know that this question looks exactly like so many other on here, because I just read them all and they all say to do what I already tried, and it hasn't worked (or I'm missing a subtle difference with my situation). Here is my situation:
I am writing a scraper using Scrapy and Python 2.7.11, and my code looks like this (this is a copy and paste with irrelevant lines omitted, but I can re-add them upon request):
class LbcSubtopicSpider(scrapy.Spider):
...omitted...
rawTranscripts = []
rawTranslations = []
def parse(self, response):
#global rawTranscripts, rawTranslations
rawTitles = []
rawVideos = []
for sel in response.xpath('//ul[1]'): #only scrape the first list
...omitted...
index = 0
for sub in sel.xpath('li/ul/li/a'): #scrape the sublist items
index += 1
if index%2!=0: #odd numbered entries are the transcripts
transcriptLink = sub.xpath('@href').extract()
#url = response.urljoin(transcriptLink[0])
#yield scrapy.Request(url, callback=self.parse_transcript)
else: #even numbered entries are the translations
translationLink = sub.xpath('@href').extract()
url = response.urljoin(translationLink[0])
yield scrapy.Request(url, callback=self.parse_translation)
print rawTitles
print rawVideos
print rawTranslations
def parse_translation(self, response):
global rawTranslations
for sel in response.xpath('//p[not(@class)]'):
rawTranslation = sel.xpath('text()').extract()
rawTranslations.append(rawTranslation)
This will return an error any time either "print rawTranslations" or "rawTranslations.append(rawTranslation)" is called because the global "rawTranslations" is not defined.
As I said before, I have looked into this pretty extensively and pretty much everyone on the internet says to just add a "global (name)" line to the beginning of any function you'd use/modify it in (although I'm not assigning to it ever, so I shouldn't even need this). I get the same result whether or not my global lines are commented out. This behavior seems to defy everything I've read about how globals work in Python, so I suspect this might be a Scrapy quirk related to how parse functions are called through scrapy.Request(....).
Apologies for posting what appears to be the same question you've seen so much yet again, but it seems to be a bit twisted this time around and hopefully someone can get to the bottom of it. Thanks.
In your case the variable you want to access is not global, it is in the scope of the class.
global_var = "global"
class Example:
class_var = "class"
def __init__(self):
self.instance_var = "instance"
def check(self):
print(instance_var) # error
print(self.instance_var) # works
print(class_var) # error
print(self.class_var) # works, lookup goes "up" to the class
print(global_var) # works
print(self.global_var) # works not
You only need the global
keyword if you want to write to a global variable. Hint: Don't do that because global variables that are written to bring nothing but pain and despair. Only use global variables as (config) constants.
global_var = "global"
class Example:
def ex1(self):
global_var = "local" # creates a new local variable named "global_var"
def ex2(self):
global global_var
global_var = "local" # changes the global variable
Example().ex1()
print(global_var) # will still be "global"
Example().ex2()
print(global_var) # willnow be "local"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With