I am able to retrieve the text before the <br> tag but not the text after it.
This is the website that I am trying to scrape the comments from: http://hamusoku.com/archives/9589071.html#comments
Some of the comments include a <br> tag, which I think means that the user hit Enter. Is there a way to get the text before and after the <br> tag as a single comment?
Here is a sample of the page source:
<li class="comment-body">
  "愛の言葉も、この瞬間は辛い。"
  <br>
  "胸が締め付けられそうだ。"
</li>
This is my code:
import scrapy


class HamusoSpider(scrapy.Spider):
    name = 'hamuso'
    start_urls = ['http://hamusoku.com/archives/9589071.html#comments/']

    def parse(self, response):
        for com in response.css('li.comment-body'):
            item = {
                'comment': com.css('li::text').extract_first()
            }
            yield item
This is the output that I am getting in the shell:
{'comment': '\n\t\n\tかなしいなぁ'}
{'comment': '\n\t\n\t海老蔵…つらいな'}
{'comment': '\n\t\n\t海老蔵には頑張って欲しいな'}
{'comment': '\n\t\n\t御冥福をお祈りします'}
{'comment': '\n\t\n\t泣かすなや。'}
{'comment': '\n\t\n\t海老蔵これからしっかりせなアカンぞ'}
{'comment': '\n\t\n\t愛の言葉も、この瞬間は辛い。'}
{'comment': '\n\t\n\tただただ涙が止まらない会見だった'}
The last two comments both have a <br> tag, and in both cases the second part of the comment is omitted.
I would really really appreciate any help with this.
I ran your spider and realised that extract_first() returns only the first text node of each comment. The text after a <br> tag is a separate text node, so extract_first() never reaches it.
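To see why, here is a rough stdlib-only sketch. It uses Python's html.parser instead of Scrapy, and runs on a simplified, hypothetical fragment modelled on the sample markup in the question, not on the real page. It records every text node inside each comment-body <li>; the <br> splits the comment into two separate nodes, and extract_first() only ever hands you the first one.

```python
from html.parser import HTMLParser


class CommentTextNodes(HTMLParser):
    """Collect the raw text nodes inside each <li class="comment-body">."""

    def __init__(self):
        super().__init__()
        self.comments = []  # one list of text nodes per comment
        self._in_comment = False

    def handle_starttag(self, tag, attrs):
        # Start a new comment when we enter <li class="comment-body">
        # (exact class match only; good enough for this sketch).
        if tag == "li" and ("class", "comment-body") in attrs:
            self._in_comment = True
            self.comments.append([])

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_comment = False

    def handle_data(self, data):
        # A <br> is just another tag, so the text on either side of it
        # arrives here as two separate chunks of data.
        if self._in_comment:
            self.comments[-1].append(data)


# Simplified, hypothetical fragment modelled on the sample markup
sample_html = (
    '<ul><li class="comment-body">\n\t\n\t愛の言葉も、この瞬間は辛い。'
    '<br>\n胸が締め付けられそうだ。</li></ul>'
)

parser = CommentTextNodes()
parser.feed(sample_html)
parser.close()

nodes = parser.comments[0]
print(nodes[0])  # roughly what extract_first() gives you
print(nodes)     # roughly what extract() gives you
```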
To solve this, use extract() instead. It returns a list of all the text nodes inside each comment-body element:
import scrapy


class HamusoSpider(scrapy.Spider):
    name = 'hamuso'
    start_urls = ['http://hamusoku.com/archives/9589071.html#comments/']

    def parse(self, response):
        for com in response.css('li.comment-body'):
            # extract() returns every text node, including the ones after <br>
            item = {'comment': com.css('li::text').extract()}
            yield item
The output I get for the last two comments in your output is:
{'comment': ['\n\t\n\tただただ涙が止まらない会見だった', '\n本当に短い人生だったけど豊かな人生だったのがわかる']}
{'comment': ['\n\t\n\t愛の言葉も、この瞬間は辛い。', '\n胸が締め付けられそうだ。']}
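Since you asked for each comment as a single string rather than a list, one option is to join the parts yourself. The helper below, join_comment_parts, is a hypothetical name of my own and not part of Scrapy; it is a minimal sketch that strips the surrounding whitespace from each text node, drops parts that were only whitespace, and joins the rest with a newline where the <br> used to be.

```python
def join_comment_parts(parts):
    """Join the text nodes of one comment into a single string.

    Strips the leading/trailing newlines and tabs from every part,
    drops parts that were only whitespace, and joins the rest with a
    newline to stand in for the <br> that separated them.
    """
    cleaned = (part.strip() for part in parts)
    return "\n".join(part for part in cleaned if part)


# Example parts, copied from the extract() output above
parts = ['\n\t\n\t愛の言葉も、この瞬間は辛い。', '\n胸が締め付けられそうだ。']
print(join_comment_parts(parts))
```

Inside parse() you would then build the item as {'comment': join_comment_parts(com.css('li::text').extract())}. Joining with a space instead of "\n" would give one flat line per comment if you don't need to keep the original line breaks.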