
Using python to visit a link and print data [closed]

I'm writing a web scraper to retrieve Drake lyrics. It has to visit one page (the main metrolyrics page for the artist), then visit each individual song link and print out the lyrics.

I'm having trouble visiting the second link. I've searched around on BeautifulSoup and am pretty confused. I'm wondering if you can help.

# this is intended to print all of the drake song lyrics on metrolyrics

from pyquery import PyQuery as pq
from lxml import etree
import requests
from bs4 import BeautifulSoup

# this visits the website
response = requests.get('http://www.metrolyrics.com/drake-lyrics.html')

# this separates the different types of content
doc = pq(response.content)

# this finds the titles in the content
titles = doc('.title')

# this visits each title, then prints each verse
for title in titles:
    # this visits each title
    response_title = requests.get(title)
    # this separates the content
    doc2 = pq(response_title.content)
    # this finds the song lyrics
    verse = doc2('.verse')
    # this prints the song lyrics
    print verse.text

In response_title = requests.get(title), Python doesn't recognize title as a link, which makes sense, since title is an element rather than a URL. How do I get the actual link in there, though? Appreciate your help.

Margot Mazur asked Nov 19 '25 22:11


1 Answer

Replace

response_title = requests.get(title)

with

response_title = requests.get(title.attrib['href'])
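Why this works: iterating over a PyQuery selection yields lxml elements, not strings, and each element keeps its HTML attributes in a dict-like attrib mapping. lxml elements follow the ElementTree API, so the same access pattern can be illustrated with the standard library alone. This is only a sketch: the markup and the example.com URL below are made up for illustration and are not taken from the real page.

```python
import xml.etree.ElementTree as ET

# Made-up anchor tag standing in for one '.title' link on the artist page.
link = ET.fromstring(
    '<a class="title" href="http://example.com/some-song.html">Some Song</a>'
)

# The element itself is an object, not a URL string, so it cannot be
# passed to requests.get() directly.
print(type(link))

# The URL lives in the element's attribute mapping.
print(link.attrib['href'])

# The visible link text is a separate property.
print(link.text)
```

In the scraper, title plays the role of link here, so title.attrib['href'] is the string that requests.get() needs.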

Full working script (incorporating the fix noted in the comment below):

#!/usr/bin/python

from pyquery import PyQuery as pq
import requests

# this visits the website
response = requests.get('http://www.metrolyrics.com/drake-lyrics.html')

# this separates the different types of content
doc = pq(response.content)

# this finds the titles in the content
titles = doc('.title')

# this visits each title, then prints each verse
for title in titles:
    # this visits each title, using the link's href attribute as the URL
    # response_title = requests.get(title)
    response_title = requests.get(title.attrib['href'])

    # this separates the content
    doc2 = pq(response_title.content)
    # this finds the song lyrics
    verse = doc2('.verse')
    # this prints the song lyrics (text is a method, so it must be called)
    print(verse.text())
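
One thing the script takes for granted is that every .title element has an href and that the href is an absolute URL. Whether that holds for this site isn't confirmed here. Assuming some links could be relative or missing, a minimal sketch for Python 3 is to resolve each href against the page URL with the standard library's urljoin. The helper name resolve_links and the sample paths are hypothetical, for illustration only.

```python
from urllib.parse import urljoin

# Hypothetical helper, not part of the original script: turns raw href
# values into absolute URLs and skips missing or empty ones.
def resolve_links(base_url, hrefs):
    urls = []
    for href in hrefs:
        if not href:
            continue  # the element had no usable href attribute
        # urljoin leaves absolute URLs unchanged and resolves relative ones
        urls.append(urljoin(base_url, href))
    return urls

base = 'http://www.metrolyrics.com/drake-lyrics.html'

# Made-up sample values: one absolute, one relative, one missing.
sample = ['http://example.com/song-a.html', '/song-b.html', None]

for url in resolve_links(base, sample):
    print(url)
```

In the loop above, the href values could be collected with title.attrib.get('href'), which returns None instead of raising KeyError when the attribute is absent.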

Alex Paramonov answered Nov 22 '25 10:11


