How does one transition from using a list of functions to using classes in Python?

Let me start off by stating that I am completely self-taught in programming with Python through trial and error and a lot of googling, so please excuse my ignorance of proper programming terminology.

That being said, let's pretend I'm writing code that scrapes a website and returns a few pieces of information. In the very early stages of testing I would have the code written "line-by-line", as in, outside of any function:

from bs4 import BeautifulSoup
import requests
url = "https://en.wikipedia.org/wiki/Web_scraping"
headers = {'User-agent':'Mozilla/5.0'}
page = requests.get(url, headers = headers)
soup = BeautifulSoup(page.text, 'html.parser')
print(soup.title)

Then as the script gets validated and more complex I might put larger "single task actions" of code into functions:

from bs4 import BeautifulSoup
import requests

def make_soup(url):
    headers = {'User-agent':'Mozilla/5.0'}
    page = requests.get(url, headers = headers)
    soup = BeautifulSoup(page.text, 'html.parser')
    return soup

def list_table_of_contents(soup):
    toc_elem = soup.find('div', id = 'toc')
    toc_level1 = toc_elem.find_all('li', class_ = 'toclevel-1')
    toc_level1_list = [i.text for i in toc_level1]
    return toc_level1_list

url = "https://en.wikipedia.org/wiki/Web_scraping"
soup = make_soup(url)
toc_level1_list = list_table_of_contents(soup)
for i in toc_level1_list:
    print(i)

The problem is that once I have 10 different functions in one script, it becomes really hard to comprehend what's going on and to make revisions. I'm guessing that using classes would be the next logical step, but I'm not sure how I would implement them in something like what I described above.

In my mind "line-by-line" programming is like writing a recipe, and using functions is like making factories that follow specific recipes (input -> output), so then what are classes?

From what I've read about classes and learned from playing with them, they are essentially a way to easily create multiple "objects" with specific attributes.
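To make that last point concrete, here is a minimal sketch. The Page class and its attributes are made up for illustration and are not part of the scraper above; the idea is only that each object carries its own data plus the functions that work on that data.

```python
class Page(object):
    """Hypothetical example: bundles a URL with a title scraped from it."""

    def __init__(self, url, title):
        # Each instance keeps its own copy of these attributes.
        self.url = url
        self.title = title

    def describe(self):
        # A method reads the instance's own data through self.
        return '{0} ({1})'.format(self.title, self.url)


# Two independent objects built from the same class.
first = Page('https://example.com/a', 'Page A')
second = Page('https://example.com/b', 'Page B')

descriptions = [first.describe(), second.describe()]
print(descriptions)
```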

ScrapeHeap asked Mar 01 '26 11:03


1 Answer

Here is an example class-based implementation of the above; hopefully it makes sense. Classes can make things easier because they let you abstract functionality and then inherit from one class or another. The class below inherits from the base object class.

from bs4 import BeautifulSoup
import requests

class MyBeautifulScraper(object):

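    def __init__(self, site_to_scrape, headers=None):
        self.site_to_scrape = site_to_scrape
        # Build the default inside the method so instances never share one dict.
        if headers is None:
            headers = {'User-agent':'Mozilla/5.0'}
        self.headers = headers
        self.soup = None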

    def make_soup(self):
        # Fetch the page and store the parsed document on the instance.
        page = requests.get(self.site_to_scrape, headers = self.headers)
        self.soup = BeautifulSoup(page.text, 'html.parser')

    def get_title(self):
        return self.soup.title

    def list_table_of_contents(self):
        toc_elem = self.soup.find('div', id = 'toc')
        toc_level1 = toc_elem.find_all('li', class_ = 'toclevel-1')
        toc_level1_list = [i.text for i in toc_level1]
        return toc_level1_list

my_soup = MyBeautifulScraper("https://en.wikipedia.org/wiki/Web_scraping")
my_soup.make_soup()  # must run first; until then self.soup is None
print(my_soup.get_title())
toc_level1_list = my_soup.list_table_of_contents()

for i in toc_level1_list:
    print(i)
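One thing to note about the class above is that the parsed page lives on the instance, so make_soup() has to run before the other methods are used. A possible variation, sketched here under assumptions, fetches the page lazily the first time it is needed. LazyScraper, fake_fetch and the fetch parameter are hypothetical names, and a regular expression stands in for BeautifulSoup only so the sketch runs without a network connection or extra packages.

```python
import re


class LazyScraper(object):
    """Hypothetical variation: fetches the page the first time it is needed."""

    def __init__(self, site_to_scrape, fetch):
        self.site_to_scrape = site_to_scrape
        # Assumption: fetch is any callable taking a URL and returning HTML text.
        self._fetch = fetch
        self._html = None  # cached page text, filled in on first access

    @property
    def html(self):
        # Fetch once, then reuse the cached text on later calls.
        if self._html is None:
            self._html = self._fetch(self.site_to_scrape)
        return self._html

    def get_title(self):
        # Stand-in for soup.title; a regex is only adequate for this toy page.
        match = re.search(r'<title>(.*?)</title>', self.html, re.DOTALL)
        return match.group(1).strip() if match else None


calls = []


def fake_fetch(url):
    """Stands in for requests.get(url).text so the sketch runs offline."""
    calls.append(url)
    return '<html><head><title>Web scraping</title></head><body></body></html>'


scraper = LazyScraper('https://en.wikipedia.org/wiki/Web_scraping', fake_fetch)
title = scraper.get_title()
title_again = scraper.get_title()
print(title)
print(len(calls))
```

With this shape the caller cannot forget an initialization step, and the page is still downloaded only once per object.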

You can add functionality without rewriting the class: create a specialized class that inherits from the one above and extend it with additional methods:

class AnotherScraper(MyBeautifulScraper):

    def additional_functionality(self):
        ....

    # override existing functionality and make it do something different
    def get_title(self):
        return 'Title: {0}'.format(self.soup.title)

This is what makes Object Oriented Programming so powerful: you can re-use and extend pre-existing classes, inheriting or overriding their behavior, without losing the original functionality.
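As a small, self-contained sketch of that idea (BaseReport and LabelledReport are made-up stand-ins, not the scraper classes above), a subclass can override one method, still call the parent's version through super(), and keep every other inherited method unchanged:

```python
class BaseReport(object):
    """Hypothetical base class holding a title."""

    def __init__(self, title):
        self.title = title

    def get_title(self):
        return self.title

    def word_count(self):
        return len(self.title.split())


class LabelledReport(BaseReport):
    """Overrides get_title but inherits everything else untouched."""

    def get_title(self):
        # Reuse the parent's implementation, then decorate its result.
        original = super(LabelledReport, self).get_title()
        return 'Title: {0}'.format(original)


base = BaseReport('Web scraping')
labelled = LabelledReport('Web scraping')

print(base.get_title())
print(labelled.get_title())
print(labelled.word_count())
```

The base class behaves exactly as before, while the subclass changes only the method it redefines.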

Hope this helps

Incognos answered Mar 03 '26 23:03