Let me start off by stating that I am completely self-taught in programming with Python through trial and error and a lot of googling, so please excuse my ignorance of proper programming terminology.
That being said, let's pretend I'm writing code that scrapes a website and returns a few pieces of information. In the very early stages of testing I would have the code written "line-by-line", as in, outside of any function:
from bs4 import BeautifulSoup
import requests

url = "https://en.wikipedia.org/wiki/Web_scraping"
headers = {'User-agent': 'Mozilla/5.0'}
page = requests.get(url, headers=headers)
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(page.text, 'html.parser')
print(soup.title)
Then, as the script gets validated and becomes more complex, I might put larger "single task actions" of code into functions:
from bs4 import BeautifulSoup
import requests

def make_soup(url):
    headers = {'User-agent': 'Mozilla/5.0'}
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.text, 'html.parser')
    return soup

def list_table_of_contents(soup):
    toc_elem = soup.find('div', id='toc')
    toc_level1 = toc_elem.find_all('li', class_='toclevel-1')
    toc_level1_list = [i.text for i in toc_level1]
    return toc_level1_list

url = "https://en.wikipedia.org/wiki/Web_scraping"
soup = make_soup(url)
toc_level1_list = list_table_of_contents(soup)
for i in toc_level1_list:
    print(i)
The problem is that once it gets to the point where I have 10 different functions in one script, it becomes really hard to comprehend what's going on and to make revisions. I'm guessing that using classes would be the next logical step, but I'm not sure how I would implement them in something like what I described above...
In my mind "line-by-line" programming is like writing a recipe, and using functions is like making factories that follow specific recipes (input -> output), so then what are classes?
From what I've read about classes and learned from playing with them, they are essentially a way to easily create multiple "objects" with specific attributes.
Here is an example class-based implementation of the above; hopefully it makes sense. Classes can make things easier because they allow you to abstract functionality and then inherit it from one class or another. In the class below, we are inheriting from the base object class.
from bs4 import BeautifulSoup
import requests

class MyBeautifulScraper(object):
    def __init__(self, site_to_scrape, headers={'User-agent': 'Mozilla/5.0'}):
        self.site_to_scrape = site_to_scrape
        self.headers = headers
        self.soup = None

    def make_soup(self):
        # Fetch the page and store the parsed result on the instance
        page = requests.get(self.site_to_scrape, headers=self.headers)
        self.soup = BeautifulSoup(page.text, 'html.parser')

    def get_title(self):
        return self.soup.title

    def list_table_of_contents(self):
        toc_elem = self.soup.find('div', id='toc')
        toc_level1 = toc_elem.find_all('li', class_='toclevel-1')
        toc_level1_list = [i.text for i in toc_level1]
        return toc_level1_list

my_soup = MyBeautifulScraper("https://en.wikipedia.org/wiki/Web_scraping")
my_soup.make_soup()  # must run first; the other methods read self.soup
print(my_soup.get_title())
toc_level1_list = my_soup.list_table_of_contents()
for i in toc_level1_list:
    print(i)
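To connect this with the idea that classes are a way to create multiple "objects" with their own attributes, here is a minimal offline sketch. It assumes a hypothetical `Page` class (not part of the code above) that receives the HTML as a string and pulls out the title with a crude regex, so it runs without `requests`, `bs4`, or a network connection. Treat it as an illustration of per-instance state, not as a recommended way to parse HTML.

```python
import re


class Page(object):
    """Hypothetical stand-in for the scraper: holds one page's HTML."""

    def __init__(self, name, html):
        # Each instance stores its own copy of these attributes
        self.name = name
        self.html = html

    def get_title(self):
        # Crude regex extraction, good enough for this illustration only
        match = re.search(r"<title>(.*?)</title>", self.html, re.S)
        return match.group(1).strip() if match else None


# Two objects built from the same class, each with its own data
first = Page("first", "<html><title>Web scraping</title></html>")
second = Page("second", "<html><title>Data mining</title></html>")

for page in (first, second):
    print(page.name, page.get_title())
```

The same method, `get_title()`, gives a different answer for each object because it reads that object's own `self.html`. That is the part plain functions do not give you for free: the data travels with the behaviour.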
You can add functionality without rewriting the class: create a specialized class that inherits from the one above, then extend it with additional functionality:
class AnotherScraper(MyBeautifulScraper):
    def additional_functionality(self):
        ...

    # override existing functionality and make it do something different
    def get_title(self):
        return 'Title: {0}'.format(self.soup.title)
This is what makes object-oriented programming so powerful: you can reuse and extend pre-existing classes, inheriting or overriding existing functionality without losing the original behavior.
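As a rough sketch of "overriding without losing the initial functionality", the example below uses two hypothetical classes, `Scraper` and `LoudScraper`, which are simplified stand-ins for the classes in this answer. The title is passed in directly so the snippet runs without any network access. The subclass calls the parent's `get_title()` through `super()` and builds on the result rather than replacing it outright.

```python
class Scraper(object):
    """Hypothetical base class; the title is supplied directly for the demo."""

    def __init__(self, title):
        self.title = title

    def get_title(self):
        return self.title

    def describe(self):
        # Looks up get_title() on self, so a subclass override is picked up
        return "Page -> " + self.get_title()


class LoudScraper(Scraper):
    def get_title(self):
        # Reuse the parent's behaviour, then extend it
        original = super(LoudScraper, self).get_title()
        return "Title: {0}".format(original)


base = Scraper("Web scraping")
loud = LoudScraper("Web scraping")
print(base.describe())  # Page -> Web scraping
print(loud.describe())  # Page -> Title: Web scraping
```

Note that `describe()` is written only once, in the base class, yet it behaves differently for `LoudScraper` because the overridden `get_title()` is the one that gets called. The base class itself is unchanged and still works as before.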
Hope this helps