Scraping a Wikipedia table into a pandas DataFrame

I need to scrape a Wikipedia table into a pandas DataFrame with three columns: PostalCode, Borough, and Neighborhood.

https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

Here is the code that I have used:

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Download the page and parse the HTML
website_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(website_url, 'lxml')
print(soup.prettify())

# Locate the table of postal codes
My_table = soup.find('table', {'class': 'wikitable sortable'})
My_table

# Collect every link inside the table
links = My_table.findAll('a')
links

# Keep the title attribute of each link
Neighbourhood = []
for link in links:
    Neighbourhood.append(link.get('title'))

print(Neighbourhood)

# Try to load the titles into a data frame
df = pd.DataFrame([])
df['PostalCode', 'Borough', 'Neighborhood'] = Neighbourhood
df

It returns this:

    (PostalCode, Borough, Neighborhood)
0   North York
1   Parkwoods
2   North York
3   Victoria Village
4   Downtown Toronto
5   Harbourfront (Toronto)
6   Downtown Toronto
7   Regent Park
8   North York

I can't figure out how to pick up the postal code and the neighbourhood from the Wikipedia table.

Thank you

Info Digitalevo, asked Feb 26 '19 12:02


1 Answer

pandas allows you to do it in one line of code:

df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
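
To expand on that one-liner: read_html returns a list with one DataFrame per table it finds, so [0] picks the first table on the page. Below is a minimal sketch of the same idea, assuming lxml is installed (the question already uses it). It parses a small made-up HTML snippet instead of the live page so that it runs offline; the sample rows and header names are illustrative and may not match the real Wikipedia table.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the Wikipedia table so the sketch runs offline;
# the real page's header names and cell values may differ.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> element found.
# header=0 tells pandas to use the first row as the column names.
tables = pd.read_html(StringIO(SAMPLE_HTML), header=0)
df = tables[0]

# Rename to the three column names requested in the question.
df.columns = ["PostalCode", "Borough", "Neighborhood"]

# Optional: drop rows whose borough is only a placeholder.
df = df[df["Borough"] != "Not assigned"].reset_index(drop=True)

print(df)
```

For the live page, the URL from the question can be passed to read_html in place of the StringIO object. Whether index 0 is still the right table depends on the current layout of the page, so it is worth checking len(tables) and the column names first.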


Benoit de Menthière, answered Oct 19 '22 15:10