Scraping a Wikipedia table into a pandas DataFrame

Tags: python, pandas, wikipedia

I need to scrape a Wikipedia table into a pandas DataFrame with three columns: PostalCode, Borough, and Neighborhood.

https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

Here is the code that I have used:

import requests
website_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
from bs4 import BeautifulSoup
soup = BeautifulSoup(website_url,'lxml')
print(soup.prettify())

My_table = soup.find('table',{'class':'wikitable sortable'})
My_table

links = My_table.findAll('a')
links

Neighbourhood = [ ]

for link in links:
    Neighbourhood.append(link.get('title'))

print (Neighbourhood)

import pandas as pd

df = pd.DataFrame([])
df['PostalCode', 'Borough', 'Neighborhood'] = Neighbourhood
df

It returns this:

    (PostalCode, Borough, Neighborhood)
0   North York
1   Parkwoods
2   North York
3   Victoria Village
4   Downtown Toronto
5   Harbourfront (Toronto)
6   Downtown Toronto
7   Regent Park
8   North York

I can't figure out how to pick up the postcode and the neighbourhood from the wikipedia table.

Thank you

asked Feb 26 '19 12:02

Info Digitalevo

1 Answer

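pandas lets you do this in one line of code. read_html returns a list of DataFrames, one per table it finds on the page, and [0] selects the first one:

import pandas as pd
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]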
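To get the three column names asked for in the question, the columns can be renamed after parsing. Here is a minimal sketch that runs offline. The HTML string is a made-up stand-in for the page, assuming a plain three-column table, and the postal codes in it are placeholder values. The live page may be laid out differently, so check tables[0] before relying on it.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the Wikipedia page: a small three-column table.
# The postal codes are placeholder values for illustration only.
html = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
  <tr><td>M5A</td><td>Downtown Toronto</td><td>Harbourfront</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> element.
tables = pd.read_html(StringIO(html))
df = tables[0]

# Rename the parsed header to the column names requested in the question.
df.columns = ["PostalCode", "Borough", "Neighborhood"]
print(df)
```

With the real URL in place of StringIO(html), the rest stays the same. Note that read_html needs an HTML parser such as lxml installed.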

answered Oct 19 '22 15:10

Benoit de Menthière
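As for why the code in the question loses the postcode: findAll('a') collects only the linked text in the table, so the unlinked postcodes are dropped and the boroughs and neighbourhoods end up interleaved in a single list. If you would rather stay with BeautifulSoup, one option is to walk the table row by row instead. This is only a sketch, again using a made-up three-column HTML snippet with placeholder postal codes rather than the live page:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia page (placeholder postal codes).
html = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="wikitable")

rows = []
for tr in table.find_all("tr"):
    # Take the text of every data cell in this row, linked or not.
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    # The header row has only <th> cells, so it yields no <td> and is skipped.
    if len(cells) == 3:
        rows.append(cells)

df = pd.DataFrame(rows, columns=["PostalCode", "Borough", "Neighborhood"])
print(df)
```

Reading each row's cells keeps the three values aligned, which a flat list of link titles cannot do.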
