I've created a script to log in to LinkedIn using requests, and the login part is working fine. After logging in, I used this URL https://www.linkedin.com/groups/137920/ to scrape the group name Marketing Intelligence Professionals, which you can see in the first image. The script can parse that name flawlessly. However, what I wish to do now is scrape the link connected to the See all button located at the bottom of that very page, shown in the second image.
Group link (you need to log in to access the content).
This is what I've created so far (it can scrape the name shown in the first image):
import json
import requests
from bs4 import BeautifulSoup

link = 'https://www.linkedin.com/login?fromSignIn=true&trk=guest_homepage-basic_nav-header-signin'
post_url = 'https://www.linkedin.com/checkpoint/lg/login-submit'
target_url = 'https://www.linkedin.com/groups/137920/'

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36'
    r = s.get(link)
    soup = BeautifulSoup(r.text, "lxml")
    payload = {i['name']: i.get('value', '') for i in soup.select('input[name]')}
    payload['session_key'] = 'your email'          # put your username here
    payload['session_password'] = 'your password'  # put your password here
    r = s.post(post_url, data=payload)

    r = s.get(target_url)
    soup = BeautifulSoup(r.text, "lxml")
    items = soup.select_one("code:contains('viewerGroupMembership')").get_text(strip=True)
    print(json.loads(items)['data']['name']['text'])
How can I scrape the link connected to the See all button from there?
There is an internal REST API which is called when you click on "See all":

GET https://www.linkedin.com/voyager/api/search/blended

Its keywords query parameter contains the title of the group you requested initially (the group title shown on the initial page).
In order to get the group name, you could scrape the HTML of the initial page, but there is also an API which returns the group information when you give it the group ID:

GET https://www.linkedin.com/voyager/api/groups/groups/urn:li:group:GROUP_ID

The group ID in your case is 137920, which can be extracted from the URL directly.

An example:
import requests
from bs4 import BeautifulSoup
import re
from urllib.parse import urlencode

username = 'your username'
password = 'your password'

link = 'https://www.linkedin.com/login?fromSignIn=true&trk=guest_homepage-basic_nav-header-signin'
post_url = 'https://www.linkedin.com/checkpoint/lg/login-submit'
target_url = 'https://www.linkedin.com/groups/137920/'

# extract the group ID (137920) from the group URL
group_res = re.search('.*/(.*)/$', target_url)
group_id = group_res.group(1)

with requests.Session() as s:
    # login
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36'
    r = s.get(link)
    soup = BeautifulSoup(r.text, "lxml")
    payload = {i['name']: i.get('value', '') for i in soup.select('input[name]')}
    payload['session_key'] = username
    payload['session_password'] = password
    r = s.post(post_url, data=payload)

    # API: the csrf-token header must match the JSESSIONID cookie value
    csrf_token = s.cookies.get_dict()["JSESSIONID"].replace("\"", "")

    # get the group name from the group ID
    r = s.get(f"https://www.linkedin.com/voyager/api/groups/groups/urn:li:group:{group_id}",
              headers={
                  "csrf-token": csrf_token
              })
    group_name = r.json()["name"]["text"]
    print(f"searching data for group {group_name}")

    # call the blended search API with the group name as keywords
    params = {
        "count": 10,
        "keywords": group_name,
        "origin": "SWITCH_SEARCH_VERTICAL",
        "q": "all",
        "start": 0
    }
    r = s.get(f"https://www.linkedin.com/voyager/api/search/blended?{urlencode(params)}&filters=List(resultType-%3EGROUPS)&queryContext=List(spellCorrectionEnabled-%3Etrue)",
              headers={
                  "csrf-token": csrf_token,
                  "Accept": "application/vnd.linkedin.normalized+json+2.1",
                  "x-restli-protocol-version": "2.0.0"
              })
    result = r.json()["included"]
    print(result)

    print("list of groupName/link")
    print([
        (t["groupName"], f'https://www.linkedin.com/groups/{t["objectUrn"].split(":")[3]}')
        for t in result
    ])
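Depending on the account and the query, entries in the included list may not all carry a groupName or objectUrn field; a slightly more defensive variant (a sketch based on that assumption, reusing result from the script above, not something the API guarantees) would skip such entries:

# keep only entries that expose both fields before building the links (sketch)
pairs = [
    (t["groupName"], f'https://www.linkedin.com/groups/{t["objectUrn"].split(":")[3]}')
    for t in result
    if "groupName" in t and "objectUrn" in t
]
print(pairs)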
A few notes:

- The Accept: application/vnd.linkedin.normalized+json+2.1 header is necessary for the search call.
- queryContext and filters shouldn't be URL-encoded, otherwise the API will not take these params into account.
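To see why the second note matters, compare what urlencode would produce for the filters value with the literal string the request above appends (a small illustration; only the urlencode call below is new, the endpoint behaviour is as described above):

from urllib.parse import urlencode

# urlencode percent-encodes the List(...) syntax, which the endpoint then ignores:
print(urlencode({"filters": "List(resultType->GROUPS)"}))
# filters=List%28resultType-%3EGROUPS%29

# the request above therefore appends filters/queryContext as literal strings:
# &filters=List(resultType-%3EGROUPS)&queryContext=List(spellCorrectionEnabled-%3Etrue)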