
How to use GitHub API to get a repository's dependents information in GitHub?

When using GitHub API v4, I can easily get a repository's dependencies with repository.dependencyGraphManifests. But I can't find any way to get the dependents through API v4, even though I can see them under Insights -> Dependency graph -> Dependents. Is there any way to get the dependents of a GitHub repository, whether through the GitHub API or by some other means?

HELLORPG asked Nov 06 '19 16:11



4 Answers

I don't think you can get the dependent projects using the GitHub API (REST or GraphQL). One option is to scrape the Dependents page, as in the following Python script:

import requests
from bs4 import BeautifulSoup

repo = "expressjs/express"
page_num = 3
url = 'https://github.com/{}/network/dependents'.format(repo)

for i in range(page_num):
    print("GET " + url)
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")

    data = [
        "{}/{}".format(
            t.find('a', {"data-repository-hovercards-enabled":""}).text,
            t.find('a', {"data-hovercard-type":"repository"}).text
        )
        for t in soup.findAll("div", {"class": "Box-row"})
    ]

    print(data)
    print(len(data))
    paginationContainer = soup.find("div", {"class":"paginate-container"}).find('a')
    if paginationContainer:
        url = paginationContainer["href"]
    else:
        break


Bertrand Martel answered Oct 09 '22 23:10



Based on Bertrand Martel's answer (@bertrand-martel): remember to add the following code so that the script does not get stuck bouncing between the first and second pages. The first page has only one <a> tag in the pagination container, whereas the next page has two, so the original code picks the first one ("previous") and goes back to the previous page instead of moving forward.

Code:

...
    paginationContainer = soup.find("div", {"class":"paginate-container"}).find_all('a')
    if len(paginationContainer) > 1:
        paginationContainer = paginationContainer[1]
    else:
        paginationContainer = paginationContainer[0]
...
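
A minimal sketch of an alternative that selects the link by its label rather than by its position, assuming the pagination links are labelled "Previous" and "Next" as described above. The helper name next_page_href and the example hrefs are hypothetical placeholders, not GitHub's real URLs; the helper takes plain (text, href) pairs so it can be tried without a network request.

```python
def next_page_href(links):
    """Return the href of the link labelled "Next", or None if there is none.

    `links` is an iterable of (text, href) pairs, for example built with
    [(a.text, a["href"]) for a in container.find_all("a")].
    """
    for text, href in links:
        if text.strip() == "Next":
            return href
    return None


# Hypothetical example data (placeholder hrefs, not real GitHub URLs).
# First page: only a "Next" link.
first_page = [("Next", "/dependents?page=2")]
# Later page: "Previous" comes first, so picking by position would go backwards.
later_page = [("Previous", "/dependents?page=1"), ("Next", "/dependents?page=3")]
# Last page: only "Previous" remains, so crawling should stop.
last_page = [("Previous", "/dependents?page=2")]

for page in (first_page, later_page, last_page):
    print(next_page_href(page))
```

Returning None on the last page gives the calling loop an explicit stop condition, which a purely positional choice does not.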
oneturkmen answered Oct 09 '22 23:10



Building on @Bertrand Martel's answer, the following version of his code does not require knowing page_num beforehand:

import requests
from bs4 import BeautifulSoup

repo = "expressjs/express"
url = 'https://github.com/{}/network/dependents'.format(repo)
nextExists = True
result = []
while nextExists:
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")

    result = result + [
        "{}/{}".format(
            t.find('a', {"data-repository-hovercards-enabled":""}).text,
            t.find('a', {"data-hovercard-type":"repository"}).text
        )
        for t in soup.findAll("div", {"class": "Box-row"})
    ]
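    # Follow the "Next" link if the page has one; otherwise stop.
    nextExists = False
    pagination = soup.find("div", {"class":"paginate-container"})
    # Guard against pages that have no pagination container at all.
    for u in (pagination.findAll('a') if pagination else []):
        if u.text == "Next":
            nextExists = True
            url = u["href"]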

for r in result:
    print(r)
print(len(result))

Keep in mind that this can take a very long time to run if the repository has many dependents.
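
One way to keep the run time bounded is to cap the number of pages and pause between requests. The following is only a sketch under assumptions: crawl, fetch_page, max_pages and delay_seconds are hypothetical names, and fetch_page stands in for the requests/BeautifulSoup code above, returning the page's items together with the next URL (or None on the last page).

```python
import time


def crawl(start_url, fetch_page, max_pages=10, delay_seconds=1.0):
    """Collect items page by page, stopping after max_pages pages.

    fetch_page(url) is a caller-supplied (hypothetical) function returning
    (items, next_url), where next_url is None on the last page.
    """
    results = []
    url = start_url
    pages = 0
    while url is not None and pages < max_pages:
        items, url = fetch_page(url)
        results.extend(items)
        pages += 1
        # Only sleep if another request is actually going to be made.
        if url is not None and pages < max_pages:
            time.sleep(delay_seconds)
    return results


# Fake in-memory pages so the sketch can be run without network access.
fake_pages = {
    "p1": (["a/one", "b/two"], "p2"),
    "p2": (["c/three"], "p3"),
    "p3": (["d/four"], None),
}

capped = crawl("p1", fake_pages.get, max_pages=2, delay_seconds=0)
full = crawl("p1", fake_pages.get, max_pages=10, delay_seconds=0)
print(capped)
print(full)
```

Passing the fetcher in as a parameter keeps the page limit and delay separate from the HTML parsing, so the loop can be tried on fake data first.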

muvaf answered Oct 09 '22 21:10



A Ruby script (similar to the accepted Python answer) that lists the dependent repos with their star and fork counts. The script prints a JSON array if its input or output is piped; otherwise it opens a Ruby REPL.

# frozen_string_literal: true

require 'json'
require 'nokogiri'
require 'open-uri'

$repo = ARGV.fetch(0, "rgeo/rgeo")

Repo = Struct.new(:org, :repo, :stars, :forks)

url = "https://github.com/#$repo/network/dependents"
repos = []

while url
  doc = Nokogiri::HTML(URI.open(url))
  doc.css('#dependents .Box .Box-row').each do |el|
    repos << Repo.new(
      *el.css('.f5 > a').map(&:inner_text),
      *el.at_css('.d-flex').content.delete(" ,").scan(/\d+/).map(&:to_i)
    )
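  rescue
    # Drop into a REPL to inspect a row that failed to parse.
    binding.irb
  end
  # Follow the second pagination button; stop when it is missing or has no href.
  url = doc.at_css('.paginate-container > .BtnGroup > .BtnGroup-item:nth-child(2)')&.attr("href")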
end

if $stdin.tty? && $stdout.tty?
  # check `repos`
  binding.irb
else
  jj repos.map { { name: "#{_1.org}/#{_1.repo}", stars: _1.stars, forks: _1.forks } }
end
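
For readers following the Python answers, here is a rough Python sketch of the star and fork parsing step in the Ruby script. The function name parse_counts and the sample text are assumptions for illustration; the real row markup may differ. Unlike the Ruby version, this sketch removes only commas and keeps whitespace, so that two numbers separated by a space are not merged.

```python
import re


def parse_counts(text):
    """Extract integers from text such as a row's star and fork counts.

    Commas are dropped first so that "1,234" is read as 1234.
    """
    return [int(n) for n in re.findall(r"\d+", text.replace(",", ""))]


# Hypothetical sample of a row's text content.
sample = "\n  1,234\n  56\n"
print(parse_counts(sample))
```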
Ulysse BN answered Oct 09 '22 21:10
