There is a Git repository on GitHub called platform_frameworks_base containing part of the Android source code.
I wrote an application that relies on all the .aidl files from that project, so it downloads them all on first start.
Until now I did that by downloading the file Android.bp from the project root, extracting all file paths ending in .aidl from that file and then explicitly downloading them one by one.
For example if I found this file path:
media/java/android/media/IAudioService.aidl
I knew I could download it like this:
wget https://raw.githubusercontent.com/aosp-mirror/platform_frameworks_base/android-10.0.0_r47/media/java/android/media/IAudioService.aidl
This works fine up to Android 10 (git tag: android-10.0.0_r47).
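For reference, that whole flow fits in a few lines of shell (a sketch, with the tag hard-coded):
tag='android-10.0.0_r47'
base="https://raw.githubusercontent.com/aosp-mirror/platform_frameworks_base/$tag"
# Grab Android.bp, extract the quoted .aidl paths, and download each one
wget -q "$base/Android.bp"
grep '\.aidl"' Android.bp | cut -d'"' -f2 | while IFS= read -r path; do
    mkdir -p "$(dirname "$path")"
    wget -q -O "$path" "$base/$path"
done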
Starting with Android 11 (e.g. git tag: android-11.0.0_r33), the file paths use wildcards instead of complete paths. See this Android.bp.
It now just contains wildcard/glob file paths like:
media/java/**/*.aidl
location/java/**/*.aidl
etc...
My current "solution":
Clone the repo (only the last commit of the branch we care about):
git clone --depth=1 -b android-11.0.0_r33 https://github.com/aosp-mirror/platform_frameworks_base.git
Extract the wildcard/glob paths from Android.bp.
cat Android.bp | grep '\.aidl"' | cut -d'"' -f2
Find all the files matching the wildcard/glob paths.
e.g. shopt -s globstar && echo media/java/**/*.aidl
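Put together, those steps look roughly like this (a sketch):
git clone --depth=1 -b android-11.0.0_r33 https://github.com/aosp-mirror/platform_frameworks_base.git
cd platform_frameworks_base
shopt -s globstar nullglob
# Expand every .aidl glob from Android.bp against the checked-out tree
grep '\.aidl"' Android.bp | cut -d'"' -f2 | while IFS= read -r glob; do
    printf '%s\n' $glob
done | sort -u > aidl_files.txt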
But the download process takes way too long because the repository contains over a gigabyte of binary files, even if I only clone the last commit of the branch I care about.
Now my actual question is either:
How can I just download the .aidl files that I actually care about? (Ideally without parsing the HTML of every folder on GitHub.)
Or
How can I download/clone the repository without all the binary files? (probably not possible with git?)
Edit:
I tried using the GitHub API to recursively go through all directories, but I immediately get an API rate limit exceeded error:
g_aidlFiles=""
# Recursively go through all directories and collect the paths of all found
# .aidl files in the global g_aidlFiles variable
GetAidlFilesFromGithub() {
    l_dirUrl="${1-}"
    if [ "$l_dirUrl" == "" ]; then
        echo "ERROR: Directory URL not provided in GetAidlFilesFromGithub"
        exit 1
    fi
    echo "l_dirUrl: ${l_dirUrl}"
    l_rawRes="$(curl -s -i "$l_dirUrl")"
    l_statusCode="$(echo "$l_rawRes" | grep HTTP | head -1 | cut -d' ' -f2)"
    l_resBody="$(echo "$l_rawRes" | sed '1,/^\s*$/d')"
    if [[ $l_statusCode == 4* ]] || [[ $l_statusCode == 5* ]]; then
        echo "ERROR: Request failed!"
        echo "Response status: $l_statusCode"
        echo "Response body:"
        echo "$l_resBody"
        exit 1
    fi
    l_currentDirJson="$l_resBody"
    if [ "$l_currentDirJson" == "" ]; then
        echo "ERROR: l_currentDirJson is empty"
        exit 1
    fi
    l_newAidlFiles="$(echo "$l_currentDirJson" | jq -r '.[] | select(.type=="file") | select(.path | endswith(".aidl")) | .path')"
    if [ "$l_newAidlFiles" != "" ]; then
        echo "l_newAidlFiles: ${l_newAidlFiles}"
        g_aidlFiles="${g_aidlFiles}\n${l_newAidlFiles}"
    fi
    l_subDirUrls="$(echo "$l_currentDirJson" | jq -r '.[] | select(.type=="dir") | .url')"
    if [ "$l_subDirUrls" != "" ]; then
        echo "$l_subDirUrls" | while IFS= read -r l_subDirUrl ; do
            (GetAidlFilesFromGithub "$l_subDirUrl")
        done
    else
        echo "No subdirs found."
    fi
}
GetAidlFilesFromGithub "https://api.github.com/repos/aosp-mirror/platform_frameworks_base/contents?ref=android-11.0.0_r33"
From what I understand, all my users would have to create a GitHub account and an OAuth secret to raise the limit. That's definitely not an option for me; I want my application to be easy to use.
Since the repo is on GitHub, which supports partial-clone filters, the easiest route is probably to use that filter support.
git clone --no-checkout --depth=1 --filter=blob:none \
https://github.com/aosp-mirror/platform_frameworks_base
cd platform_frameworks_base
git reset -q -- \*.aidl
git checkout-index -a
which could probably be finessed quite a bit to get the files sent in a single pack instead of the one-at-a-time fetch that produces. For instance, instead of blob:none say blob:limit=16384; that gets most of them up front.
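That is, the same sequence with the size-limited filter:
git clone --no-checkout --depth=1 --filter=blob:limit=16384 \
    https://github.com/aosp-mirror/platform_frameworks_base
cd platform_frameworks_base
git reset -q -- \*.aidl
git checkout-index -a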
To do this in your own code, without relying on a Git install, you'd need to implement the Git protocol. Here's the online intro with pointers to the actual Git docs. It's not hard: you send text lines back and forth until the server spits out the gobsmacking lot of data you wanted, then you pick through it. You don't need to use HTTPS; GitHub supports the plain git protocol. Try running that clone command with GIT_TRACE=1 GIT_PACKET_TRACE=1.
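For example (the packet trace goes to stderr):
GIT_TRACE=1 GIT_PACKET_TRACE=1 git clone --no-checkout --depth=1 --filter=blob:none \
    https://github.com/aosp-mirror/platform_frameworks_base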
Not sure if this is what you wanted:
#!/usr/bin/env bash
get_github_file_list(){
    local user=$1 repo=$2 branch=$3
    curl -s "https://api.github.com/repos/$user/$repo/git/trees/$branch?recursive=1"
}
get_github_file_list aosp-mirror platform_frameworks_base android-11.0.0_r33 |\
jq -r '.tree|map(.path|select(test("\\.aidl")))[]'
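To actually fetch the listed files, you could feed that output into the raw-content URLs from the question; a sketch, assuming the tag in the raw URL matches the branch argument:
tag='android-11.0.0_r33'
get_github_file_list aosp-mirror platform_frameworks_base "$tag" |
    jq -r '.tree|map(.path|select(test("\\.aidl$")))[]' |
    while IFS= read -r path; do
        # --create-dirs keeps the repository's directory layout locally
        curl -s --create-dirs -o "$path" \
            "https://raw.githubusercontent.com/aosp-mirror/platform_frameworks_base/$tag/$path"
    done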
You could use the GitHub API code search endpoint to get the paths, but then use your wget raw.githubusercontent method to download them:
apiurlbase='https://api.github.com/search/code?per_page=100&q=repo:aosp-mirror/platform_frameworks_base+extension:aidl'
dlurlbase='https://raw.githubusercontent.com/aosp-mirror/platform_frameworks_base/android-10.0.0_r47/'
apiurl1="$apiurlbase+path:/media/java/"
apiurl2="$apiurlbase+path:/location/java/"
for apiurl in "$apiurl1" "$apiurl2"; do
    page=1
    while paths=$(
        curl -s "$apiurl&page=$page" | grep '"path": ' | grep -o '[^"]\+\.aidl'
    ); do
        # do your stuff with the $paths
        page=$(($page + 1))
    done
done
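The "do your stuff" placeholder could, for instance, download every path found on that page using the $dlurlbase defined above (a sketch):
echo "$paths" | while IFS= read -r path; do
    curl -s --create-dirs -o "$path" "$dlurlbase$path"
done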
Unfortunately, the GitHub API code search endpoint only searches the default branch (in this case, master), whereas you want the android-10.0.0_r47 tag. There could be files in android-10.0.0_r47 but not in master, and this code won't find or download them.
An alternative solution is to do a very minimal clone of each tag you're interested in, and then use git ls-tree to get the paths of each tag, e.g.,
for tag in 'android-10.0.0_r47' 'android-11.0.0_r33'; do
    git clone --branch "$tag" --depth=1 --bare --no-checkout \
        --filter=blob:limit=0 git@github.com:aosp-mirror/platform_frameworks_base.git
    # only a 1.8M download
    mv platform_frameworks_base.git "$tag"
    cd "$tag"
    paths=$(git ls-tree -r HEAD --name-only | grep '\.aidl$')
    # do your stuff with the paths
    cd ..
done
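For example, that placeholder might feed the paths back into raw-content downloads (a sketch; the output directory name is made up):
echo "$paths" | while IFS= read -r path; do
    curl -s --create-dirs -o "../aidl-$tag/$path" \
        "https://raw.githubusercontent.com/aosp-mirror/platform_frameworks_base/$tag/$path"
done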
If this is for own use, I probably wouldn't use either of these methods. I would just clone the entire huge repo once and then work with it locally, e.g.,
if [ -e platform_frameworks_base ]; then
    cd platform_frameworks_base
    git pull
else
    git clone git@github.com:aosp-mirror/platform_frameworks_base.git
    cd platform_frameworks_base
fi
tags=$(git tag | grep '^android')
for tag in $tags; do
    git checkout $tag
    paths=$(git ls-tree -r HEAD --name-only | grep '\.aidl$')
    # do your stuff with the paths
done
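Here the files are already on disk after each checkout, so the placeholder can simply copy them out (a sketch; the destination directory is made up):
echo "$paths" | while IFS= read -r path; do
    mkdir -p "../aidl-$tag/$(dirname "$path")"
    cp "$path" "../aidl-$tag/$path"
done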
Given the circumstances, I would maintain a text file that is automatically updated with the latest repo file tree before each commit.
The script should be easy to write and fast to run, since all of this happens locally. It can be called manually as a new step in your working process or be integrated into your test/CI automation.
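A minimal sketch of such a script, written here as a pre-commit hook and using a made-up file name filetree.txt:
#!/usr/bin/env bash
# .git/hooks/pre-commit: regenerate the repo file listing and stage it with the commit
git ls-files > filetree.txt
git add filetree.txt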
Then you know what to do in your end-user application: download this file first, filter it against the globs in Android.bp, then fetch the files you want via the GitHub raw content links.
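A rough sketch of that end-user flow; the filetree.txt URL is a placeholder for wherever the listing gets published, and the glob handling only keeps the directory prefix before the **:
tag='android-11.0.0_r33'
base="https://raw.githubusercontent.com/aosp-mirror/platform_frameworks_base/$tag"
curl -s -o filetree.txt "https://example.com/filetree.txt"   # placeholder URL
curl -s -o Android.bp "$base/Android.bp"
# Keep listed paths under each glob's directory prefix, then download the matches
grep '\.aidl"' Android.bp | cut -d'"' -f2 | sed 's|\*\*.*||' | while IFS= read -r prefix; do
    grep "^$prefix.*\.aidl\$" filetree.txt
done | sort -u | while IFS= read -r path; do
    curl -s --create-dirs -o "$path" "$base/$path"
done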