 

How do I retrieve YouTube's autocomplete results using Jsoup (Java)?

[Image: YouTube autocomplete]

As shown in this image, I want to retrieve YouTube's autocomplete search results using Jsoup. I can already retrieve a video's URL, title and thumbnail from its video ID, but I'm stuck on getting the autocomplete search results.

I have to do this without the YouTube Data API, using only Jsoup.

Any suggestions that can point me in the right direction would be appreciated.

raj kavadia asked Mar 03 '23 07:03



1 Answer

The search results are generated dynamically, via JavaScript. That means they cannot be parsed by Jsoup, because Jsoup only "sees" the static HTML that the server returns. However, we can get the results directly from the web service that the page queries.

YouTube's autocomplete search results are fetched from a web service provided by Google. Every time we type a letter in the search bar, a request is sent to that service in the background and the response is rendered on the page. We can discover such APIs with a browser's Developer Tools. For example, I found this one with the following procedure:

  • Open YouTube in a browser.
  • Open the Developer Tools (Ctrl + Shift + I).
  • Go to the Network tab. Here we can find detailed information about the browser's connections to web servers.
  • Type a letter in YouTube's search bar. A new GET request to https://clients1.google.com/complete/search appears.
  • Click that request to examine it in the panel on the right. The Headers tab shows that the URL contains our search query, and the Response tab shows that the response body contains the autocomplete results.
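The request found with the steps above can be rebuilt in code. This is a minimal sketch that reuses the base URL and query parameters from the answer's code verbatim (I have not checked which of them the service actually requires, or what `cp` and `gs_id` mean) and URL-encodes the query. `urlsWhileTyping` roughly mimics typing by producing one URL per prefix of the query; the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

class AutocompleteRequest {
    // Base URL and parameters copied verbatim from the answer; only q changes.
    static final String BASE = "https://clients1.google.com/complete/search"
            + "?client=youtube&hl=en&gs_rn=64&gs_ri=youtube&ds=yt&cp=10&gs_id=b2&q=";

    // Builds the request URL for one state of the search bar.
    static String buildUrl(String query) {
        try {
            return BASE + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform must support UTF-8, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    // Roughly mimics typing: one URL per prefix of the query ("c", "ca", "cat").
    static List<String> urlsWhileTyping(String query) {
        List<String> urls = new ArrayList<>();
        for (int i = 1; i <= query.length(); i++) {
            urls.add(buildUrl(query.substring(0, i)));
        }
        return urls;
    }

    public static void main(String[] args) {
        for (String url : urlsWhileTyping("cat")) {
            System.out.println(url);
        }
    }
}
```

Running `main` prints three URLs ending in `q=c`, `q=ca` and `q=cat`. A real browser may debounce keystrokes, so the number of requests it actually sends can differ.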

The response is a JavaScript snippet that contains our data in an array, so it can be parsed with regular expressions. Jsoup can be used for the HTTP request, but any HTTP client will do.
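Since any HTTP client will do, here is a sketch of the same idea using only the JDK's `HttpURLConnection` instead of Jsoup, with the same regex as the answer. It assumes the body can be decoded as UTF-8 (I have not checked which charset the service declares). The sample body in `main` is a hypothetical, shortened JavaScript array, not a captured response. If the real body also begins with the echoed query, as this sample does, the first match will be the query itself rather than a suggestion. `PlainJdkAutocomplete`, `fetch` and `parse` are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlainJdkAutocomplete {
    // Same pattern as the answer: the first quoted string of each ["...", ...] array.
    private static final Pattern FIRST_STRING = Pattern.compile("\\[\"(.*?)\",", Pattern.DOTALL);

    // Downloads the response body as text, assuming it is UTF-8 encoded.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return body.toString();
    }

    // Returns every captured string, in order of appearance.
    static List<String> parse(String body) {
        List<String> results = new ArrayList<>();
        Matcher m = FIRST_STRING.matcher(body);
        while (m.find()) {
            results.add(m.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened body; not a real captured response.
        String sample = "callback([\"mine\",[[\"minecraft\",0],[\"minecraft song\",0]],{\"k\":1}])";
        System.out.println(parse(sample)); // [mine, minecraft, minecraft song]
    }
}
```

With a live request, `parse(fetch(AUTOCOMPLETE_URL + encodedQuery))` would follow the same path as the Jsoup version; `AUTOCOMPLETE_URL` here stands for the base URL used in the answer's code.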

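import java.io.IOException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;

public static ArrayList<String> autocompleteResults(String query) throws IOException {
    String url = "https://clients1.google.com/complete/search?client=youtube&hl=en&gs_rn=64&gs_ri=youtube&ds=yt&cp=10&gs_id=b2&q=";
    // Captures the first quoted string of every array, e.g. ["suggestion", ...
    String re = "\\[\"(.*?)\",";

    // Allow non-HTML bodies (JavaScript/JSON); otherwise Jsoup may reject the content type
    Response resp = Jsoup.connect(url + URLEncoder.encode(query, "UTF-8"))
            .ignoreContentType(true)
            .execute();
    Matcher match = Pattern.compile(re, Pattern.DOTALL).matcher(resp.body());

    // Collect every captured string in order of appearance
    ArrayList<String> data = new ArrayList<>();
    while (match.find()) {
        data.add(match.group(1));
    }
    return data;
}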
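The regex returns the raw text between the quotes, so escape sequences inside a suggestion are kept as-is. If the service escapes characters such as apostrophes as `\u0027` (an assumption I have not verified for every query), a small post-processing step can decode them. `unescapeJs` and `unescapeAll` are hypothetical helpers, not part of Jsoup; they only handle backslash-u escapes followed by four hex digits, `\n`, `\t`, and escapes that stand for the character itself.

```java
import java.util.ArrayList;
import java.util.List;

class SuggestionUnescape {
    // Decodes backslash escapes inside a JavaScript string body:
    // backslash-u plus four hex digits, \n, \t, and \" \\ \/ \' (the character itself).
    static String unescapeJs(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) {
                out.append(c); // ordinary character, or a trailing lone backslash
                continue;
            }
            char next = s.charAt(++i);
            if (next == 'u') {
                int code = 0;
                boolean ok = i + 4 < s.length();
                for (int k = 1; ok && k <= 4; k++) {
                    int digit = Character.digit(s.charAt(i + k), 16);
                    ok = digit >= 0;
                    code = code * 16 + digit;
                }
                if (ok) {
                    out.append((char) code);
                    i += 4; // skip the four hex digits
                } else {
                    out.append("\\u"); // malformed escape: keep it unchanged
                }
            } else if (next == 'n') {
                out.append('\n');
            } else if (next == 't') {
                out.append('\t');
            } else {
                out.append(next); // e.g. \" and \\ decode to the character itself
            }
        }
        return out.toString();
    }

    // Applies unescapeJs to every result returned by the regex.
    static List<String> unescapeAll(List<String> raw) {
        List<String> out = new ArrayList<>(raw.size());
        for (String r : raw) {
            out.add(unescapeJs(r));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(unescapeJs("don\\u0027t stop")); // don't stop
    }
}
```

Calling `unescapeAll(autocompleteResults(query))` would apply it to every result.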

The code was written and tested in VS Code with Java 8 on Windows, but it should also work in Android Studio.

t.m.adam answered Apr 07 '23 00:04
