How do you scrape an image from a website using Flutter?

Hi, I'm trying to do a simple task: getting the img src URL from a website. I can't seem to get it to work. I've tried various Flutter packages and have now gone back to vanilla Flutter. This is my code:

onPressed: () async {
  http.Response response = await http.get('https://tiktok.com/@$enteredUsername');
  dom.Document document = parser.parse(response.body);
  final elements = document.getElementsByClassName('jsx-581822467');
  print(elements);
},

I'm simply trying to get the profile image URL from this website (tiktok.com).

I've looked at the page in the browser's inspector, which shows the class name 'jsx-581822467', but if I use that in the code it returns an empty list.

How can I simply get the URL of this profile picture? And how can I get the other elements that have the 'jsx' prefix in their class names?

asked May 29 '20 17:05 by KylianMbappe



1 Answer

I think I figured out what your problem is. The browser's inspector displays the HTML of a TikTok profile page as it looks after JavaScript has run. That markup is only generated once the page has loaded. If we download the content via http.get(), we get the raw HTML before JavaScript has made any changes.

  • Write view-source: in front of the URL in your browser's address bar, or right-click on the website and click on View Page Source. Now the HTML will be displayed in the same way as your app gets it.
  • Search for avatar-wrapper round. You won't be able to find it, because the tag for the profile picture doesn't exist here yet.
  • Fortunately, the URL of the profile picture is already included elsewhere. Search for <meta property="og:image" content=". You will find only one hit, and the URL of the profile picture starts directly after it.

Therefore, in my opinion, the easiest way to get the URL is:

  1. Download the HTML.
  2. Remove all text up to and including <meta property="og:image" content=".
  3. All following characters up to the next " are the URL we are looking for.

Here I have inserted my code, which worked fine for me:

import 'package:http/http.dart' as http;

Future<String?> getProfileImageUrl(String username) async {
  // Download the raw HTML of the profile page (no JavaScript is executed).
  // Recent versions of the http package expect a Uri rather than a String.
  final response = await http.get(Uri.parse('https://www.tiktok.com/@$username'));
  final html = response.body;

  // The HTML contains the following string exactly once.
  // The URL of the profile picture starts right after it.
  const needle = '<meta property="og:image" content="';
  final start = html.indexOf(needle);

  // indexOf() returns -1 if the needle does not occur in the HTML.
  // In that case the received username may be invalid.
  if (start == -1) return null;

  // The URL runs from the end of the needle to the next '"'.
  final urlStart = start + needle.length;
  final urlEnd = html.indexOf('"', urlStart);
  if (urlEnd == -1) return null;
  return html.substring(urlStart, urlEnd);
}
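
As an alternative to the string search above, the same meta tag can probably also be read with an HTML parser. This is only a sketch: it assumes the html package from the question (imported as dom and parser) is available, and extractOgImage is a hypothetical helper name, not something from the original answer. It takes the already-downloaded HTML as a string, so it can be tried without a network call.

```dart
import 'package:html/dom.dart' as dom;
import 'package:html/parser.dart' as parser;

/// Hypothetical helper: returns the content attribute of the first
/// <meta property="og:image"> tag in [html], or null if there is none.
String? extractOgImage(String html) {
  final dom.Document document = parser.parse(html);

  // Walk all <meta> tags and pick the one whose property is og:image.
  for (final dom.Element meta in document.getElementsByTagName('meta')) {
    if (meta.attributes['property'] == 'og:image') {
      return meta.attributes['content'];
    }
  }
  return null;
}
```

Pass it response.body from the http.get() call. One reason to consider a parser here is that it decodes HTML entities such as &amp; in the attribute value, which a plain substring leaves untouched.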

I hope that I could help you with my explanation.


Edit 1: General approach

  1. View the page source to see the HTML as your app receives it.
  2. Search for the desired substring (for example, the URL you want).
  3. Select the 10 to 15 characters directly before it and check how often that string occurs earlier in the page.
  4. If it occurs more than once, call html = html.substring(html.indexOf(needle) + needle.length); once for each occurrence, up to and including the one you want.
  5. Reload the page and check that it still works.
  6. Now you have found your needle string.
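
The steps above could be wrapped in a small helper along these lines. This is a minimal sketch, not part of the original answer: extractAfter and its occurrence and terminator parameters are hypothetical names, and instead of calling substring repeatedly it moves a search index forward with indexOf, which should give the same result.

```dart
/// Hypothetical helper: returns the text between the [occurrence]-th match
/// of [needle] and the next [terminator], or null if either is missing.
String? extractAfter(String html, String needle,
    {int occurrence = 1, String terminator = '"'}) {
  if (occurrence < 1) {
    throw ArgumentError.value(occurrence, 'occurrence', 'must be at least 1');
  }

  // Skip forward to the end of the requested occurrence of the needle.
  var searchFrom = 0;
  for (var i = 0; i < occurrence; i++) {
    final start = html.indexOf(needle, searchFrom);
    if (start == -1) return null; // fewer matches than requested
    searchFrom = start + needle.length;
  }

  // Everything up to the next terminator is the value we want.
  final end = html.indexOf(terminator, searchFrom);
  if (end == -1) return null;
  return html.substring(searchFrom, end);
}
```

For the profile picture, the call would be extractAfter(html, '<meta property="og:image" content="'), since that needle occurs only once. For a needle that occurs several times, set occurrence to the position of the match you found in step 3.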
answered Sep 29 '22 13:09 by josxha