
Web scraping image inside canvas

I am scraping a web page that shows, alongside various numbers, small price-chart images.

If I click on one of these images in the browser, I can save the chart as a .png image.

When I inspect the element in the browser, it looks like this:

<div class="performance_2d_sparkline graph ng-isolate-scope ng-scope" x-data-percent-change-day="ticker.pct_chge_1D" x-sparkline="watchlistData.sparklineData[ticker.ticker]">
  <span class="inlinesparkline ng-binding">
    <canvas width="100" height="40" style="display: inline-block; width: 100px; height: 40px; vertical-align: top;">
    </canvas>
  </span>
</div>

Is there any way to save, through web scraping, the same images that I can save manually in the browser?

user3755529 asked Jun 11 '17 15:06



1 Answer

If you are using Selenium for your web scraping, you can select the canvas element and save its contents to an image file with the following snippet:

import base64

# Assumes `driver` is an already-initialised Selenium WebDriver with the page loaded.
# Get the canvas content as a data URL and keep only the base64 payload
# (everything after the "data:image/png;base64," prefix).
base64_image = driver.execute_script(
    "return document.querySelector('.inlinesparkline canvas')"
    ".toDataURL('image/png').split(',')[1];"
)

# decode the base64 image
output_image = base64.b64decode(base64_image)

# save to the output image
with open("image.png", "wb") as f:
    f.write(output_image)
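
Since the page in the question shows one sparkline per ticker, the same idea can probably be extended to every matching canvas at once. The following is only a sketch: the helper names `data_url_to_bytes` and `save_canvases`, the `charts` output directory, and the `chart_<index>.png` file naming are illustrative assumptions, and `driver` is expected to be a Selenium WebDriver that already has the page loaded.

```python
import base64
from pathlib import Path


def data_url_to_bytes(data_url: str) -> bytes:
    """Decode a 'data:image/png;base64,...' URL into raw image bytes."""
    # Split at the first comma: the header is on the left, the payload on the right.
    header, sep, payload = data_url.partition(",")
    if not sep or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    return base64.b64decode(payload)


def save_canvases(driver, selector=".inlinesparkline canvas", out_dir="charts"):
    """Save every canvas matching `selector` as chart_<index>.png (hypothetical naming)."""
    # The selector is passed as arguments[0] instead of being pasted into the script.
    script = (
        "return Array.from(document.querySelectorAll(arguments[0]))"
        ".map(c => c.toDataURL('image/png'));"
    )
    data_urls = driver.execute_script(script, selector)

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    paths = []
    for i, url in enumerate(data_urls):
        path = out / f"chart_{i}.png"
        path.write_bytes(data_url_to_bytes(url))
        paths.append(path)
    return paths
```

Calling `save_canvases(driver)` after the page has finished rendering should return the list of written file paths. Note that `toDataURL` raises a security error if a canvas has been drawn with cross-origin image data, so this approach may not work on every site.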
htn answered Oct 30 '22 13:10
