
Can I scrape flash?

I'd like to scrape a website to programmatically collect any external links within any Flash elements on the page. I'd also like to collect any other text, if possible, but the links are the important part. Is this possible? A freeware library or service to accomplish this task would be preferable, but if none exists, how can I accomplish the task on my own? Is it possible to get the source code and pull the links from that?

Mike Pateras asked Feb 08 '10 17:02



2 Answers

Decompiling the Flash file would let you see its ActionScript source, which, in my experience, often contains information such as links.

A free decompiler is Flare. It is command-line only and works fine, but it won't decode some of the information in newer Flash formats (later than CS3, I think). It dumps all the ActionScript into one file.

Sothink SWF Decompiler is a more sophisticated commercial program. It has worked fine with every Flash file I've tried, and the results are quite thorough and well organized. It's GUI-based, and I don't know whether it can be easily automated.

With Flare, since it's a command line tool, one could easily write a script to obtain the SWF, decompile it, grep for 'http://', and log the results.
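The script described above could be sketched roughly as follows. This is only an illustration under some assumptions: that a `flare` executable is on the PATH, that running `flare movie.swf` writes the decompiled ActionScript to `movie.flr` next to the input, and that a simple regular expression is good enough to pick out URLs. The function names and the `movie.swf` file name are made up for the example.

```python
import re
import subprocess
import sys
import urllib.request
from pathlib import Path

# Rough URL matcher: stops at whitespace, quotes, angle brackets,
# backslashes and closing parentheses.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>\\)]+")


def extract_urls(text):
    """Return the unique URLs found in text, in order of first appearance."""
    seen = []
    for url in URL_PATTERN.findall(text):
        if url not in seen:
            seen.append(url)
    return seen


def decompile_and_extract(swf_url, workdir="."):
    """Download a SWF, decompile it with Flare, and return the URLs found."""
    swf_path = Path(workdir) / "movie.swf"  # hypothetical local file name
    urllib.request.urlretrieve(swf_url, str(swf_path))
    # Assumption: `flare movie.swf` writes movie.flr beside the input file.
    subprocess.run(["flare", str(swf_path)], check=True)
    flr_path = swf_path.with_suffix(".flr")
    return extract_urls(flr_path.read_text(errors="replace"))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scrape_swf.py http://hostname/path/to/file.swf
    for url in decompile_and_extract(sys.argv[1]):
        print(url)
```

Keeping `extract_urls` separate from the download and decompile step means the same function can be pointed at the output of a different decompiler if Flare can't handle a particular file.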

JAL answered Sep 28 '22 08:09


Yanking "external links" out of a Flash file can be as simple as, for instance:

curl -s http://hostname/path/to/file.swf | strings | grep http

Of course, this will fail if the author has made any attempt to hide the URL.
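One common case where the one-liner finds nothing even though nobody tried to hide anything: SWF files whose first three bytes are `CWS` are zlib-compressed after the 8-byte header, so `strings` only sees compressed data. A minimal Python sketch of the same idea that inflates such files first might look like this. The function names are invented for the example, and other signatures are simply scanned as-is.

```python
import re
import zlib


def swf_body(data):
    """Return the SWF bytes, inflating them when the file is zlib-compressed."""
    if data[:3] == b"CWS":
        # The 8-byte header (signature, version, length) is stored
        # uncompressed; everything after it is a zlib stream.
        return data[:8] + zlib.decompress(data[8:])
    return data  # b"FWS" (uncompressed) or anything else: scan as-is


def find_urls(data):
    """Rough equivalent of `strings | grep http` that returns only the URLs."""
    # A URL here is "http://" or "https://" followed by printable,
    # non-space ASCII bytes.
    pattern = re.compile(rb"https?://[\x21-\x7e]+")
    return sorted({m.decode("ascii") for m in pattern.findall(swf_body(data))})
```

Calling `find_urls(open("file.swf", "rb").read())` on a downloaded file would then list whatever plain-text URLs it contains. As with the shell version, URLs that are assembled at runtime or otherwise obfuscated won't show up.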

YMMV a lot. Good luck!

MikeyB answered Sep 28 '22 09:09