Scraping javascript-generated data using Python

I want to scrape some data from the following URL using Python: http://www.hankyung.com/stockplus/main.php?module=stock&mode=stock_analysis_infomation&itemcode=078340

The page shows a summary of company information.

What I want to scrape is not shown on the first page. Clicking the tab named "재무제표" (Financial Statements) opens the financial statements, and then clicking the tab named "현금흐름표" (Cash Flow Statement) opens the cash flow data.

I want to scrape the cash flow data.

However, the cash flow data is generated by JavaScript from a different, hidden URL: http://stock.kisline.com/compinfo/financial/main.action?vhead=N&vfoot=N&vstay=&omit=&vwidth=

The cash flow data is generated by submitting some option values and a cookie to this URL.

As you may have noticed, itemcode=078340 in the first link is the stock code. There are as many as 1680 stocks for which I want to gather cash flow data, so I want to do this in a loop.

Is there a good way to scrape the cash flow data? I tried Scrapy, but it is difficult to integrate with the other scraping code I am already using.

259

asked Apr 07 '12 06:04

trigger

2 Answers

There's also dryscrape (a library written by me, so the recommendation is obviously a bit biased :), which uses a fast WebKit-based in-memory browser to navigate around. It understands JavaScript, too, but is a lot more lightweight than Selenium.
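A rough sketch of how the loop over stock codes might look with dryscrape. The URL pattern comes from the question; the function names, the `tab_xpaths` parameter, and any XPath passed in are hypothetical, since the real markup of the tabs is not shown here. It also assumes the tabs live in the top-level document rather than in a frame.

```python
import sys

# URL pattern copied from the question; only the stock code varies.
URL_TEMPLATE = (
    "http://www.hankyung.com/stockplus/main.php"
    "?module=stock&mode=stock_analysis_infomation&itemcode={code}"
)


def stock_urls(itemcodes):
    """Yield (code, url) pairs. Codes stay strings so leading zeros survive."""
    for code in itemcodes:
        yield code, URL_TEMPLATE.format(code=code)


def fetch_rendered_pages(itemcodes, tab_xpaths=()):
    """Visit each stock page in one dryscrape session and return the rendered HTML.

    tab_xpaths is a hypothetical list of XPath expressions for the tabs to
    click, in order; the right expressions depend on the page's real markup.
    """
    import dryscrape  # imported lazily so stock_urls works without dryscrape

    if "linux" in sys.platform:
        dryscrape.start_xvfb()  # virtual display for machines without a screen

    session = dryscrape.Session()
    session.set_attribute("auto_load_images", False)  # images are not needed

    pages = {}
    for code, url in stock_urls(itemcodes):
        session.visit(url)
        for xpath in tab_xpaths:
            node = session.at_xpath(xpath)
            if node is None:
                raise LookupError("no element matches %r for stock %s" % (xpath, code))
            node.click()
            # A real script may need to wait here until the AJAX content arrives.
        pages[code] = session.body()  # HTML after JavaScript has run
    return pages
```

Calling `fetch_rendered_pages(["078340"])` should return the rendered first page for that stock; reaching the cash flow tab would additionally need working XPath expressions in `tab_xpaths`.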

157

answered Oct 02 '22 13:10

Niklas B.

If you need to scrape page content that is updated with AJAX, and you do not control that AJAX interface, I would use the Selenium browser automation tool for the task:

http://code.google.com/p/selenium/

Selenium has Python bindings.
It launches a real browser instance, so it can do and scrape exactly what you see with your own eyes.
Get the HTML document content after the AJAX updates through the Selenium API.
Use lxml with XPath or CSS selectors to parse the relevant parts out of the document.
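
The steps above could be sketched roughly as follows. This is only an illustration under several assumptions: the tabs are ordinary links in the top-level document (if they sit inside a frame, the driver would first have to switch into it), Firefox and its driver are installed, and the `ready_xpath` and `row_xpath` defaults are placeholders, not the site's real structure. The function names are made up for the example.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.hankyung.com/stockplus/main.php"


def build_url(itemcode):
    """Build the company-summary URL for one stock code (pattern from the question)."""
    query = {
        "module": "stock",
        "mode": "stock_analysis_infomation",
        "itemcode": itemcode,
    }
    return BASE_URL + "?" + urlencode(query)


def extract_rows(html, row_xpath="//table//tr"):
    """Parse rendered HTML with lxml; return each table row as a list of cell texts."""
    import lxml.html  # lazy import so build_url works without lxml installed

    doc = lxml.html.fromstring(html)
    rows = []
    for tr in doc.xpath(row_xpath):
        cells = [cell.text_content().strip() for cell in tr.xpath("./th|./td")]
        if cells:
            rows.append(cells)
    return rows


def scrape_cash_flow(itemcodes, ready_xpath="//table", timeout=10):
    """Click through the tabs in a real browser and collect the rows per stock code.

    ready_xpath is a placeholder: it should point at an element that only
    exists once the cash flow table has finished loading.
    """
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    results = {}
    try:
        for code in itemcodes:
            driver.get(build_url(code))
            wait = WebDriverWait(driver, timeout)
            # Tab labels taken from the question; assumed to be link texts.
            for tab_text in ("재무제표", "현금흐름표"):
                wait.until(EC.element_to_be_clickable((By.LINK_TEXT, tab_text))).click()
            wait.until(EC.presence_of_element_located((By.XPATH, ready_xpath)))
            results[code] = extract_rows(driver.page_source)
    finally:
        driver.quit()  # always close the browser, even on errors
    return results
```

Reusing one browser instance for all stock codes avoids paying the browser start-up cost 1680 times.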

answered Oct 02 '22 12:10

Mikko Ohtamaa

Tags:

python

javascript

web-scraping

screen-scraping
