How to scrape a table with rvest and xpath?

Tags: r, web-scraping, xpath, rvest

Using the following documentation, I have been trying to scrape a series of tables from marketwatch.com.

The code below targets one of them; the link and XPath are already included:

url <- "http://www.marketwatch.com/investing/stock/IRS/profile"
valuation <- url %>%
  html() %>%
  html_nodes(xpath='//*[@id="maincontent"]/div[2]/div[1]') %>%
  html_table()
valuation <- valuation[[1]]

I get the following error:

Warning message:
'html' is deprecated.
Use 'read_html' instead.
See help("Deprecated")

Thanks in advance.

asked Feb 29 '16 19:02

Alex Bădoi

1 Answer

That website doesn't use an HTML table, so html_table() can't find anything. It actually uses div classes column and data lastcolumn.

So you can do something like

library(rvest)

url <- "http://www.marketwatch.com/investing/stock/IRS/profile"

# Download and parse the page once, then query it for each class
page <- read_html(url)

valuation_col <- page %>%
  html_nodes(xpath='//*[@class="column"]')

valuation_data <- page %>%
  html_nodes(xpath='//*[@class="data lastcolumn"]')

Or even

url %>%
  read_html() %>%
  html_nodes(xpath='//*[@class="section"]')

That should get you most of the way there.
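For the remaining step, here is a minimal sketch of how the matched nodes could be turned into a data frame. It runs against a small inline HTML snippet rather than the live site: the markup, the "Metric A"/"Metric B" labels and the values are all made up for illustration, and the real page may be structured differently (for example, the label and value nodes may not pair up one to one).

```r
library(rvest)

# Hypothetical markup that only mimics the class names mentioned above;
# the labels and values are placeholders, not real MarketWatch data.
page <- read_html('
<div class="section">
  <p class="column">Metric A</p>
  <p class="data lastcolumn">1.0</p>
</div>
<div class="section">
  <p class="column">Metric B</p>
  <p class="data lastcolumn">2.0</p>
</div>')

# Extract the text of each matched node, trimming surrounding whitespace
labels <- page %>%
  html_nodes(xpath = '//*[@class="column"]') %>%
  html_text(trim = TRUE)

values <- page %>%
  html_nodes(xpath = '//*[@class="data lastcolumn"]') %>%
  html_text(trim = TRUE)

# Pair labels with values; this assumes both vectors have the same length
valuation <- data.frame(label = labels, value = values,
                        stringsAsFactors = FALSE)
```

Note that the XPath test @class="column" only matches elements whose class attribute is exactly that string. If the real page adds extra classes, a CSS selector such as html_nodes(".column") may be more forgiving.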

Please also read their terms of use - particularly 3.4.

answered Oct 07 '22 07:10

SymbolixAU
