Extract text between certain symbols using Regular Expression in R

Tags:

regex

r

I have a series of expressions such as:

"<i>the text I need to extract</i></b></a></div>"

I need to extract the text between the <i> and </i> "symbols". That is, the result should be:

"the text I need to extract"

At the moment I am using gsub in R to manually remove all the symbols that are not text. However, I would like to use a regular expression to do the job. Does anyone know a regular expression to extract the text between <i> and </i>?

Thanks.

asked Nov 07 '14 20:11

Javier

1 Answer

If this is HTML (which it looks like it is), you should probably use an HTML parser. The XML package can do this:

library(XML)
x <- "<i>the text I need to extract</i></b></a></div>"
xmlValue(getNodeSet(htmlParse(x), "//i")[[1]])
# [1] "the text I need to extract"

On an entire HTML document, you can use:

doc <- htmlParse(x)
sapply(getNodeSet(doc, "//i"), xmlValue)
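
If you do want the regular expression the question asks for, here is a minimal base R sketch. It assumes the tags are always written exactly as <i> and </i> (no attributes, no nesting), and the variable names pattern, first and all_matches are just illustrative. A parser is still the more robust choice for real HTML.

```r
x <- "<i>the text I need to extract</i></b></a></div>"

# Lookbehind/lookahead need the PCRE engine, hence perl = TRUE.
# .*? is lazy, so the match stops at the first closing </i>.
pattern <- "(?<=<i>).*?(?=</i>)"

# First match in each string (strings without a match are dropped)
first <- regmatches(x, regexpr(pattern, x, perl = TRUE))

# All matches in each string (a list with one element per input string)
all_matches <- regmatches(x, gregexpr(pattern, x, perl = TRUE))

print(first)
```

Use the gregexpr form when a single string may contain several <i>...</i> pairs, then unlist() the result if you want one flat character vector.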

answered Oct 14 '22 14:10

Rich Scriven
