Regular expression to remove HTML tags from a string [duplicate]

Tags:

html

regex

Possible Duplicate:
Regular expression to remove HTML tags

Is there an expression which will get the value between two HTML tags?

Given this:

<td class="played">0</td>

I am looking for an expression which will return 0, stripping the <td> tags.

asked Jun 27 '12 15:06

danny

1 Answer

You should not attempt to parse HTML with regex. HTML is not a regular language, so any regex you come up with will likely fail on some esoteric edge case. Please refer to the seminal answer to this question for specifics. While mostly formatted as a joke, it makes a very good point.

The following examples are Java, but the regex will be similar -- if not identical -- for other languages.

String target = someString.replaceAll("<[^>]*>", "");

This assumes that your non-HTML text does not contain any < or > characters and that your input string is correctly structured.
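As a minimal sketch of the catch-all pattern and of the assumption it relies on, the snippet below wraps the replacement in a helper. The class name `TagStripper` and the method `stripTags` are made up for illustration. The second call shows what can happen when a > appears inside an attribute value:

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class TagStripper {
    // '<', then any run of characters that are not '>', then '>'.
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");

    static String stripTags(String html) {
        return ANY_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // The input from the question: both tags are removed.
        System.out.println(stripTags("<td class=\"played\">0</td>")); // prints: 0

        // A '>' inside an attribute value ends the match too early,
        // so part of the attribute is left behind in the output.
        System.out.println(stripTags("<a title=\"a > b\">text</a>")); // prints:  b">text
    }
}
```

If input like the second case is possible, an HTML parser is probably the safer choice.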

If you know the text contains only one specific tag, for example only <td> tags, you could do something like this:

// "/?" also matches the closing </td>; "\\b" keeps "td" from matching longer tag names.
String target = someString.replaceAll("(?i)</?td\\b[^>]*>", "");
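A tag-specific pattern has to cover the closing tag as well as the opening one, otherwise `</td>` stays in the output. The sketch below is one way to write that, assuming only `<td>` tags should be removed. The names `TdStripper` and `stripTd` are invented for this example:

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class TdStripper {
    // (?i)   case-insensitive, so <TD> and <td> both match
    // </?    opening or closing tag
    // td\b   the tag name, with a word boundary so names like <tdata> are not matched
    // [^>]*  any attributes, up to the closing '>'
    private static final Pattern TD_TAG = Pattern.compile("(?i)</?td\\b[^>]*>");

    static String stripTd(String html) {
        return TD_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripTd("<TD class=\"played\">0</td>")); // prints: 0
        // Other tags are left untouched.
        System.out.println(stripTd("<td><b>0</b></td>"));           // prints: <b>0</b>
    }
}
```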

Edit: Ωmega brought up a good point in a comment on another post: if the input contains multiple tags, this squashes all of their contents together.

For example, if the input string were <td>Something</td><td>Another Thing</td>, then the above would result in SomethingAnother Thing.

In a situation where multiple tags are expected, we could do something like:

// "/?" also matches the closing </td>, so both opening and closing tags become spaces.
String target = someString.replaceAll("(?i)</?td\\b[^>]*>", " ").replaceAll("\\s+", " ").trim();

This replaces each tag with a single space, then collapses runs of whitespace into one space, and finally trims any whitespace at the ends.
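The three steps can be sketched as a small helper. This is an illustration under the same assumptions as above: the input is well formed and contains only `<td>` tags. The names `CellText` and `cellsToText` are hypothetical:

```java
// Hypothetical helper, not part of any library.
final class CellText {
    static String cellsToText(String html) {
        return html
                .replaceAll("(?i)</?td\\b[^>]*>", " ") // 1. every <td ...> or </td> becomes a space
                .replaceAll("\\s+", " ")               // 2. runs of whitespace collapse to one space
                .trim();                               // 3. leading and trailing space is dropped
    }

    public static void main(String[] args) {
        String row = "<td>Something</td><td>Another Thing</td>";
        System.out.println(cellsToText(row)); // prints: Something Another Thing
    }
}
```

Note that whitespace inside a cell is collapsed too, so this approach cannot tell where one cell ends and the next begins.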

answered Sep 21 '22 17:09

Roddy of the Frozen Peas
