I'm looking for a regular expression that extracts the text between HTML tags of different types.
For example:
<span>Span 1</span>
- Output: Span 1
<div onclick="callMe()">Span 2</div>
- Output: Span 2
<a href="#">HyperText</a>
- Output: HyperText
I found this pattern from here, but it is not working:
<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>
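Your comment shows that you have not escaped the backslashes in your regex string. In a Java string literal, \b is a backspace character and \1 is an octal escape, so each backslash must be doubled for the regex engine to see a word boundary and a back-reference.
If you also want to match lowercase letters, add a-z to the character classes or use Pattern.CASE_INSENSITIVE (or add (?i) to the beginning of the regex):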
"<([A-Za-z][A-Za-z0-9]*)\\b[^>]*>(.*?)</\\1>"
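A minimal Java sketch, assuming the pattern is compiled with java.util.regex.Pattern and applied with Matcher.find(), that runs the corrected pattern against the three inputs from the question. The class name TagTextDemo and the helper extract are hypothetical; group 2 is the text between the tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagTextDemo {
    // Backslashes are doubled because this is a Java string literal:
    // the regex engine receives \b (word boundary) and \1 (back-reference to group 1).
    static final Pattern TAG =
            Pattern.compile("<([A-Za-z][A-Za-z0-9]*)\\b[^>]*>(.*?)</\\1>");

    // Hypothetical helper: collects the text captured by group 2 for every match.
    static List<String> extract(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            result.add(m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("<span>Span 1</span>"));                    // [Span 1]
        System.out.println(extract("<div onclick=\"callMe()\">Span 2</div>")); // [Span 2]
        System.out.println(extract("<a href=\"#\">HyperText</a>"));             // [HyperText]
    }
}
```

Because the back-reference only requires the same tag name and (.*?) is lazy, nested elements with the same name are not handled: extract("<div><div>x</div></div>") returns [<div>x].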
If the tag contents may contain newlines, use Pattern.DOTALL or add (?s) to the beginning of the regex to turn on dotall (single-line) mode.
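A small sketch of the difference DOTALL makes, assuming an element whose text spans two lines. The class DotAllDemo, its fields PLAIN and WITH_DOTALL, and the helper firstText are hypothetical names for illustration; passing Pattern.DOTALL to Pattern.compile has the same effect as prefixing the regex with (?s).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotAllDemo {
    static final String REGEX = "<([A-Za-z][A-Za-z0-9]*)\\b[^>]*>(.*?)</\\1>";

    // Default mode: '.' does not match line terminators such as '\n'.
    static final Pattern PLAIN = Pattern.compile(REGEX);
    // Dotall mode: '.' also matches '\n' (equivalent to "(?s)" + REGEX).
    static final Pattern WITH_DOTALL = Pattern.compile(REGEX, Pattern.DOTALL);

    // Hypothetical helper: text of the first element, or null when nothing matches.
    static String firstText(Pattern p, String html) {
        Matcher m = p.matcher(html);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String html = "<div>line one\nline two</div>";
        System.out.println(firstText(PLAIN, html));       // null: (.*?) stops at the newline
        System.out.println(firstText(WITH_DOTALL, html)); // prints both lines
    }
}
```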
This should suit your needs:
<([a-zA-Z]+).*?>(.*?)</\\1>
The first group contains the tag name, and the second one contains the text in between.
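A sketch applying this pattern from Java to the question's inputs and printing group 1 (the tag name) and group 2 (the text). SimpleTagDemo and firstTag are hypothetical names; the regex is written as a Java string literal, which is why the back-reference appears as \\1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleTagDemo {
    // The pattern from this answer: group 1 = tag name, group 2 = enclosed text.
    static final Pattern TAG = Pattern.compile("<([a-zA-Z]+).*?>(.*?)</\\1>");

    // Hypothetical helper: returns {tagName, text} for the first match, or null.
    static String[] firstTag(String html) {
        Matcher m = TAG.matcher(html);
        return m.find() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "<span>Span 1</span>",
            "<div onclick=\"callMe()\">Span 2</div>",
            "<a href=\"#\">HyperText</a>"
        };
        for (String html : inputs) {
            String[] parts = firstTag(html);
            System.out.println(parts[0] + " -> " + parts[1]);
        }
        // Prints:
        // span -> Span 1
        // div -> Span 2
        // a -> HyperText
    }
}
```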