XPath: getting all tags without the <script> and </script> tags

Q: What is the purpose of the <script> and </script> tags?

The <script> tag in HTML defines a client-side script. It either contains the scripting statements or points to an external script file through its src attribute. JavaScript is commonly used for form validation, dynamic content changes, image manipulation, and similar tasks.

Tags:

html

xpath

I'm having trouble selecting all the HTML tags except <script> or <script ... /> using XPath.

For example, in this part of the HTML code, I want to remove:

<script type="text/javascript" src="http://www.google.com/coop/cse/brand?form=cse-search-box&amp;lang=fr"/>

from this code:

<li><!-- Search Google -->
<center>
                     <form action="http://www.google.fr/cse" id="cse-search-box" target="_blank">
                        <div>
                           <input type="hidden" name="cx" value="partner-pub-0959382714089534:mw3ssl65jk1"/>
                           <input type="hidden" name="ie" value="ISO-8859-1"/>
                           <input type="text" name="q" size="31"/>
                           <input type="submit" name="sa" value="Rechercher"/>
                        </div>
                     </form>
                     <script type="text/javascript"
                             src="http://www.google.com/coop/cse/brand?form=cse-search-box&amp;lang=fr"/>
                  </center>
                  <!-- Search Google --></li>

I'm generating an XML file using Web-Harvest, and then I have to remove some specific tags. I have tried many XPath expressions (I'm working in the body of the HTML):

//body//*[not(name() = 'script')]
//body//*[not(self::script)]
//body//*[not(starts-with(name(),'script'))]
//body//*[not(contains(name(),'script'))]

but none of them works.

Note that //body//*[name() = 'script'] works, but I want the opposite...

Do you have any ideas?

Or, more generally, if you know how to remove all the <script> ... </script> tags using XPath, I'm also interested :-)

Thanks in advance.

asked Apr 20 '11 09:04

jbed

2 Answers

Well, first of all, XPath selects nodes in an existing document; it does not remove them. The path //body//* you start with selects all child and descendant elements of the body element. Even if you add a predicate, as in //body//*[not(self::script)], the path still selects elements such as the li and the center element, which are not themselves script elements but which contain a script element.

So //body//*[not(self::script)] is the right approach to select all non-script elements, but it does not help if you want, for instance, the original center element with the script element removed. That is not something pure XPath can do for you; you would need to move to XSLT to transform the document and remove the script elements that way.
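To make the point concrete, here is a minimal sketch in Python using only the standard library. It is an illustration, not part of the Web-Harvest setup from the question: the sample markup is a trimmed-down version of the fragment above, and because xml.etree.ElementTree does not support the self:: axis, the predicate not(self::script) is approximated with a plain tag comparison.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the fragment from the question (assumption:
# attributes shortened for readability).
SAMPLE = """<body><li><center>
<form id="cse-search-box"><input type="text" name="q"/></form>
<script type="text/javascript" src="http://www.google.com/coop/cse/brand"/>
</center></li></body>"""

body = ET.fromstring(SAMPLE)

# Rough equivalent of //body//*[not(self::script)]: every descendant
# element of body whose own name is not "script".
selected = [el for el in body.iter() if el is not body and el.tag != "script"]
selected_names = [el.tag for el in selected]
print(selected_names)  # ['li', 'center', 'form', 'input']

# The script element itself is not selected, but the selected center
# element still carries it as a child when serialized.
center = next(el for el in selected if el.tag == "center")
center_markup = ET.tostring(center, encoding="unicode")
print("<script" in center_markup)  # True
```

The selection excludes the script node, yet any selected ancestor still contains it, which is why the expressions in the question appear not to work.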

answered Sep 23 '22 04:09

Martin Honnen

XPath is just a query language for XML documents, and as such it cannot alter in any way the XML document(s) being queried.

The most convenient way to produce a new XML document that is different from the initial XML document is by using XSLT.

This short and simple XSLT transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="script"/>
</xsl:stylesheet>

when applied on the provided XML document:

<li>
    <!-- Search Google -->
    <center>
        <form action="http://www.google.fr/cse"
              id="cse-search-box" target="_blank">
            <div>
                <input type="hidden" name="cx"
                value="partner-pub-0959382714089534:mw3ssl65jk1"/>
                <input type="hidden" name="ie" value="ISO-8859-1"/>
                <input type="text" name="q" size="31"/>
                <input type="submit" name="sa" value="Rechercher"/>
            </div>
        </form>
        <script type="text/javascript"
        src="http://www.google.com/coop/cse/brand?form=cse-search-box&amp;lang=fr"/>
    </center>
    <!-- Search Google -->
</li>

produces the wanted, correct result:

<li><!-- Search Google -->
   <center>
      <form action="http://www.google.fr/cse" id="cse-search-box" target="_blank">
         <div>
            <input type="hidden" name="cx" value="partner-pub-0959382714089534:mw3ssl65jk1"/>
            <input type="hidden" name="ie" value="ISO-8859-1"/>
            <input type="text" name="q" size="31"/>
            <input type="submit" name="sa" value="Rechercher"/>
         </div>
      </form>
   </center><!-- Search Google -->
</li>
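If running an XSLT processor is not an option, the same kind of removal can be sketched by editing the parsed tree directly. The following is a hedged alternative in Python's standard library, not the XSLT approach itself; strip_elements is a hypothetical helper name, and the sample is a shortened version of the document above. Note that the default ElementTree parser discards comments, so unlike the identity transform this sketch would not preserve the "Search Google" comments.

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample document (assumption: comments and
# hidden inputs left out for brevity).
SAMPLE = """<li><center>
<form action="http://www.google.fr/cse" id="cse-search-box"><div>
<input type="text" name="q" size="31"/></div></form>
<script type="text/javascript" src="http://www.google.com/coop/cse/brand?form=cse-search-box&amp;lang=fr"/>
</center></li>"""


def strip_elements(root, tag):
    """Remove every descendant element named `tag` from `root`, in place."""
    # ElementTree nodes do not know their parent, so visit each element
    # as a potential parent and inspect its children. The list() copies
    # avoid mutating a sequence while iterating over it.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                # Removing the child also drops its subtree and any
                # text that trailed it (its tail).
                parent.remove(child)
    return root


root = strip_elements(ET.fromstring(SAMPLE), "script")
cleaned = ET.tostring(root, encoding="unicode")
print(cleaned)
```

The helper only removes descendants; if the root element itself were a script element it would be left in place.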

answered Sep 23 '22 04:09

Dimitre Novatchev
