tags", "text": "<p>I have a problem getting all the HTML tags except <code><script></code> or <code><script ... /></code> using XPath.</p>\n\n<p>For example, in this part of the HTML code, I want to remove:</p>\n\n<pre class="prettyprint"><code><script type="text/javascript" src="http://www.google.com/coop/cse/brand?form=cse-search-box&amp;lang=fr"/>\n</code></pre>\n\n<p>from this code:</p>\n\n<pre class="prettyprint"><code><li><!-- Search Google -->\n<center>\n <form action="http://www.google.fr/cse" id="cse-search-box" target="_blank">\n <div>\n <input type="hidden" name="cx" value="partner-pub-0959382714089534:mw3ssl65jk1"/>\n <input type="hidden" name="ie" value="ISO-8859-1"/>\n <input type="text" name="q" size="31"/>\n <input type="submit" name="sa" value="Rechercher"/>\n </div>\n </form>\n <script type="text/javascript"\n src="http://www.google.com/coop/cse/brand?form=cse-search-box&amp;lang=fr"/>\n </center>\n <!-- Search Google --></li>\n</code></pre>\n\n<p>I'm generating an XML file using Web-Harvest, and then I have to remove some specific tags. \nI have tried a lot of XPath expressions (I'm working in the body of the HTML):</p>\n\n<ul>\n<li><p><code>//body//*[not(name() = 'script')]</code></p></li>\n<li><p><code>//body//*[not(self::script)]</code></p></li>\n<li><p><code>//body//*[not(starts-with(name(),'script'))]</code></p></li>\n<li><p><code>//body//*[not(contains(name(),'script'))]</code></p></li>\n</ul>\n<p>but none of them work.</p>\n\n<p>Note that <code>//body//*[name() = 'script']</code> works, but I want the opposite... 
</p>\n\n<p>Do you have any ideas?</p>\n\n<p>Or more generally, if you know how to remove all the <code><script></code> <code><script/></code> tags using XPath, I'm also interested :-)</p>\n\n<p>Thanks in advance.</p>", "answerCount": 2, "upvoteCount": 958, "dateCreated": "2011-04-20 09:23:30", "dateModified": "2022-09-23 04:46:11", "author": { "type": "Person", "name": "jbed" }, "acceptedAnswer": { "@type": "Answer", "text": "<p>Well, first of all, XPath selects nodes in an existing document; it does not remove them. And the path <code>//body//*</code> you start with selects all child and descendant elements of the <code>body</code> element. Even if you now add a predicate like <code>//body//*[not(self::script)]</code>, that path still selects elements like the <code>li</code> and the <code>center</code> element that are not themselves <code>script</code> elements but which contain a <code>script</code> element. So <code>//body//*[not(self::script)]</code> is the right approach if you want to avoid selecting any <code>script</code> elements, but it does not help if you want, for instance, the original <code>center</code> element with the <code>script</code> element removed. 
That is not something pure XPath can do for you, you would need to move to XSLT to transform the document and that way remove any <code>script</code> elements.</p>", "upvoteCount": 115, "url": "https://exchangetuts.com/xpath-getting-all-tags-without-script-and-script-tags-1641288783917705#answer-1658508195186157", "dateCreated": "2022-09-18 04:46:11", "dateModified": "2022-09-23 04:46:11", "author": { "type": "Person", "name": "Martin Honnen" } }, "suggestedAnswer": [ { "@type": "Answer", "text": "<p><strong>XPath is just a <em>query</em> language for XML documents and as such it cannot alter in any way the XML document(s)</strong> that is being queried.</p>\n\n<p>The most convenient way to produce a new XML document that is different from the initial XML document is by using XSLT.</p>\n\n<p><strong>This short and simple XSLT transformation</strong>:</p>\n\n<pre class="prettyprint"><code><xsl:stylesheet version="1.0"\n xmlns:xsl="http://www.w3.org/1999/XSL/Transform">\n <xsl:output omit-xml-declaration="yes" indent="yes"/>\n <xsl:strip-space elements="*"/>\n\n <xsl:template match="node()|@*">\n <xsl:copy>\n <xsl:apply-templates select="node()|@*"/>\n </xsl:copy>\n </xsl:template>\n\n <xsl:template match="script"/>\n</xsl:stylesheet>\n</code></pre>\n\n<p><strong>when applied on the provided XML document:</strong></p>\n\n<pre class="prettyprint"><code><li>\n <!-- Search Google -->\n <center>\n <form action="http://www.google.fr/cse"\n id="cse-search-box" target="_blank">\n <div>\n <input type="hidden" name="cx"\n value="partner-pub-0959382714089534:mw3ssl65jk1"/>\n <input type="hidden" name="ie" value="ISO-8859-1"/>\n <input type="text" name="q" size="31"/>\n <input type="submit" name="sa" value="Rechercher"/>\n </div>\n </form>\n <script type="text/javascript"\n src="http://www.google.com/coop/cse/brand?form=cse-search-box&amp;lang=fr"/>\n </center>\n <!-- Search Google -->\n</li>\n</code></pre>\n\n<p><strong>produces the wanted, correct result</strong>:</p>\n\n<pre 
class="prettyprint"><code><li><!-- Search Google -->\n <center>\n <form action="http://www.google.fr/cse" id="cse-search-box" target="_blank">\n <div>\n <input type="hidden" name="cx" value="partner-pub-0959382714089534:mw3ssl65jk1"/>\n <input type="hidden" name="ie" value="ISO-8859-1"/>\n <input type="text" name="q" size="31"/>\n <input type="submit" name="sa" value="Rechercher"/>\n </div>\n </form>\n </center><!-- Search Google -->\n</li>\n</code></pre>", "upvoteCount": 38, "url": "https://exchangetuts.com/xpath-getting-all-tags-without-script-and-script-tags-1641288783917705#answer-1658508196628836", "dateCreated": "2022-09-16 04:46:11", "dateModified": "2022-09-23 04:46:11", "author": { "type": "Person", "name": "Dimitre Novatchev" } } ] } }