
Excel - Extract substring(s) from string using FILTERXML

Tags:

arrays

xml

excel

xpath

excel-formula

_Background

Lately I've been trying to get more familiar with the concept of changing a delimited string into XML to parse with Excel's FILTERXML and retrieve those substrings that are of interest. Please note that this function became available in Excel 2013 and is available in neither Excel for Mac nor Excel Online.

By a delimited string, I mean anything from a normal sentence using spaces as delimiters to any other combination of characters that could be used to define substrings within a string. For example, imagine the following:

ABC|123|DEF|456|XY-1A|ZY-2F|XY-3F|XY-4f|xyz|123

_Question

So, a lot of people know how to get the nth element (e.g.: =TRIM(MID(SUBSTITUTE(A1,"|",REPT(" ",LEN(A1))),3*LEN(A1)+1,LEN(A1))) to retrieve 456), or other combinations with LEN(), MID(), FIND() and all those constructs. But how do we use FILTERXML with more specific criteria to extract the substrings of concern and clean up the full string? For example, how to retrieve:

elements by position
numeric or non-numeric elements
elements that contain a substring on their own
elements that start or end with a substring
elements that are upper- or lowercase
elements holding numbers
unique values
...

288

asked May 16 '20 13:05

JvdV

1 Answer

Excel's FILTERXML uses XPATH 1.0, which unfortunately means it is not as versatile as we might want it to be. Also, Excel does not seem to allow returning reworked node values; it only lets you select nodes in order of appearance. However, there is a fair share of functions we can still utilize. More information about that can be found here.

The function takes two parameters: =FILTERXML(<A string in valid XML format>,<A string in valid XPATH format>)

Let's say cell A1 holds the string: ABC|123|DEF|456|XY-1A|ZY-2F|XY-3F|XY-4f|xyz|123. To create a valid XML string we use SUBSTITUTE to change the delimiter to valid end- and start-tag constructs. So to get a valid XML construct for the given example we could do:

"<t><s>"&SUBSTITUTE(A1,"|","</s><s>")&"</s></t>"

For readability reasons I'll refer to the above construct with the word <XML> as a placeholder. Below you'll find different useful XPATH functions in a valid construct to filter nodes:

_{1) All Elements:}

=FILTERXML(<XML>,"//s")

_{Returns: ABC, 123, DEF, 456, XY-1A, ZY-2F, XY-3F, XY-4f, xyz and 123 (all nodes)}
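To sanity-check the construct outside Excel, here is a minimal Python sketch using only the standard library. It builds the same string and confirms that it parses as XML and yields every substring. The helper name to_xml is made up for illustration, and ElementTree is not the parser Excel uses, so treat this as a cross-check rather than a description of FILTERXML internals.

```python
import xml.etree.ElementTree as ET


def to_xml(text, delimiter="|"):
    # Hypothetical helper mirroring the SUBSTITUTE construct shown above:
    # every delimiter becomes an end-tag followed by a start-tag.
    return "<t><s>" + text.replace(delimiter, "</s><s>") + "</s></t>"


source = "ABC|123|DEF|456|XY-1A|ZY-2F|XY-3F|XY-4f|xyz|123"
root = ET.fromstring(to_xml(source))

# ".//s" is the ElementTree spelling of the "//s" path used in the formula.
all_nodes = [s.text for s in root.findall(".//s")]
print(all_nodes)
```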

_{2) Elements by position:}

=FILTERXML(<XML>,"//s[position()=4]")

Or:

=FILTERXML(<XML>,"//s[4]")

_{Returns: 456 (node on index 4)}

=FILTERXML(<XML>,"//s[position()<4]")

_{Returns: ABC, 123 and DEF (nodes on index < 4)}

=FILTERXML(<XML>,"//s[position()=2 or position()>5]")

_{Returns: 123, ZY-2F, XY-3F, XY-4f, xyz and 123 (nodes on index 2 or > 5)}

=FILTERXML(<XML>,"//s[last()]")

_{Returns: 123 (node on last index)}

=FILTERXML(<XML>,"//s[position() mod 2 = 1]")

_{Returns: ABC, DEF, XY-1A, XY-3F and xyz (odd nodes)}

=FILTERXML(<XML>,"//s[position() mod 2 = 0]")

_{Returns: 123, 456, ZY-2F, XY-4f and 123 (even nodes)}
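The positional selections can be cross-checked with a short Python sketch. ElementTree only implements a small XPath subset: it accepts a plain index and last(), but not position() comparisons or mod, so list slices stand in for those here. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

source = "ABC|123|DEF|456|XY-1A|ZY-2F|XY-3F|XY-4f|xyz|123"
root = ET.fromstring("<t><s>" + source.replace("|", "</s><s>") + "</s></t>")

# Predicates that ElementTree's XPath subset supports directly.
fourth = [s.text for s in root.findall("s[4]")]
last = [s.text for s in root.findall("s[last()]")]

# Slices emulate the predicates ElementTree does not support.
nodes = [s.text for s in root.findall("s")]
first_three = nodes[:3]   # position() < 4
odd = nodes[0::2]         # position() mod 2 = 1 (XPath counts from 1)
even = nodes[1::2]        # position() mod 2 = 0
second_or_after_fifth = [
    n for i, n in enumerate(nodes, start=1) if i == 2 or i > 5
]
```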

_{3) (Non) numeric elements:}

=FILTERXML(<XML>,"//s[number()=.]")

Or:

=FILTERXML(<XML>,"//s[.*0=0]")

_{Returns: 123, 456, and 123 (numeric nodes)}

=FILTERXML(<XML>,"//s[not(number()=.)]")

Or:

=FILTERXML(<XML>,"//s[.*0!=0]")

_{Returns: ABC, DEF, XY-1A, ZY-2F, XY-3F, XY-4f and xyz (non-numeric nodes)}
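Both numeric tests rely on XPath 1.0 turning a non-numeric string into NaN, which never equals anything. A rough Python sketch of that rule follows, assuming the number syntax from the XPath 1.0 specification (optional minus sign, digits with an optional fraction, optional surrounding whitespace). The function name is_numeric is made up, and Excel's parser may differ in edge cases.

```python
import re

# XPath 1.0 number syntax: no exponent and no leading plus sign.
XPATH_NUMBER = re.compile(r"[ \t\r\n]*-?([0-9]+(\.[0-9]*)?|\.[0-9]+)[ \t\r\n]*")


def is_numeric(text):
    # Hypothetical stand-in for the predicates number()=. and .*0=0
    return XPATH_NUMBER.fullmatch(text) is not None


nodes = "ABC|123|DEF|456|XY-1A|ZY-2F|XY-3F|XY-4f|xyz|123".split("|")
numeric = [n for n in nodes if is_numeric(n)]
non_numeric = [n for n in nodes if not is_numeric(n)]
```

Under that syntax a value such as 1e5 counts as non-numeric, while -4.5 counts as numeric.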

_{4) Elements that (not) contain:}

=FILTERXML(<XML>,"//s[contains(., 'Y')]")

_{Returns: XY-1A, ZY-2F, XY-3F and XY-4f (containing 'Y', notice XPATH is case sensitive, excluding xyz)}

=FILTERXML(<XML>,"//s[not(contains(., 'Y'))]")

_{Returns: ABC, 123, DEF, 456, xyz and 123 (not containing 'Y', notice XPATH is case sensitive, including xyz)}

_{5) Elements that (not) start or/and end with:}

=FILTERXML(<XML>,"//s[starts-with(., 'XY')]")

_{Returns: XY-1A, XY-3F and XY-4f (starting with 'XY')}

=FILTERXML(<XML>,"//s[not(starts-with(., 'XY'))]")

_{Returns: ABC, 123, DEF, 456, ZY-2F, xyz and 123 (don't start with 'XY')}

=FILTERXML(<XML>,"//s[substring(., string-length(.) - string-length('F') +1) = 'F']")

_{Returns: DEF, ZY-2F and XY-3F (end with 'F', notice XPATH 1.0 does not support ends-with)}

=FILTERXML(<XML>,"//s[not(substring(., string-length(.) - string-length('F') +1) = 'F')]")

_{Returns: ABC, 123, 456, XY-1A, XY-4f, xyz and 123 (don't end with 'F')}

=FILTERXML(<XML>,"//s[starts-with(., 'X') and substring(., string-length(.) - string-length('A') +1) = 'A']")

_{Returns: XY-1A (start with 'X' and end with 'A')}
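The ends-with workaround is easier to trust once the arithmetic is spelled out. This Python sketch emulates the one-argument form of XPath's 1-based substring() and compares the result with Python's own str.endswith. The function names are illustrative and not part of any library.

```python
def xpath_substring(text, start):
    # XPath positions are 1-based; a start below 1 begins at the first character.
    return text[max(start, 1) - 1:]


def ends_with(text, suffix):
    # Same arithmetic as substring(., string-length(.) - string-length('F') + 1) = 'F'
    return xpath_substring(text, len(text) - len(suffix) + 1) == suffix


nodes = "ABC|123|DEF|456|XY-1A|ZY-2F|XY-3F|XY-4f|xyz|123".split("|")
ending_with_f = [n for n in nodes if ends_with(n, "F")]
start_x_end_a = [n for n in nodes if n.startswith("X") and ends_with(n, "A")]
```

When the suffix is longer than the node, the start index drops below 1, the whole node is returned, and the comparison fails as it should.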

_{6) Elements that are upper- or lowercase:}

=FILTERXML(<XML>,"//s[translate(.,'abcdefghijklmnopqrstuvwxyz','ABCDEFGHIJKLMNOPQRSTUVWXYZ')=.]")

_{Returns: ABC, 123, DEF, 456, XY-1A, ZY-2F, XY-3F and 123 (uppercase nodes)}

=FILTERXML(<XML>,"//s[translate(.,'ABCDEFGHIJKLMNOPQRSTUVWXYZ','abcdefghijklmnopqrstuvwxyz')=.]")

_{Returns: 123, 456, xyz and 123 (lowercase nodes)}

_{NOTE: Unfortunately XPATH 1.0 does not support upper-case() nor lower-case() so the above is a workaround. Add special characters if need be.}
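As a sketch of why the translate() comparison works, Python's str.translate behaves the same way for equal-length character lists: each character in the first list is mapped to the character at the same position in the second. A node counts as uppercase when the mapping leaves it unchanged. The variable names are made up for illustration.

```python
LOWER = "abcdefghijklmnopqrstuvwxyz"
UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

to_upper = str.maketrans(LOWER, UPPER)  # translate(., LOWER, UPPER)
to_lower = str.maketrans(UPPER, LOWER)  # translate(., UPPER, LOWER)

nodes = "ABC|123|DEF|456|XY-1A|ZY-2F|XY-3F|XY-4f|xyz|123".split("|")
uppercase = [n for n in nodes if n.translate(to_upper) == n]
lowercase = [n for n in nodes if n.translate(to_lower) == n]
```

Nodes without any letters, such as 123 and 456, pass both tests because neither mapping changes them. That is why they show up in both result lists.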

_{7) Elements that (not) contain any number:}

=FILTERXML(<XML>,"//s[translate(.,'1234567890','')!=.]")

_{Returns: 123, 456, XY-1A, ZY-2F, XY-3F, XY-4f and 123 (contain any digit)}

=FILTERXML(<XML>,"//s[translate(.,'1234567890','')=.]")

_{Returns: ABC, DEF and xyz (don't contain any digit)}

=FILTERXML(<XML>,"//s[translate(.,'1234567890','')!=. and .*0!=0]")

_{Returns: XY-1A, ZY-2F, XY-3F and XY-4f (holding digits but not a number on their own)}
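With an empty third argument, translate() deletes the listed characters, so a node that changes must have contained a digit. A small Python sketch of the three filters above follows. The regular expression is only an approximation of the .*0!=0 test, and all names are illustrative.

```python
import re

nodes = "ABC|123|DEF|456|XY-1A|ZY-2F|XY-3F|XY-4f|xyz|123".split("|")

# translate(., '1234567890', '') removes every digit from the node.
strip_digits = str.maketrans("", "", "1234567890")

with_digit = [n for n in nodes if n.translate(strip_digits) != n]
without_digit = [n for n in nodes if n.translate(strip_digits) == n]

# Approximation of ".*0!=0": keep nodes that are not a number as a whole.
number = re.compile(r"-?([0-9]+(\.[0-9]*)?|\.[0-9]+)")
mixed = [n for n in with_digit if not number.fullmatch(n)]
```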

_{8) Unique elements or duplicates:}

=FILTERXML(<XML>,"//s[preceding::*=.]")

_{Returns: 123 (duplicate nodes)}

=FILTERXML(<XML>,"//s[not(preceding::*=.)]")

_{Returns: ABC, 123, DEF, 456, XY-1A, ZY-2F, XY-3F, XY-4f and xyz (unique nodes)}

=FILTERXML(<XML>,"//s[not(following::*=. or preceding::*=.)]")

_{Returns: ABC, DEF, 456, XY-1A, ZY-2F, XY-3F, XY-4f and xyz (nodes that have no similar sibling)}
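The preceding and following axes amount to looking left and right in the list of nodes. The Python sketch below spells out the three filters under that reading, with list slices standing in for the axes. The variable names are made up.

```python
nodes = "ABC|123|DEF|456|XY-1A|ZY-2F|XY-3F|XY-4f|xyz|123".split("|")

# preceding::*=.  -> the value already occurred somewhere to the left
repeats = [n for i, n in enumerate(nodes) if n in nodes[:i]]

# not(preceding::*=.)  -> first occurrence of every value
first_occurrences = [n for i, n in enumerate(nodes) if n not in nodes[:i]]

# not(following::*=. or preceding::*=.)  -> value occurs exactly once
singletons = [n for n in nodes if nodes.count(n) == 1]
```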

_{9) Elements of certain length:}

=FILTERXML(<XML>,"//s[string-length()=5]")

_{Returns: XY-1A, ZY-2F, XY-3F and XY-4f (5 characters long)}

=FILTERXML(<XML>,"//s[string-length()<4]")

_{Returns: ABC, 123, DEF, 456, xyz and 123 (shorter than 4 characters)}

_{10) Elements based on preceding/following:}

=FILTERXML(<XML>,"//s[preceding::*[1]='456']")

_{Returns: XY-1A (previous node equals '456')}

=FILTERXML(<XML>,"//s[starts-with(preceding::*[1],'XY')]")

_{Returns: ZY-2F, XY-4f, and xyz (previous node starts with 'XY')}

=FILTERXML(<XML>,"//s[following::*[1]='123']")

_{Returns: ABC and xyz (following node equals '123')}

=FILTERXML(<XML>,"//s[contains(following::*[1],'1')]")

_{Returns: ABC, 456, and xyz (following node contains '1')}

=FILTERXML(<XML>,"//s[preceding::*='ABC' and following::*='XY-3F']")

Or:

=FILTERXML(<XML>,"//s[.='ABC']/following::s[following::s='XY-3F']")

_{Returns: 123, DEF, 456, XY-1A and ZY-2F (everything between 'ABC' and 'XY-3F')}
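For a flat list of sibling nodes like this one, preceding::*[1] is simply the left-hand neighbour and following::*[1] the right-hand one. The Python sketch below restates a few of the filters in those terms. It only mirrors the logic and makes no claim about how Excel evaluates the axes.

```python
nodes = "ABC|123|DEF|456|XY-1A|ZY-2F|XY-3F|XY-4f|xyz|123".split("|")

# Pair every node with its right-hand neighbour.
pairs = list(zip(nodes, nodes[1:]))

after_456 = [cur for prev, cur in pairs if prev == "456"]          # preceding::*[1]='456'
after_xy = [cur for prev, cur in pairs if prev.startswith("XY")]   # starts-with(preceding::*[1],'XY')
before_123 = [cur for cur, nxt in pairs if nxt == "123"]           # following::*[1]='123'

# preceding::*='ABC' and following::*='XY-3F'
between = [
    n for i, n in enumerate(nodes)
    if "ABC" in nodes[:i] and "XY-3F" in nodes[i + 1:]
]
```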

_{11) Elements based on sub-strings:}

=FILTERXML(<XML>,"//s[substring-after(., '-') = '3F']")

_{Returns: XY-3F (nodes ending with '3F' after hyphen)}

=FILTERXML(<XML>,"//s[contains(substring-after(., '-') , 'F')]")

_{Returns: ZY-2F and XY-3F (nodes containing 'F' after hyphen)}

=FILTERXML(<XML>,"//s[substring-before(., '-') = 'ZY']")

_{Returns: ZY-2F (nodes starting with 'ZY' before hyphen)}

=FILTERXML(<XML>,"//s[contains(substring-before(., '-'), 'Y')]")

_{Returns: XY-1A, ZY-2F, XY-3F and XY-4f (nodes containing 'Y' before hyphen)}
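One detail worth keeping in mind: in XPath 1.0 both substring-before() and substring-after() return an empty string when the separator is absent. That is why nodes without a hyphen never match above. A Python sketch of that behaviour follows, assuming a non-empty separator. The function names are illustrative.

```python
def substring_before(text, sep):
    # Empty string when sep is absent, matching XPath 1.0 (sep assumed non-empty).
    head, found, _ = text.partition(sep)
    return head if found else ""


def substring_after(text, sep):
    # Empty string when sep is absent, matching XPath 1.0 (sep assumed non-empty).
    _, found, tail = text.partition(sep)
    return tail if found else ""


nodes = "ABC|123|DEF|456|XY-1A|ZY-2F|XY-3F|XY-4f|xyz|123".split("|")
after_is_3f = [n for n in nodes if substring_after(n, "-") == "3F"]
after_has_f = [n for n in nodes if "F" in substring_after(n, "-")]
before_is_zy = [n for n in nodes if substring_before(n, "-") == "ZY"]
before_has_y = [n for n in nodes if "Y" in substring_before(n, "-")]
```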

_{12) Elements based on concatenation:}

=FILTERXML(<XML>,"//s[concat(., '|', following::*[1])='ZY-2F|XY-3F']")

_{Returns: ZY-2F (nodes when concatenated with '|' and following sibling equals 'ZY-2F|XY-3F')}

=FILTERXML(<XML>,"//s[contains(concat(., preceding::*[2]), 'FA')]")

_{Returns: DEF (nodes when concatenated with sibling two indices to the left contains 'FA')}

_{13) Empty vs. Non-empty:}

=FILTERXML(<XML>,"//s[count(node())>0]")

Or:

=FILTERXML(<XML>,"//s[node()]")

_{Returns: ABC, 123, DEF, 456, XY-1A, ZY-2F, XY-3F, XY-4f, xyz and 123 (all nodes that are not empty)}

=FILTERXML(<XML>,"//s[count(node())=0]")

Or:

=FILTERXML(<XML>,"//s[not(node())]")

_{Returns: None (all nodes that are empty)}
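The sample string has no empty elements, so the second pair of formulas has nothing to return. Empty nodes would show up with consecutive or trailing delimiters. The Python sketch below uses a made-up input, ABC||DEF|, to show what those nodes look like once parsed with ElementTree (not Excel's parser).

```python
import xml.etree.ElementTree as ET

# Hypothetical input with an empty field in the middle and one at the end.
source = "ABC||DEF|"
root = ET.fromstring("<t><s>" + source.replace("|", "</s><s>") + "</s></t>")

elements = root.findall("s")

# count(node())>0: the element has text or child elements.
non_empty = [s.text for s in elements if s.text is not None or len(s) > 0]

# count(node())=0: ElementTree reports the text of <s></s> as None.
empty_positions = [
    i for i, s in enumerate(elements, start=1)
    if s.text is None and len(s) == 0
]
```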

Now obviously the above is a demonstration of possibilities with XPATH 1.0 functions and you can get a whole range of combinations of the above and more! I tried to cover most commonly used string functions. If you are missing any please feel free to comment.

While the question is quite broad in itself, I was hoping to give some general direction on how to use FILTERXML for the queries at hand. The formula returns an array of nodes that can be used in any other way. A lot of the time I would use it in TEXTJOIN() or INDEX(), but I guess other options would be the new dynamic array (DA) functions to spill results.

Be aware that when parsing a string through FILTERXML(), the ampersand character (&) and the left angle bracket (<) must not appear in their literal form. They need to be substituted with &amp; and &lt; respectively. Another option is to use their numeric ISO/IEC 10646 character codes, &#38; and &#60; respectively. After parsing, the function will return these characters to you in their literal form. Needless to say, splitting a string by the semicolon therefore just became tricky.
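The effect of a literal & or < can be reproduced with any XML parser. The Python sketch below uses the standard library's escape function on a made-up input and shows the unescaped string failing to parse while the escaped one round-trips. The helper to_xml is hypothetical, and escape also replaces >, which is harmless here.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_xml(text, delimiter="|"):
    # Hypothetical helper: split first, then escape each piece.
    parts = (escape(part) for part in text.split(delimiter))
    return "<t><s>" + "</s><s>".join(parts) + "</s></t>"


raw = "A&B|x<y|plain"

# Without escaping, the literal & and < make the string ill-formed XML.
try:
    ET.fromstring("<t><s>" + raw.replace("|", "</s><s>") + "</s></t>")
    parsed_unescaped = True
except ET.ParseError:
    parsed_unescaped = False

# With escaping, the parser hands the literal characters back.
values = [s.text for s in ET.fromstring(to_xml(raw)).findall("s")]
```

Because this sketch splits before escaping, a semicolon delimiter does not collide with the semicolons inside the entities here.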

177

answered Sep 28 '22 05:09

JvdV
