I need to find all places in a bunch of HTML files that match the following structure (CSS):
div.a ul.b
or XPath:
//div[@class="a"]//ul[@class="b"]
grep doesn't help me here. Is there a command-line tool that returns all files (and optionally all places therein) that match this criterion? That is, one that returns file names if the file matches a certain HTML or XML structure.
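Try html-xml-utils. Install it on Debian/Ubuntu with:
aptitude install html-xml-utils
or on macOS (Homebrew) with:
brew install html-xml-utils
Then extract the matching elements with: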
hxnormalize -l 240 -x filename.html | hxselect -s '\n' -c "label.black"
Here, "label.black" is the CSS selector for the elements whose content you want (label elements with the class black). To reuse this, write a helper script named cssgrep:
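#!/bin/bash
# Usage: cssgrep FILE SELECTOR
# Normalize FILE to XHTML, then print the content of every element matching SELECTOR.
# Ignore parse errors from hxnormalize; write the results to standard output.
hxnormalize -l 240 -x "$1" 2>/dev/null | hxselect -s '\n' -c "$2"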
You can then run:
cssgrep filename.html "label.black"
This prints the content of every HTML label element with the class black.
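The cssgrep script works on one file and prints element content, while the question asks for the names of matching files. A minimal sketch of a grep -l style wrapper, assuming bash and html-xml-utils are installed; the function name cssgrep_l is hypothetical and not part of html-xml-utils. It omits hxselect's -c option so that a matching element with empty content still produces output and is counted.

```shell
#!/bin/bash
# cssgrep_l: hypothetical "grep -l" style companion to cssgrep (not part of html-xml-utils).
# Usage: cssgrep_l SELECTOR FILE...
# Prints the name of every FILE that contains at least one element matching SELECTOR.
cssgrep_l() {
  if [ "$#" -lt 2 ]; then
    echo "usage: cssgrep_l SELECTOR FILE..." >&2
    return 2
  fi
  local selector=$1
  shift
  local f matches
  for f in "$@"; do
    # Normalize to XHTML (parse errors discarded), then select whole elements.
    # Without -c, hxselect prints the element's tags too, so an empty element
    # such as <ul class="b"></ul> still yields non-empty output.
    matches=$(hxnormalize -l 240 -x "$f" 2>/dev/null | hxselect -s '\n' "$selector" 2>/dev/null)
    if [ -n "$matches" ]; then
      printf '%s\n' "$f"
    fi
  done
}
```

For example, after sourcing the file that defines the function, this lists the HTML files in the current directory that match the question's selector:
cssgrep_l 'div.a ul.b' *.html
In bash 4 or later, running shopt -s globstar first and passing **/*.html instead also searches subdirectories.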
The -l 240 argument is important because it keeps unwanted line breaks out of the output: it sets the maximum line length hxnormalize uses when it reformats the document. For example, if the input is <label class="black">Text to \nextract</label>, then -l 240 reformats it to <label class="black">Text to extract</label>, wrapping lines only at column 240, which makes the output easier to parse. Larger values such as 1024 also work.
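One way to see the effect, as a sketch assuming bash and html-xml-utils are installed: extract the same label once with hxnormalize's default line length and once with -l 240. The helper name extract_label and its sample text are hypothetical; the text is deliberately longer than a typical default line length and contains a line break, like the example above.

```shell
#!/bin/bash
# extract_label [hxnormalize options...]: hypothetical helper that prints the
# content of the label.black element in a fixed sample document, after
# normalizing it with hxnormalize using the given options plus -x.
extract_label() {
  printf '%s\n' '<html><body><p><label class="black">This label text is deliberately longer than
a typical default line length, so line wrapping becomes visible</label></p></body></html>' |
    hxnormalize "$@" -x 2>/dev/null |
    hxselect -s '\n' -c 'label.black'
}
```

Run extract_label and then extract_label -l 240 and compare: the first may split the text across lines, while the second should print it on a single line.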