I'm using the WinForms WebBrowser control to collect the links of video clips from the site linked below.
LINK
But when I walk through the page element by element, I cannot find the <video> tag.
// Question code, fixed so it compiles: it only inspects the main Document
void webBrowser_DocumentCompleted_2(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    try
    {
        HtmlElementCollection pTags = browser.Document.GetElementsByTagName("video");
        int i = 1;
        foreach (HtmlElement link in pTags)
        {
            // Guard against elements without children before reading Children[0]
            if (link.Children.Count > 0 &&
                link.Children[0].GetAttribute("className") == "vjs-poster")
            {
                i++;
            }
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}
Right after calling
HtmlElementCollection pTags = browser.Document.GetElementsByTagName("video");
the collection is already empty (its Count is 0).
Do I need to call some AJAX?
The Web page you linked contains IFrames.
An IFrame contains its own HtmlDocument. As of now, you're parsing just the main Document container.
Thus, you need to parse the HtmlElement tags contained in the Document of another Frame.
The list of a Web Page's Frames is referenced by the WebBrowser.Document.Window.Frames property, which returns an HtmlWindowCollection.
Each HtmlWindow in the collection contains its own HtmlDocument object.
Instead of parsing the Document object property returned by a WebBrowser, most of the time we need to parse each HtmlWindow.Document in the Frames collection; unless, of course, we already know that the required Elements are part of the main Document or of another known Frame.
An example (related to the current task):
Note:
Remembering that a Web Page may be composed of multiple Documents contained in Frames/IFrames, we won't be surprised if the DocumentCompleted event is raised multiple times with ReadyState = WebBrowserReadyState.Complete.
Each Frame's Document will raise the event when the WebBrowser is done loading it.
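The answer deals with the repeated events by de-duplicating the collected values. A commonly used alternative heuristic is to compare e.Url (the URL of the Document that just finished loading) with browser.Url (the URL of the top-level Document) and act only when they match. This is an assumption, not a guarantee: the top-level completion is usually raised after the Frames, but Frames injected later by scripts can still load afterwards. CompletionFilter and IsTopLevel are hypothetical names:

```csharp
using System;
using System.Windows.Forms;

// Hypothetical helper (not from the original answer).
public static class CompletionFilter
{
    // completedUrl: e.Url, the document that just finished loading
    // (the main page or one of its Frames).
    // browserUrl: browser.Url, the top-level document.
    public static bool IsTopLevel(Uri completedUrl, Uri browserUrl) =>
        completedUrl != null && browserUrl != null && completedUrl == browserUrl;

    // Convenience overload for use inside a DocumentCompleted handler.
    public static bool IsTopLevel(WebBrowser browser, WebBrowserDocumentCompletedEventArgs e) =>
        browser.ReadyState == WebBrowserReadyState.Complete && IsTopLevel(e.Url, browser.Url);
}
```

With this filter, the handler would start with if (!CompletionFilter.IsTopLevel(browser, e)) return; and then parse all the Frames once, instead of once per Frame.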
Here, we parse the HtmlDocument of each Frame in the Document.Window.Frames collection, using the Frame.Document.Body.GetElementsByTagName() method, and read the HtmlElements' attributes using the HtmlElement.GetAttribute method.
Note:
Since the DocumentCompleted event is raised multiple times, we need to verify that an HtmlElement attribute value is not stored multiple times, too.
Here, I'm using a custom support class that holds all the collected values along with the hash code of each reference link (relying on the default implementation of GetHashCode()).
Each time a Document is parsed, we check whether a value has already been stored by comparing its hash.
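Comparing hash codes has a small catch: two different URLs can share a hash code, and the second one would then be treated as a duplicate. Also, in .NET Core and later, string.GetHashCode() is randomized per process, so the stored hashes are only meaningful within a single run. A minimal sketch that stores the links themselves in a HashSet<string> instead; MovieLinkStore and TryAdd are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical alternative to storing hash codes (not from the original answer).
public class MovieLinkStore
{
    // Ordinal comparison: URLs are compared character by character.
    private readonly HashSet<string> seen = new HashSet<string>(StringComparer.Ordinal);
    private readonly List<(string VideoLink, string ImageLink)> links =
        new List<(string VideoLink, string ImageLink)>();

    public IReadOnlyList<(string VideoLink, string ImageLink)> Links => links;

    // Returns true if the link was new and has been stored; false if it is
    // empty or was already stored (e.g. by a previous DocumentCompleted event).
    public bool TryAdd(string videoLink, string imageLink)
    {
        if (string.IsNullOrEmpty(videoLink) || !seen.Add(videoLink)) return false;
        links.Add((videoLink, imageLink));
        return true;
    }
}
```

HashSet<T>.Add already returns false for a value that is present, so the check and the insertion happen in one call.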
Note:
While parsing the HtmlWindowCollection, some specific exceptions are inevitably raised: UnauthorizedAccessException and InvalidOperationException.
There's nothing we can do to avoid this: the Elements are not null, they simply throw these exceptions when we try to access any of their properties.
Here, I'm just catching and ignoring these specific exceptions: we know we will eventually get them, we cannot avoid them, so we move on.
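The same skip-and-move-on logic can be factored into a small helper, so the per-Frame loop contains no try/catch. A minimal sketch, where FrameAccess and GetVideoElements are hypothetical names; the exception filter swallows only the two exceptions listed above, so anything unexpected still surfaces:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Windows.Forms;

// Hypothetical helper (not from the original answer).
public static class FrameAccess
{
    // Returns the <video> elements of one Frame, or an empty list when
    // the Frame's Document cannot be read.
    public static IReadOnlyList<HtmlElement> GetVideoElements(HtmlWindow frame)
    {
        if (frame == null) return Array.Empty<HtmlElement>();
        try
        {
            HtmlDocument doc = frame.Document;
            if (doc == null) return Array.Empty<HtmlElement>();
            // Materialize the list here, so property access happens inside the try
            return doc.GetElementsByTagName("video").OfType<HtmlElement>().ToList();
        }
        catch (Exception ex) when (ex is UnauthorizedAccessException || ex is InvalidOperationException)
        {
            // Expected for some Frames: skip this one
            return Array.Empty<HtmlElement>();
        }
    }
}
```

Inside the handler, the loop body then reduces to reading attributes from FrameAccess.GetVideoElements(frame).FirstOrDefault().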
using System;
using System.Collections.Generic;
using System.Linq;              // OfType<T>(), FirstOrDefault(), Any()
using System.Windows.Forms;

// Holds one collected video link, its poster image and the hash of the link
public class MovieLink
{
    public int Hash { get; set; }
    public string VideoLink { get; set; }
    public string ImageLink { get; set; }
}

// Members of the Form that hosts the WebBrowser control
List<MovieLink> moviesLinks = new List<MovieLink>();

private void Browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    var browser = sender as WebBrowser;
    if (browser.ReadyState != WebBrowserReadyState.Complete) return;

    var documentFrames = browser.Document.Window.Frames;
    foreach (HtmlWindow frame in documentFrames) {
        try {
            // First <video> element of this Frame's Document, if any
            var videoElement = frame.Document.Body
                .GetElementsByTagName("VIDEO").OfType<HtmlElement>().FirstOrDefault();
            if (videoElement != null && videoElement.Children.Count > 0) {
                // The video URL is read from the first child (typically a <source> tag)
                string videoLink = videoElement.Children[0].GetAttribute("src");
                int hash = videoLink.GetHashCode();
                if (moviesLinks.Any(m => m.Hash == hash)) {
                    // Done parsing this URL: remove handler or whatever
                    // else is planned to move to the next site/page
                    return;
                }
                string sourceImage = videoElement.GetAttribute("poster");
                moviesLinks.Add(new MovieLink() {
                    Hash = hash, VideoLink = videoLink, ImageLink = sourceImage
                });
            }
        }
        catch (UnauthorizedAccessException) { } // Cannot be avoided: ignore
        catch (InvalidOperationException) { }   // Cannot be avoided: ignore
    }
}