Tumblr and other blogging websites allow people to post embed codes for videos from YouTube and other video networks.
But how do they keep only the Flash object code and remove any other HTML or scripts? They even have an automated check that tells you when something is not a valid video code.
Is this done using regular expressions? And is there a PHP class to do that?
Thanks
Cross-site scripting (XSS) is a code injection attack targeting web applications: it delivers malicious client-side scripts to a user's web browser for execution.
A web application firewall (WAF) can be a powerful tool for protecting against XSS attacks. WAFs can filter out bots and other malicious activity that may indicate an attack, so attacks can be blocked before any script is executed.
To prevent XSS attacks, your application must validate all input data, make sure that only allowlisted data is accepted, and ensure that all variable output in a page is encoded before it is returned to the user.
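A minimal sketch of the last point, encoding output, written in Python for illustration (in PHP, `htmlspecialchars` plays the same role). The `render_comment` function and its markup are hypothetical; the point is only that every user-supplied value is escaped before it is placed in the page.

```python
import html


def render_comment(author: str, text: str) -> str:
    """Build an HTML fragment, escaping every user-supplied value.

    html.escape converts &, <, > and (with quote=True, the default) " and '
    into character references, so the browser shows them as plain text.
    """
    return (
        '<p class="comment"><b>' + html.escape(author) + "</b>: "
        + html.escape(text) + "</p>"
    )


print(render_comment("mallory", "<script>alert(1)</script>"))
# <p class="comment"><b>mallory</b>: &lt;script&gt;alert(1)&lt;/script&gt;</p>
```

Escaping at output time means that even if something slips past input validation, it is rendered as text instead of being executed.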
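Now let's look at how you can prevent XSS without changing the whole source code. The X-XSS-Protection header was designed to control the XSS filter built into some browsers: the filter was usually on by default, and the header let a site enforce it (for example with `1; mode=block`). It was supported by Internet Explorer 8+, Chrome, and Safari; Firefox never implemented it, and Chrome and Edge have since removed their filters, so it should not be relied on in modern browsers.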
Generally speaking, using regex is not a good way to deal with HTML: HTML is not regular enough for regular expressions, there are too many variations permitted by the standards... and browsers even accept HTML that's not valid!
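A small illustration of why a regex blacklist falls short, sketched in Python. The naive pattern below is a hypothetical "first attempt" filter: it misses inputs that differ only in case or attributes, and it cannot see script hidden in an event-handler attribute. A real HTML parser, by contrast, normalizes tag names for you.

```python
import re
from html.parser import HTMLParser

# A naive "blacklist" filter: strip <script>...</script> with a regex.
NAIVE_SCRIPT_RE = re.compile(r"<script>.*?</script>", re.DOTALL)


def naive_strip(markup: str) -> str:
    return NAIVE_SCRIPT_RE.sub("", markup)


# Three inputs a browser will happily run, none of which the regex matches.
SAMPLES = [
    "<SCRIPT>alert(1)</SCRIPT>",                         # different case
    '<script type="text/javascript">alert(1)</script>',  # attributes on the tag
    "<img src=x onerror=alert(1)>",                      # no <script> tag at all
]

for sample in SAMPLES:
    print(naive_strip(sample) == sample)  # True each time: nothing was removed


class StartTagLogger(HTMLParser):
    """Record each start tag; HTMLParser lower-cases tag and attribute names."""

    def __init__(self) -> None:
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


logger = StartTagLogger()
logger.feed("".join(SAMPLES))
logger.close()
print([tag for tag, _ in logger.seen])  # ['script', 'script', 'img']
```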
In PHP (as your question is tagged php), a great solution for filtering user input is the HTML Purifier library.
A couple of interesting things about it:
Basically, the idea is to only keep what you specify (white-list), instead of trying to remove bad stuff using a black-list (which will never be quite complete).
And if you only specify a list of tags and attributes that can do no harm, only those will be kept -- and the risks of injections are lowered a lot.
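The allowlist idea can be sketched with Python's standard `html.parser` (for illustration only; in PHP you would use HTML Purifier itself). The `ALLOWED` table, the `DROP_CONTENT` set and the http/https rule for `src` are hypothetical choices for a Flash embed, not HTML Purifier's configuration. Everything not explicitly listed is dropped, and all text and attribute values are escaped on the way out.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist for a Flash video embed: tag -> permitted attributes.
ALLOWED = {
    "object": {"width", "height"},
    "param": {"name", "value"},
    "embed": {"src", "type", "width", "height"},
}
URL_ATTRS = {"src"}                 # attributes that must hold an http(s) URL
DROP_CONTENT = {"script", "style"}  # elements whose text is discarded too


def is_safe_url(value: str) -> bool:
    # Allowlist of schemes: anything else (javascript:, data:, ...) is rejected.
    return value.lower().startswith(("http://", "https://"))


class AllowlistSanitizer(HTMLParser):
    """Rebuild markup, keeping only allowlisted tags and attributes."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if tag not in ALLOWED:
            return  # unknown tag: drop it (its text still goes through handle_data)
        kept = []
        for name, value in attrs:
            value = value or ""
            if name not in ALLOWED[tag]:
                continue  # e.g. onload, onerror, style
            if name in URL_ATTRS and not is_safe_url(value):
                continue
            kept.append(f' {name}="{escape(value)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(escape(data))  # text is kept, but always escaped


def sanitize(markup: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(sanitize('<embed src="http://www.youtube.com/v/abcdefghijk" onload="evil()">'
               "<script>alert(1)</script>"))
# <embed src="http://www.youtube.com/v/abcdefghijk">
```

Note that a blacklist would have to anticipate every dangerous attribute and scheme; here `onload` and `javascript:` are rejected simply because they were never listed.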
Quoting HTML Purifier's home page:
HTML Purifier is a standards-compliant HTML filter library written in PHP.
HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant, something only achievable with a comprehensive knowledge of W3C's specifications.
Yes, another great thing is that the code you get as output is valid.
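Of course, this will only allow you to clean / filter / purify the HTML input; it will not allow you to validate that the URL used by the user is both:
- pointing to a video that actually exists, and
- pointing to content that is acceptable on your site.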
About the second point, there's not much one can do: to check the content of the video itself, there is basically no choice but to have a human being say "ok" or "not ok".
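About the first point, though, there's hope: some content-hosting services have APIs that you might be able to use.
For instance, YouTube provides an API -- see Developer's Guide: PHP. (The GData API described below has since been retired in favor of the YouTube Data API v3, but the approach is the same.)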
In your case, the "Retrieving a specific video entry" section looks promising: if you send an HTTP request to a URL that looks like this (replacing "videoID" with the ID of the video, of course):
http://gdata.youtube.com/feeds/api/videos/videoID
you'll get an Atom feed if the video is valid, and "Invalid id" if it's not.
This might help you validate at least some URLs to content -- even if you'll have to write specific code for each content-hosting service that your users like...
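A sketch of that check in Python, under stated assumptions: the URL template is the one quoted above (from the retired GData API, so it is shown only to illustrate the request/response logic), the 11-character ID format is an observed convention rather than a documented guarantee, and `fetch` is a hypothetical callable standing in for whatever HTTP client you use.

```python
import re
from typing import Callable

# URL template quoted above (GData API, now retired; kept for illustration).
GDATA_VIDEO_URL = "http://gdata.youtube.com/feeds/api/videos/{video_id}"

# Assumption: YouTube IDs are 11 characters from [A-Za-z0-9_-].
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def is_valid_video(video_id: str, fetch: Callable[[str], str]) -> bool:
    """Return True unless the service answers "Invalid id".

    `fetch` is hypothetical: it GETs a URL and returns the response body
    as text, error bodies included.
    """
    if not VIDEO_ID_RE.fullmatch(video_id):
        return False  # malformed ID: no need to contact the service at all
    body = fetch(GDATA_VIDEO_URL.format(video_id=video_id))
    return not body.strip().startswith("Invalid id")


def fake_fetch(url: str) -> str:
    # Stand-in for a real HTTP client, answering like the service described above.
    return "<entry>...</entry>" if url.endswith("/abcdefghijk") else "Invalid id"


print(is_valid_video("abcdefghijk", fake_fetch))  # True
print(is_valid_video("zzzzzzzzzzz", fake_fetch))  # False
print(is_valid_video("<script>", fake_fetch))     # False, rejected before fetching
```

Injecting `fetch` keeps the validation logic separate from the transport, which also makes it easy to swap in a different service's endpoint.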
Now, to extract the identifier of the video from your HTML string... If you're thinking about using regex, you are wrong ;-)
The best solution to extract a portion of data from an HTML string is generally to:
- load the HTML into a DOM document: DOMDocument::loadHTML is generally pretty helpful here;
- use DOM methods such as DOMDocument::getElementsByTagName if you need to iterate over all elements that have a specific tag name (it might be great to iterate over all <object> or <embed> tags, for instance);
- or use XPath queries, with the DOMXPath class and its DOMXPath::query method.
And using DOM will also allow you to modify the HTML document using a standard API -- which might help, in case you want to add some message next to the video, or anything like that.
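The same "parse, then walk the elements" idea, sketched in Python with the standard `html.parser` standing in for PHP's DOMDocument. The `EmbedURLCollector` and `youtube_id` names are hypothetical, and the assumption that old Flash embed URLs look like `http://www.youtube.com/v/ID&hl=en` (ID as the path segment after `/v/`) should be checked against the embed codes you actually receive.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class EmbedURLCollector(HTMLParser):
    """Collect candidate video URLs from <embed src> and <param name="movie" value>."""

    def __init__(self) -> None:
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "embed" and attr.get("src"):
            self.urls.append(attr["src"])
        elif tag == "param" and (attr.get("name") or "").lower() == "movie" and attr.get("value"):
            self.urls.append(attr["value"])


def youtube_id(url: str):
    """Return the video ID from an old-style Flash URL, or None.

    Assumption: the ID is the path segment after /v/, optionally followed by '&...'.
    """
    parts = urlsplit(url)
    if parts.hostname not in ("youtube.com", "www.youtube.com"):
        return None  # not a host we know how to validate
    if not parts.path.startswith("/v/"):
        return None
    return parts.path[len("/v/"):].split("&", 1)[0] or None


snippet = ('<object width="425" height="344">'
           '<param name="movie" value="http://www.youtube.com/v/abcdefghijk&hl=en"></param>'
           '<embed src="http://www.youtube.com/v/abcdefghijk&hl=en" '
           'type="application/x-shockwave-flash"></embed>'
           '</object>')
collector = EmbedURLCollector()
collector.feed(snippet)
collector.close()
print({youtube_id(u) for u in collector.urls})  # {'abcdefghijk'}
```

The extracted ID can then be passed to a per-service check such as the API lookup described earlier.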
Take a look at HTML Purifier to start: http://htmlpurifier.org/