I'm not sure if YouTube is the only website with this technology, but Content ID is basically YouTube's technology for automatically identifying and removing copyright infringements. You can read more about it here:
http://www.youtube.com/t/contentid
Well, when one of my videos (containing a particular music track) got tagged and removed for copyright infringement, I thought the Content ID system was probably dumb. So I did some experiments, none of which fooled the filter.
On the other hand, I don't know of any material being falsely matched as copyrighted. A piano version of a song, for example, would not falsely trigger the censor.
I'm not ranting about my videos being removed. I'm just surprised at how effective the content censor is. I'm wondering how the algorithm correctly identifies the song as infringing copyright even after all my efforts to circumvent it. Any attempt at direct matching would have been defeated immediately, and any algorithm based on note patterns would likely be fooled by the beeps and the pitch shifting.
Well, this is more a matter of curiosity than an urgent question.
Content ID is YouTube's automated, scalable system that enables copyright owners to identify YouTube videos that include content they own. YouTube only grants Content ID to copyright owners who meet specific criteria.
Content ID works by scanning videos uploaded to YouTube against a database of files submitted to the platform by copyright owners. When the system identifies a match between uploaded content on YouTube and a copyright-protected work, the video receives a Content ID claim.
Some copyright owners use Content ID, YouTube's automated content identification system, to easily identify and manage their copyright-protected content on YouTube. Videos uploaded to YouTube are scanned against a database of audio and visual content that's been submitted to YouTube by copyright owners.
According to YouTube, fewer than 1% of all Content ID claims are disputed, but the lack of a dispute doesn't mean that the claim is correct. It may just mean the person chose not to fight it. All in all, YouTube says that, for audio matches at least, the system has over 99.7% precision.
Pedro Moreno and others at Google/YouTube work on it. They use finite-state transducers to recognize sequences of "music phone" units, which play a role similar to phonemes in automatic speech recognition.
Check out this article:
If you change the speed or pitch throughout the whole song, I'm surprised that these algorithms still recognize it. But maybe they normalize the pitch and speed (using the time between beats) so that they can recognize cover versions as well, not just the originals. It's not surprising that it can ignore the beeps you added, though, since the rest of your audio stream is still similar enough.
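To make the normalization guess above concrete, here is a minimal sketch. It is only an illustration of the idea, not YouTube's actual method, and the melody, the note lengths, and the function names are all made up for the example. It assumes the notes and their lengths have already been extracted from the audio, which is the hard part in practice.

```python
def pitch_intervals(midi_notes):
    """Semitone steps between consecutive notes.

    Adding a constant to every note (a pitch shift) leaves these unchanged.
    """
    return [b - a for a, b in zip(midi_notes, midi_notes[1:])]


def duration_ratios(durations):
    """Length of each note relative to the previous one.

    Scaling every duration by the same factor (a speed change) leaves
    these unchanged. Rounding absorbs floating-point noise.
    """
    return [round(b / a, 3) for a, b in zip(durations, durations[1:])]


# Hypothetical melody: MIDI note numbers and note lengths in seconds.
notes = [60, 62, 64, 60, 67]
lengths = [0.5, 0.5, 1.0, 0.5, 2.0]

# The same melody shifted up 3 semitones and played 25% faster.
shifted_notes = [n + 3 for n in notes]
faster_lengths = [d / 1.25 for d in lengths]

print(pitch_intervals(notes) == pitch_intervals(shifted_notes))
print(duration_ratios(lengths) == duration_ratios(faster_lengths))
```

Both comparisons come out equal, so a matcher that works on relative pitch and relative timing would not notice a uniform pitch shift or speed change.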
(Actually, the finite-state-based algorithm would be awesome to apply to my iTunes library, to tag the files correctly. Services like MusicBrainz rely on more or less exact hash matches between your audio and the database entry, whereas the transducer method seems to be more tolerant of differences when recognizing the files.)
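Here is a rough sketch of why exact hashing is brittle while matching on sequences of units tolerates differences. This is not the transducer method and not how MusicBrainz works. It swaps in a much simpler technique (counting shared n-grams), and the unit labels, the "BEEP" unit, and the function names are hypothetical.

```python
import hashlib


def exact_hash(units):
    """Hash of the whole sequence; any change gives an unrelated digest."""
    return hashlib.sha256(" ".join(units).encode("utf-8")).hexdigest()


def ngrams(units, n=3):
    """Set of all runs of n consecutive units."""
    return {tuple(units[i:i + n]) for i in range(len(units) - n + 1)}


def overlap(query, reference, n=3):
    """Fraction of the reference's n-grams that also occur in the query."""
    ref = ngrams(reference, n)
    if not ref:
        return 0.0
    return len(ref & ngrams(query, n)) / len(ref)


# Hypothetical unit labels standing in for short, quantized audio segments.
reference = list("abcdefghijklmnop")
# The same sequence with an unrelated "beep" unit inserted twice.
altered = reference[:5] + ["BEEP"] + reference[5:11] + ["BEEP"] + reference[11:]
# A different track that happens to use the same units in another order.
unrelated = list("zyxwvutsrqponmlk")

print(exact_hash(altered) == exact_hash(reference))  # exact match fails
print(overlap(altered, reference))    # most trigrams survive the beeps
print(overlap(unrelated, reference))  # no trigrams in common
```

In this toy case the two inserted beeps break the exact hash completely, but 10 of the reference's 14 trigrams still appear in the altered sequence, while the unrelated sequence shares none. A threshold somewhere in between would separate the two.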