Youtube complete Java Regex

Tags: java, regex, youtube

I need to parse several pages to get all of their YouTube IDs.

I found many regular expressions on the web, but the Java ones are not complete: they either give me garbage in addition to the IDs, or they miss some IDs.

The one I found that seems to be complete is hosted here, but it is written in JavaScript and PHP. Unfortunately, I couldn't translate either version into Java.

Can somebody help me rewrite this PHP regex or the following JavaScript one in Java?

'~
    https?://         # Required scheme. Either http or https.
    (?:[0-9A-Z-]+\.)? # Optional subdomain.
    (?:               # Group host alternatives.
      youtu\.be/      # Either youtu.be,
    | youtube\.com    # or youtube.com followed by
      \S*             # Allow anything up to VIDEO_ID,
      [^\w\-\s]       # but char before ID is non-ID char.
    )                 # End host alternatives.
    ([\w\-]{11})      # $1: VIDEO_ID is exactly 11 chars.
    (?=[^\w\-]|$)     # Assert next char is non-ID or EOS.
    (?!               # Assert URL is not pre-linked.
      [?=&+%\w]*      # Allow URL (query) remainder.
      (?:             # Group pre-linked alternatives.
        [\'"][^<>]*>  # Either inside a start tag,
      | </a>          # or inside <a> element text contents.
      )               # End recognized pre-linked alts.
    )                 # End negative lookahead assertion.
    [?=&+%\w]*        # Consume any URL (query) remainder.
    ~ix'

/https?:\/\/(?:[0-9A-Z-]+\.)?(?:youtu\.be\/|youtube\.com\S*[^\w\-\s])([\w\-]{11})(?=[^\w\-]|$)(?![?=&+%\w]*(?:['"][^<>]*>|<\/a>))[?=&+%\w]*/ig;

876

asked Oct 25 '11 19:10

mossaab

2 Answers

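First of all, you need to insert an extra backslash \ for each backslash in the old regex. Otherwise Java treats the backslash as the start of an escape sequence in the string literal rather than as part of the regex, and sequences such as \w or \. will not even compile. You also drop the surrounding / delimiters and the trailing flags of the JavaScript version: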

https?:\\/\\/(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*
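To make the escaping rule concrete, here is a minimal sketch. The class name EscapeDemo, its method names, and the sample strings are made up for illustration. It shows that a doubled backslash in a Java string literal reaches the regex engine as a single backslash, and that / is not a special character in java.util.regex, so \\/ and a plain / should behave the same.

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // The regex [\w\-]{11} written as a Java string literal:
    // every backslash of the regex is doubled in the source code.
    static final String ID_CHARS = "[\\w\\-]{11}";

    static boolean looksLikeId(String s) {
        return Pattern.matches(ID_CHARS, s);
    }

    // Scheme part with the slashes escaped, as in the converted regex.
    static boolean schemeEscaped(String s) {
        return Pattern.matches("https?:\\/\\/", s);
    }

    // Same scheme part with plain slashes; "/" needs no escaping in Java.
    static boolean schemePlain(String s) {
        return Pattern.matches("https?://", s);
    }

    public static void main(String[] args) {
        // "abc-DEF_123" is a made-up 11-character string, not a real video ID.
        System.out.println(looksLikeId("abc-DEF_123"));
        System.out.println(schemeEscaped("https://"));
        System.out.println(schemePlain("https://"));
    }
}
```

Both forms of the scheme pattern are valid; the escaped slashes are simply a leftover from the JavaScript literal syntax.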

Next when you compile your pattern you need to add the CASE_INSENSITIVE flag. Here's an example:

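import java.util.regex.Matcher;
import java.util.regex.Pattern;

String pattern = "https?:\\/\\/(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*";

// CASE_INSENSITIVE plays the role of the i flag in the original regex.
Pattern compiledPattern = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
// link is the text to search: a single URL or a whole page.
Matcher matcher = compiledPattern.matcher(link);
while (matcher.find()) {
    // group(1) is the 11-character video ID; group() would be the whole matched URL.
    System.out.println(matcher.group(1));
}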
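For completeness, the snippet above could be wrapped into a self-contained class along these lines. This is only a sketch: the class name YouTubeIdExtractor, the method extractIds, and the sample text with its video IDs are invented for illustration, and the slashes are written unescaped on the assumption that this does not change what the pattern matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YouTubeIdExtractor {
    // Same regex as in the answer, split into pieces for readability.
    private static final Pattern YOUTUBE_URL = Pattern.compile(
        "https?://(?:[0-9A-Z-]+\\.)?"                       // scheme, optional subdomain
        + "(?:youtu\\.be/|youtube\\.com\\S*[^\\w\\-\\s])"   // host alternatives
        + "([\\w\\-]{11})"                                  // group 1: the video ID
        + "(?=[^\\w\\-]|$)"                                 // next char is non-ID or end
        + "(?![?=&+%\\w]*(?:['\"][^<>]*>|</a>))"            // skip pre-linked URLs
        + "[?=&+%\\w]*",                                    // consume query remainder
        Pattern.CASE_INSENSITIVE);

    // Returns every video ID found in the text, in order of appearance.
    static List<String> extractIds(String text) {
        List<String> ids = new ArrayList<>();
        Matcher m = YOUTUBE_URL.matcher(text);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Made-up sample text; the IDs are placeholders, not real videos.
        String page = "See http://www.youtube.com/watch?v=abc-DEF_123 "
                    + "and http://youtu.be/ZYXwvu98765 today.";
        System.out.println(extractIds(page));
    }
}
```

Compiling the pattern once into a static field avoids recompiling it for every page that is parsed.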

123

answered Oct 09 '22 17:10

Marcus

Marcus's regex above is good, but I found that it doesn't recognize YouTube links that have "www" but no "http(s)", for example www.youtube....

Here is an update:

^(?:https?:\\/\\/)?(?:[0-9A-Z-]+\\.)?(?:youtu\\.be\\/|youtube\\.com\\S*[^\\w\\-\\s])([\\w\\-]{11})(?=[^\\w\\-]|$)(?![?=&+%\\w]*(?:['\"][^<>]*>|<\\/a>))[?=&+%\\w]*

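It's the same except for the start: the scheme is now optional. Note that the leading ^ anchors the match to the beginning of the input.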
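As a rough illustration of what the ^ means in practice, the sketch below compares the anchored pattern with a variant that simply drops the ^. That variant is an assumption of this example, not part of the answer, and it is looser: with no scheme required it can also start matching in the middle of a longer host name. The class name SchemeOptionalDemo, the helper firstId, and the sample strings are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SchemeOptionalDemo {
    // The updated regex without its leading ^, slashes left unescaped.
    private static final String REST =
        "(?:https?://)?(?:[0-9A-Z-]+\\.)?"
        + "(?:youtu\\.be/|youtube\\.com\\S*[^\\w\\-\\s])"
        + "([\\w\\-]{11})(?=[^\\w\\-]|$)"
        + "(?![?=&+%\\w]*(?:['\"][^<>]*>|</a>))[?=&+%\\w]*";

    // As posted: only matches a link that starts the input.
    static final Pattern ANCHORED =
        Pattern.compile("^" + REST, Pattern.CASE_INSENSITIVE);

    // Assumed variant: no anchor, so find() can scan a whole page.
    static final Pattern UNANCHORED =
        Pattern.compile(REST, Pattern.CASE_INSENSITIVE);

    // Returns the first video ID found, or null if there is none.
    static String firstId(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String bare = "www.youtube.com/watch?v=abc-DEF_123";
        String inText = "Watch www.youtube.com/watch?v=abc-DEF_123 now";
        System.out.println(firstId(ANCHORED, bare));
        System.out.println(firstId(ANCHORED, inText));
        System.out.println(firstId(UNANCHORED, inText));
    }
}
```

The anchored form suits validating a single URL string, while scanning whole pages for IDs needs a pattern that can match anywhere in the text.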

answered Oct 09 '22 18:10

Blagoj Atanasovski
