How to write regex for bullet space digit and dot

Tags: java, regex

I am using a regex to match sentences that start with a bullet, a space, a digit, and a dot.

• 1. This is sample Application
• 2. This is Sample java program

regex:

•\\s\\d\\.\\s[A-z]

Required output:
This is sample Application.
This is Sample java program.

It's not working. Please suggest how to do this.

asked Aug 16 '13 05:08 by user2664353


2 Answers

To match the bullet character reliably, use its Unicode escape sequence rather than the literal character, whose handling depends on the source file's encoding. However, Unicode defines several bullet styles, so it's probably best to allow for all of them:

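[\u2022\u2023\u25E6\u2043\u2219]\s\d\.\s[A-Za-z]

The character class has no commas, because a comma inside [...] is matched literally. [A-Za-z] replaces [A-z], which would also match the characters [ \ ] ^ _ and ` that sit between Z and a in ASCII.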

This should match the following bullet styles:

  • Bullet (•)
  • Triangular Bullet (‣)
  • White Bullet (◦)
  • Hyphen Bullet (⁃)
  • Bullet Operator (∙)

Reference: https://en.wikipedia.org/wiki/%E2%80%A2
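A minimal sketch of how such a bullet character class could be used from Java to get the required output. The class is written without separating commas, since a comma inside [...] would itself be matched. The capture group (.*), the class name BulletStrip, and the helper stripBullet are illustrative assumptions, not part of the answer; the trailing dot is appended only because the question's required output shows one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BulletStrip {
    // One of the five bullet styles, whitespace, one digit, a literal dot, whitespace,
    // then capture the rest of the line (hypothetical group added for illustration).
    private static final Pattern BULLET_LINE =
            Pattern.compile("[\\u2022\\u2023\\u25E6\\u2043\\u2219]\\s\\d\\.\\s(.*)");

    // Returns the text after the bullet prefix plus a trailing dot, or null if the line does not match.
    static String stripBullet(String line) {
        Matcher m = BULLET_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return m.group(1) + ".";
    }

    public static void main(String[] args) {
        String[] lines = {
            "\u2022 1. This is sample Application",
            "\u2022 2. This is Sample java program"
        };
        for (String line : lines) {
            System.out.println(stripBullet(line));
        }
    }
}
```

Note that Matcher.matches() requires the pattern to cover the whole line, which is why the sketch ends with a capture group instead of stopping at the first letter.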

answered Sep 25 '22 09:09 by maxf130


Instead of using the literal bullet character, use its Unicode escape:

\u2022\s\d\.\s[A-Za-z]

For more information, see "Unicode Character 'BULLET' (U+2022)" and "Regex Tutorial - Unicode Characters and Properties".

EDIT: To strip the bullet prefix from each line (assuming each line is a separate string), try this:

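String firstString = "• 1. This is sample Application";
// split on: bullet, whitespace, digit, literal dot, whitespace
System.out.println(firstString.split("\\u2022\\s\\d\\.\\s")[1]);
// prints: This is sample Application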

This works because String.split breaks the string into an array around each match. Since the match is at the very start of the string, element [0] is an empty string and element [1] is the text that follows the bullet prefix.
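If both lines arrive in a single string instead of as separate strings, one variant is to enable Pattern.MULTILINE so that ^ and $ anchor at each line, then collect every capture with Matcher.find(). This is a sketch under that assumption; BulletLines and extractItems are hypothetical names, and [A-Za-z].* keeps the question's requirement that the text begin with a letter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BulletLines {
    // With MULTILINE, ^ and $ match at line boundaries; '.' does not cross line
    // terminators, so each capture stops at the end of its own line.
    private static final Pattern ITEM =
            Pattern.compile("^\\u2022\\s\\d\\.\\s([A-Za-z].*)$", Pattern.MULTILINE);

    // Collects the text after each bullet-digit-dot prefix, one entry per matching line.
    static List<String> extractItems(String text) {
        List<String> items = new ArrayList<>();
        Matcher m = ITEM.matcher(text);
        while (m.find()) {
            items.add(m.group(1));
        }
        return items;
    }

    public static void main(String[] args) {
        String text = "\u2022 1. This is sample Application\n"
                + "\u2022 2. This is Sample java program";
        for (String item : extractItems(text)) {
            // Append the dot shown in the question's required output.
            System.out.println(item + ".");
        }
    }
}
```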

answered Sep 21 '22 09:09 by isaacparrot