I am trying to write a regex that gets all possible consecutive 4-digit numbers from a 10-digit number. For example:
num = "2345678901";
Output:
2345, 3456, 4567, 5678, 6789, 7890, 8901
These simple regexes do not work:
[\d]{4}
(\d\d\d\d)
You need to use the regex (?=(\d{4})) to find overlapping matches.
The regexes you are using all consume the 4-digit chunks of text, so the next match can only start after the previous one ends and the overlapping values are never matched. With a (?=...) positive lookahead, you can test each position in the input string and capture the 4-digit chunk that starts there without consuming any characters (i.e. without moving the regex engine pointer to the location after that chunk).
C# demo:
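using System;
using System.Linq;
using System.Text.RegularExpressions;

var data = "2345678901";
// Each match is empty; the 4 digits are captured in group 1.
var res = Regex.Matches(data, @"(?=(\d{4}))")
    .Cast<Match>()
    .Select(p => p.Groups[1].Value)
    .ToList();
Console.WriteLine(string.Join("\n", res));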
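To make the "without consuming" point concrete, here is a small sketch that runs a consuming pattern and the lookahead pattern side by side and prints each lookahead match's index and length. The class and method names (OverlapDemo, Consuming, Overlapping) are made up for illustration; only the two patterns come from this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class OverlapDemo
{
    // Consuming pattern: each match moves the engine past the 4 digits it matched.
    public static List<string> Consuming(string input) =>
        Regex.Matches(input, @"\d{4}")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToList();

    // Lookahead pattern: the match itself is empty; the digits live in group 1.
    public static List<string> Overlapping(string input) =>
        Regex.Matches(input, @"(?=(\d{4}))")
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToList();

    public static void Main()
    {
        var data = "2345678901";

        // Non-overlapping chunks only.
        Console.WriteLine(string.Join(", ", Consuming(data)));

        // Every 4-digit window.
        Console.WriteLine(string.Join(", ", Overlapping(data)));

        // Each lookahead match has length 0, so the engine only advances one position at a time.
        foreach (Match m in Regex.Matches(data, @"(?=(\d{4}))"))
        {
            Console.WriteLine($"index={m.Index} length={m.Length} group={m.Groups[1].Value}");
        }
    }
}
```

For the sample input, the consuming pattern should yield only 2345 and 6789, because the trailing "01" is too short to match, while the lookahead version yields all seven windows.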
Do you absolutely need to use Regex? The same operation can be achieved much more quickly using a simple loop.
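// Both helpers are static so they can be called from the static Main below.
private static IEnumerable<string> getnums(string num)
{
    // Slide a 4-character window over the string.
    for (int i = 0; i < num.Length - 3; i++)
    {
        yield return num.Substring(i, 4);
    }
}
private static IEnumerable<string> DoIt(string num)
{
    // The lookahead captures each 4-digit chunk without consuming it.
    return Regex.Matches(num, @"(?=(\d{4}))")
        .Cast<Match>()
        .Select(p => p.Groups[1].Value)
        .ToList();
}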
On average, the simple loop takes about half the time of the Regex version:
static void Main(string[] args)
{
    var num = "2345678901";
    Stopwatch timer = new Stopwatch();
    timer.Start();
    foreach (var number in getnums(num))
    {
        // Yum yum numbers
    }
    timer.Stop();
    Console.WriteLine(timer.Elapsed.Ticks);
    timer.Reset();
    timer.Start();
    foreach (var number in DoIt(num))
    {
        // Yum yum numbers
    }
    timer.Stop();
    Console.WriteLine(timer.Elapsed.Ticks);
}
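A single timed pass like the one above also measures one-off costs such as JIT compilation and building the regex, so the numbers can vary a lot between runs. A rough sketch of a steadier comparison, assuming a warm-up call and an arbitrary iteration count, might look like this. The names WindowBenchmark, WithLoop, WithRegex and TimeTicks are hypothetical, and no particular ratio is implied.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class WindowBenchmark
{
    // Built once so pattern construction is not charged to every call.
    private static readonly Regex Overlap = new Regex(@"(?=(\d{4}))");

    // Plain loop: every 4-character window, with no digit check.
    public static List<string> WithLoop(string num)
    {
        var result = new List<string>();
        for (int i = 0; i <= num.Length - 4; i++)
        {
            result.Add(num.Substring(i, 4));
        }
        return result;
    }

    // Regex: every position that is followed by 4 digits.
    public static List<string> WithRegex(string num)
    {
        var result = new List<string>();
        foreach (Match m in Overlap.Matches(num))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }

    // Times 'iterations' calls of f after one untimed warm-up call.
    public static long TimeTicks(Func<string, List<string>> f, string num, int iterations)
    {
        f(num); // warm-up so JIT compilation is not measured
        var timer = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            f(num);
        }
        timer.Stop();
        return timer.Elapsed.Ticks;
    }

    public static void Main()
    {
        const int iterations = 100000; // arbitrary; adjust to taste
        var num = "2345678901";
        Console.WriteLine("loop:  " + TimeTicks(WithLoop, num, iterations));
        Console.WriteLine("regex: " + TimeTicks(WithRegex, num, iterations));
    }
}
```

Note that the two approaches are only equivalent when the input is all digits: the loop returns any 4-character window, while the regex skips windows that contain a non-digit.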