How do I split between numbers and characters with regex?

Tags:

regex

I have a string containing weekdays and opening hours. How do I split it into lines using a regular expression? An example string is:

Mån - Tor6:30 - 22:00Fre6:30 - 20:00Lör9:00 - 18:00Sön10:00 - 19:00

I want to split between a lowercase letter and a number, and between a number and a capital letter:

Mån - Tor  
6:30 - 22:00  
Fre  
6:30 - 20:00  
Lör  
9:00 - 18:00  
Sön  
10:00 - 19:00 

Thanks in advance!

Magnus asked Dec 21 '22 20:12


2 Answers

Split on

(?<=\d)(?=\p{L})|(?<=\p{L})(?=\d)

For example, in C#:

splitArray = Regex.Split(subjectString, @"(?<=\d)(?=\p{L})|(?<=\p{L})(?=\d)");

or in PHP:

$result = preg_split('/(?<=\d)(?=\p{L})|(?<=\p{L})(?=\d)/u', $subject);

or in Java:

String[] splitArray = subjectString.split("(?<=\\d)(?=\\p{L})|(?<=\\p{L})(?=\\d)");

or in Perl:

@result = split(m/(?<=\d)(?=\p{L})|(?<=\p{L})(?=\d)/, $subject);
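
As a quick check of the Java variant, here is a minimal sketch that runs the same pattern over the sample string from the question. The class and method names (`SplitOpeningHours`, `splitHours`) are made up for illustration, and the Swedish letters are written as Unicode escapes only to sidestep source-encoding problems:

```java
class SplitOpeningHours {
    // Zero-width split points: a digit followed by a letter,
    // or a letter followed by a digit. Nothing is consumed.
    static final String BOUNDARY = "(?<=\\d)(?=\\p{L})|(?<=\\p{L})(?=\\d)";

    static String[] splitHours(String subject) {
        return subject.split(BOUNDARY);
    }

    public static void main(String[] args) {
        // Sample string from the question, with the Swedish letters escaped.
        String subject = "M\u00e5n - Tor6:30 - 22:00Fre6:30 - 20:00"
                       + "L\u00f6r9:00 - 18:00S\u00f6n10:00 - 19:00";
        for (String part : splitHours(subject)) {
            System.out.println(part);
        }
    }
}
```

On a UTF-8 console this should print the eight lines the question asks for, starting with "Mån - Tor" and ending with "10:00 - 19:00". The colon and spaces inside the times are neither letters nor digits, so no split happens there.
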

Tim Pietzcker answered Jan 12 '23 13:01


If a number is a code point with the \pN property, then a nonnumber is any code point lacking that property, which one writes as \PN.

Some regex dialects pusillanimously insist on putting braces around those, as \p{N} or \P{N}, which is bunk, but you’re a prisoner of your language designer’s whims and foibles, insecurities or ignorance.

In those regex dialects of a more readable bent, you may write those in a more liberal and more legible fashion, as \p{Number} and \P{Number}, respectively.

If you mean a decimal number, which is not the same as a number, you may write that as \p{Nd}, with its complement therefore \P{Nd}. The legible version of those is \p{Decimal_Number} and \P{Decimal_Number}. In some programming languages, this is what the \d regex convenience abbreviation stands for.
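
Java is one language where \d does not mean \p{Nd} by default. The sketch below, assuming a standard JDK, compares the two on a non-ASCII decimal digit; `DigitClasses` and `matches` are illustrative names, not part of any library:

```java
import java.util.regex.Pattern;

class DigitClasses {
    // True when the whole input matches the pattern.
    static boolean matches(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        String asciiThree = "3";
        String arabicIndicThree = "\u0663"; // ARABIC-INDIC DIGIT THREE, category Nd

        Pattern plainD = Pattern.compile("\\d");
        Pattern unicodeD = Pattern.compile("\\d", Pattern.UNICODE_CHARACTER_CLASS);
        Pattern nd = Pattern.compile("\\p{Nd}");

        System.out.println(matches(plainD, asciiThree));         // true
        System.out.println(matches(plainD, arabicIndicThree));   // false: default \d is [0-9]
        System.out.println(matches(unicodeD, arabicIndicThree)); // true
        System.out.println(matches(nd, arabicIndicThree));       // true
    }
}
```

So if the input may contain digits outside ASCII, writing \p{Nd} explicitly (or setting the flag) is the safer choice in Java.
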

There are four general categories related to numbers:

       N           Number
       Nd          Decimal_Number (also Digit)
       Nl          Letter_Number
       No          Other_Number
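
To see which of these categories a given character falls into, one option in Java is `Character.getType`. This is only a sketch: `NumberCategories` and `numberCategory` are hypothetical names, and the sample code points were picked by hand as one example per category:

```java
class NumberCategories {
    // Maps a code point to the short name of its number category,
    // or "-" when the code point is not in any N* category.
    static String numberCategory(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.DECIMAL_DIGIT_NUMBER: return "Nd";
            case Character.LETTER_NUMBER:        return "Nl";
            case Character.OTHER_NUMBER:         return "No";
            default:                             return "-";
        }
    }

    public static void main(String[] args) {
        int[] samples = {
            0x0037, // DIGIT SEVEN
            0x2167, // ROMAN NUMERAL EIGHT
            0x00BD, // VULGAR FRACTION ONE HALF
            0x0041  // LATIN CAPITAL LETTER A
        };
        for (int cp : samples) {
            System.out.printf("U+%04X %s%n", cp, numberCategory(cp));
        }
    }
}
```

The first three samples come out as Nd, Nl and No respectively, and the letter as "-". All three number samples match \p{N} in a regex, but only the first matches \p{Nd}.
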

There are also numerous other properties related to numbers:

    Alnum                                    InCommonIndicNumberForms                 Numeric_Type:Numeric                     Numeric_Value:18                         Numeric_Value:38                         Numeric_Value:400                        Numeric_Value:60000
    Bidi_Class:Arabic_Number                 InCountingRodNumerals                    Numeric_Value:0                          Numeric_Value:19                         Numeric_Value:39                         Numeric_Value:500                        Numeric_Value:70000
    Bidi_Class:European_Number               InCuneiformNumbersAndPunctuation         Numeric_Value:NaN                        Numeric_Value:20                         Numeric_Value:40                         Numeric_Value:600                        Numeric_Value:80000
    Block:Aegean_Numbers                     InEnclosedAlphanumerics                  Numeric_Value:1                          Numeric_Value:21                         Numeric_Value:41                         Numeric_Value:700                        Numeric_Value:90000
    Block:Ancient_Greek_Numbers              InEnclosedAlphanumericSupplement         Numeric_Value:2                          Numeric_Value:22                         Numeric_Value:42                         Numeric_Value:800                        Numeric_Value:100000
    Block:Common_Indic_Number_Forms          InMathematicalAlphanumericSymbols        Numeric_Value:3                          Numeric_Value:23                         Numeric_Value:43                         Numeric_Value:900                        Numeric_Value:100000000
    Block:Counting_Rod_Numerals              InNumberForms                            Numeric_Value:4                          Numeric_Value:24                         Numeric_Value:44                         Numeric_Value:1000                       Numeric_Value:1000000000000
    Block:Cuneiform_Numbers_And_Punctuation  InRumiNumeralSymbols                     Numeric_Value:5                          Numeric_Value:25                         Numeric_Value:45                         Numeric_Value:2000                       Other_Number
    Block:Enclosed_Alphanumeric_Supplement   Letter_Number                            Numeric_Value:6                          Numeric_Value:26                         Numeric_Value:46                         Numeric_Value:3000                       PosixAlnum
    Block:Enclosed_Alphanumerics             Line_Break:Infix_Numeric                 Numeric_Value:7                          Numeric_Value:27                         Numeric_Value:47                         Numeric_Value:4000                       Sentence_Break:Numeric
    Block:Mathematical_Alphanumeric_Symbols  Line_Break:Numeric                       Numeric_Value:8                          Numeric_Value:28                         Numeric_Value:48                         Numeric_Value:5000                       Word_Break:ExtendNumLet
    Block:Number_Forms                       Line_Break:Postfix_Numeric               Numeric_Value:9                          Numeric_Value:29                         Numeric_Value:49                         Numeric_Value:6000                       Word_Break:MidNum
    Block:Rumi_Numeral_Symbols               Line_Break:Prefix_Numeric                Numeric_Value:10                         Numeric_Value:30                         Numeric_Value:50                         Numeric_Value:7000                       Word_Break:MidNumLet
    Decimal_Number                           Number                                   Numeric_Value:11                         Numeric_Value:31                         Numeric_Value:60                         Numeric_Value:8000                       Word_Break:Numeric
    General_Category:Decimal_Number          Numeric_Type:De                          Numeric_Value:12                         Numeric_Value:32                         Numeric_Value:70                         Numeric_Value:9000                       XPosixAlnum
    General_Category:Letter_Number           Numeric_Type:Decimal                     Numeric_Value:13                         Numeric_Value:33                         Numeric_Value:80                         Numeric_Value:10000                      
    General_Category:Number                  Numeric_Type:Di                          Numeric_Value:14                         Numeric_Value:34                         Numeric_Value:90                         Numeric_Value:20000                      
    General_Category:Other_Number            Numeric_Type:Digit                       Numeric_Value:15                         Numeric_Value:35                         Numeric_Value:100                        Numeric_Value:30000                      
    InAegeanNumbers                          Numeric_Type:None                        Numeric_Value:16                         Numeric_Value:36                         Numeric_Value:200                        Numeric_Value:40000                      
    InAncientGreekNumbers                    Numeric_Type:Nu                          Numeric_Value:17                         Numeric_Value:37                         Numeric_Value:300                        Numeric_Value:50000       

So... just which particular sort of “numbers” did you happen to be interested in? :)

tchrist answered Jan 12 '23 13:01