
Ruby: How to split a string while keeping the delimiter, when the delimiter is longer than one character?

Tags: regex, ruby

Previous related questions only cover delimiters of length 1.

What I want is the following (for example):

str = 'Hello: Alice Hello: Bob Hello: Charlie Hello: David'
arr = str.magic_split('Hello:')

=> arr[0] = 'Hello: Alice '
   arr[1] = 'Hello: Bob '
   arr[2] = 'Hello: Charlie '
   arr[3] = 'Hello: David'

I tried str.scan(/Hello:/), but I don't know how to write a regex that makes it work. Thanks a lot.

I see that some of the answers only work for this particular case. Let me be more specific.

The file I want to split is like the following and the delimiter is "Certificate:"

Certificate:
    Data: ...
    Signature Algorithm: ...
...
-----BEGIN CERTIFICATE-----
F19ibG6uZyBJbmR1c3RyaWVzIEluYzESMBAGA1UECwwJTWV6emFuaW5lMRMwEQYD\n
2O2RV6HR84N2/A5ZPRF8AQMXJCLIR4qMe/d97/1XK6JQQLUI5NaNroUkW3+tjXo/\n
ovl3vom6xOwUUcFDdv2QoCYBVADX7W2RaVP0JGfiDcekOTwtdos/tmsblboR8oEp\n
fbxD45AowT+khXnPDCQWWpslXJoKMBkaWH7ajb+yKfEYGzRPEmq+v/FPMY9mlJhX\n
epciB5FNO5krO+cyhky5tBZTIv7qCu3kc36dcQXIOTakc7CdoVgwLnytebwTqtpG\n
KuLLH46U8Pp3eeiDDBxYJlz6a2bsbtOaKb1CKMFB3x8LLfLbF4Sh+ScDHetkJDh5\n
...
Certificate:
...
Certificate:
...

Basically, between occurrences of "Certificate:" there will be arbitrary ASCII characters.

Thanks again.

user180574 asked Jan 22 '26 03:01


1 Answer

This is a common case for using slice_before:

text = "Certificate:
    Data: ...
    Signature Algorithm: ...
...
-----BEGIN CERTIFICATE-----
F19ibG6uZyBJbmR1c3RyaWVzIEluYzESMBAGA1UECwwJTWV6emFuaW5lMRMwEQYD
2O2RV6HR84N2/A5ZPRF8AQMXJCLIR4qMe/d97/1XK6JQQLUI5NaNroUkW3+tjXo/
ovl3vom6xOwUUcFDdv2QoCYBVADX7W2RaVP0JGfiDcekOTwtdos/tmsblboR8oEp
fbxD45AowT+khXnPDCQWWpslXJoKMBkaWH7ajb+yKfEYGzRPEmq+v/FPMY9mlJhX
epciB5FNO5krO+cyhky5tBZTIv7qCu3kc36dcQXIOTakc7CdoVgwLnytebwTqtpG
KuLLH46U8Pp3eeiDDBxYJlz6a2bsbtOaKb1CKMFB3x8LLfLbF4Sh+ScDHetkJDh5
...
Certificate:
...
Certificate:
...
"

certificates = text.lines.slice_before(/^Certificate/).to_a
# => [["Certificate:\n",
#      "    Data: ...\n",
#      "    Signature Algorithm: ...\n",
#      "...\n",
#      "-----BEGIN CERTIFICATE-----\n",
#      "F19ibG6uZyBJbmR1c3RyaWVzIEluYzESMBAGA1UECwwJTWV6emFuaW5lMRMwEQYD\n",
#      "2O2RV6HR84N2/A5ZPRF8AQMXJCLIR4qMe/d97/1XK6JQQLUI5NaNroUkW3+tjXo/\n",
#      "ovl3vom6xOwUUcFDdv2QoCYBVADX7W2RaVP0JGfiDcekOTwtdos/tmsblboR8oEp\n",
#      "fbxD45AowT+khXnPDCQWWpslXJoKMBkaWH7ajb+yKfEYGzRPEmq+v/FPMY9mlJhX\n",
#      "epciB5FNO5krO+cyhky5tBZTIv7qCu3kc36dcQXIOTakc7CdoVgwLnytebwTqtpG\n",
#      "KuLLH46U8Pp3eeiDDBxYJlz6a2bsbtOaKb1CKMFB3x8LLfLbF4Sh+ScDHetkJDh5\n",
#      "...\n"],
#     ["Certificate:\n", "...\n"],
#     ["Certificate:\n", "...\n"]]

slice_before is an Enumerable method. It walks through the array of lines and starts a new sub-array at every line that matches the pattern, so each matching line becomes the first element of its own chunk. It returns an Enumerator, which is why to_a is called on the result. In the output above you can see a separate sub-array for each certificate.
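To make the chunking behavior concrete, here is a minimal sketch on made-up data (the "preamble" line and the "Data: A/B/C" values are invented for illustration). It also shows that any lines before the first match end up in a chunk of their own, and that each chunk can be joined back into a single string if you want strings rather than arrays of lines:

```ruby
# Made-up input: one stray line, then two "Certificate:" records.
lines = [
  "preamble\n",
  "Certificate:\n", "    Data: A\n",
  "Certificate:\n", "    Data: B\n", "    Data: C\n"
]

# Start a new chunk at every line that matches the pattern.
chunks = lines.slice_before(/^Certificate:/).to_a

# Join each chunk back into one string per record.
records = chunks.map(&:join)

records.each { |record| p record }
```

If the input may contain text before the first delimiter, you would probably want to drop or inspect that first chunk separately.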

It's an amazingly useful method.

If, after slicing, you want to grab an encoded certificate, you can extract just those lines by index, assuming they sit at fixed offsets within each chunk:

certificates.first[5 .. 10]
# => ["F19ibG6uZyBJbmR1c3RyaWVzIEluYzESMBAGA1UECwwJTWV6emFuaW5lMRMwEQYD\n",
#     "2O2RV6HR84N2/A5ZPRF8AQMXJCLIR4qMe/d97/1XK6JQQLUI5NaNroUkW3+tjXo/\n",
#     "ovl3vom6xOwUUcFDdv2QoCYBVADX7W2RaVP0JGfiDcekOTwtdos/tmsblboR8oEp\n",
#     "fbxD45AowT+khXnPDCQWWpslXJoKMBkaWH7ajb+yKfEYGzRPEmq+v/FPMY9mlJhX\n",
#     "epciB5FNO5krO+cyhky5tBZTIv7qCu3kc36dcQXIOTakc7CdoVgwLnytebwTqtpG\n",
#     "KuLLH46U8Pp3eeiDDBxYJlz6a2bsbtOaKb1CKMFB3x8LLfLbF4Sh+ScDHetkJDh5\n"]
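For the single-line 'Hello:' string from the question, where there are no line boundaries to slice on, one possible approach is String#split with a zero-width lookahead, so the delimiter is matched but not consumed. This is a sketch, not a built-in: split_keeping is a hypothetical helper name, and it assumes the delimiter should be treated as a literal string (hence Regexp.escape):

```ruby
# Hypothetical helper (not part of Ruby's String API): split before each
# occurrence of a literal delimiter, keeping the delimiter in each piece.
def split_keeping(str, delimiter)
  # (?=...) is a zero-width lookahead, so the split consumes no characters.
  str.split(/(?=#{Regexp.escape(delimiter)})/)
end

str = 'Hello: Alice Hello: Bob Hello: Charlie Hello: David'
arr = split_keeping(str, 'Hello:')
arr.each { |piece| p piece }
```

Because nothing is consumed, joining the pieces gives back the original string, including the trailing spaces shown in the question's expected output.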
the Tin Man answered Jan 23 '26 21:01