Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Splitting string to parts with specific pattern and conditions

I have the below-like array of about 5k+ strings as output from certain application (for security reasons I may not provide the exact data, but the example format is pretty much similar to the actual data):

kasdfhkasdhfaskdfj42345sdsadkfdkfhasdf5345534askfhsad
asdfasdf66sdafsdfsdf4560sdfasdfasdf
sdfaasdfs96sadfasdf65459asdfasdf
sadfasdf8asdfasdas06666654asdfasdfsd
fasdjfsdjfhgasdf6456sadfasdfasdf9sdfasdfsadf

Simply, I have non-breaking alphanumeric string that consists of 5 parts:

[latin letters][1 or more digits][latin letters][1 or more digits][latin letters]

Length of letter parts, as well as amount of digits is random, overall string length may vary from several to 2-3 hundreds of chars, but the pattern is still as above.

Practically I'm interested in leading and trailing string parts, i.e. [1 or more digits][latin letters][1 or more digits] may be just thrown away, but 2 other strings should be extracted to separate cells.

I tried SUBSTITUTE and SEARCH functions, but I still may not handle random amount of digits. VBA is the last desired approach, however it is acceptable in case pure formulas are useless. Moreover, the solution should be flexible for possible future use with similar patterns - so any right guidance / general approach will be appreciated.

like image 350
Ksenia Avatar asked Feb 17 '23 12:02

Ksenia


1 Answers

If you don't mind using MS Word instead of Excel - there's a very straightforward approach for such tasks which involves built-in Search and Replace routine using wildcards. Assuming data may be opened in Word, do the following:

  1. Press CTRL+H for Replace dialog opening.
  2. Tick Use wildcards option.
  3. The part of your data you want to throw away match to the following pattern: [0-9]{1,}*[0-9]{1,} - which means any digit 1 or more times with any chars between. Depending on your regional settings you'll need ; instead of , here.
  4. Specify as a replacement any char you like, e.g. ^t (Tab) or ; - for further parts splitting.
  5. Perform replacement.
  6. Optionally you may convert the rest to table using Ribbon Insert > Table > Convert Text to Table... feature.

All you need now is to save / paste the result obtained.

Actually, the approach is quite powerful, and many routine text data parsing tasks similar to your may be quickly done without special skills and/or programming. And you don't need any 3rd party tool for this - every PC has Word installed nowadays.

Read more about patterns and applicable cases:

  • Find and Replace using wildcards
  • Finding and replacing characters using wildcards
like image 121
Maks Gordeev Avatar answered May 03 '23 21:05

Maks Gordeev