I have comma-separated tokens and I need to convert them into fixed-width columns separated by spaces. I wanted to use a regular expression in Notepad++ but ran into a problem.
My input is:
aaaaa, bbb ,cccc, hhhh, fff,t
I would like to get this result:
aaaaa     bbb       cccc      hhhh      fff       t
Each token should take up exactly 10 characters.
My problem is: how do I make each output column exactly 10 characters wide?
A normal Find and Replace can't do that, but it is possible with regular expressions. In Notepad++, press Ctrl+H to open the “Find and Replace” window. Under Search Mode, choose “Regular expression” and then check the “matches newline” checkbox.
I see this as a two-step process. Step one: replace every comma (together with any whitespace after it), and the end of the line, with 10 spaces. Step two: capture 10 characters plus all the spaces that trail them, and replace the whole match with just the 10 captured characters.
Find what: ,\s*|\s*$
Replace with: __________
(these are underscores for visibility; you should really use ten or more spaces)
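Step one can be checked outside Notepad++ with a few lines of Python. This is a sketch that assumes Python's `re` module treats `,\s*|\s*$` the same way Notepad++ does for this single-line input; `step_one` is a hypothetical helper name, not part of the original answer.

```python
import re


def step_one(line: str) -> str:
    """Replace every comma (plus any whitespace after it), and the end of
    the line, with 10 spaces, as in step one above."""
    return re.sub(r",\s*|\s*$", " " * 10, line)


result = step_one("aaaaa, bbb ,cccc, hhhh, fff,t")
# Show the inserted spaces as underscores so they are visible
print(result.replace(" ", "_"))
```

Note that the space before the comma in `bbb ,cccc` is not consumed by `,\s*`, so `bbb` ends up followed by 11 spaces; step two discards the surplus.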
Live Demo: https://regex101.com/r/mR1eS9/1
Sample Text
aaaaa, bbb ,cccc, hhhh, fff,t
After Replacement
aaaaa          bbb           cccc          hhhh          fff          t
123456789,123456789,123456789,123456789,123456789,123456789,123456789,123456789
Note: I inserted the number line here to help illustrate the number and position of characters
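Find what: (.{10})[^\S\n\r]*
Replace with: $1
Here [^\S\n\r] matches any whitespace character except line breaks, so only the spaces that follow the first 10 characters are discarded.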
Live Demo: https://regex101.com/r/uL8oO7/2
Sample Text
Because this is step two, the sample text is the output from step one above.
aaaaa          bbb           cccc          hhhh          fff          t
After Replacement
aaaaa     bbb       cccc      hhhh      fff       t
123456789,123456789,123456789,123456789,123456789,123456789,123456789,123456789
Note: I inserted the number line here to help illustrate the number and position of characters
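Both steps can be chained in a short script to confirm that every column comes out exactly 10 characters wide. This is a sketch that assumes Python's `re` syntax behaves like Notepad++'s for these two patterns; `to_fixed_width` and `WIDTH` are hypothetical names.

```python
import re

WIDTH = 10  # column width requested in the question


def to_fixed_width(line: str) -> str:
    # Step one: every comma (plus any whitespace after it) and the end of
    # the line become WIDTH spaces.
    padded = re.sub(r",\s*|\s*$", " " * WIDTH, line)
    # Step two: keep WIDTH characters, then drop any further spaces/tabs.
    # [^\S\n\r] means "whitespace except line breaks".
    return re.sub(r"(.{%d})[^\S\n\r]*" % WIDTH, r"\1", padded)


result = to_fixed_width("aaaaa, bbb ,cccc, hhhh, fff,t")
# Trailing "|" makes the padding after the last token visible
print(result + "|")
# The ruler from the answer, for comparison
print("123456789," * 6)
```

As with the Notepad++ version, this assumes no token is longer than 10 characters; a longer token would spill over into the next column after step two.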
The regex computation model is too simple to count. However, when there are only nine possible non-empty matches (a token of 1 to 9 characters followed by a comma), you can run nine separate global replacements to cover all possibilities (underscores _ are used in place of spaces for clarity):
Search              Replacement
---------------     -----------
(?<=\b\S{9}),\s     _
(?<=\b\S{8}),\s     __
(?<=\b\S{7}),\s     ___
(?<=\b\S{6}),\s     ____
...
(?<=\b\S{1}),\s     _________
Each replacement operation matches a comma-and-space pair that follows x non-space characters, and replaces it with 10-x spaces.
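The nine replacements in the table can be generated in a loop instead of being typed by hand. This sketch assumes Python's `re` accepts `\b\S{x}` inside a lookbehind (it does, because a fixed repetition count has a fixed width). Like the table, it only handles a comma that directly follows its token and is followed by exactly one whitespace character, so the sample input is written here as `aaaaa, bbb, cccc, hhhh, fff, t`; the last token has no comma after it and stays unpadded. `pad_by_lookbehind` is a hypothetical name.

```python
import re


def pad_by_lookbehind(line: str, width: int = 10) -> str:
    # One global replacement per possible token length x = width-1 .. 1.
    # A ", " pair preceded by exactly x non-space characters becomes
    # width - x spaces; the \b stops a longer word from matching a smaller x.
    for x in range(width - 1, 0, -1):
        pattern = r"(?<=\b\S{%d}),\s" % x
        line = re.sub(pattern, " " * (width - x), line)
    return line


result = pad_by_lookbehind("aaaaa, bbb, cccc, hhhh, fff, t")
# Trailing "|" makes the spacing visible
print(result + "|")
```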
Perhaps a solution in a programming language would be easier to read and understand. Below are code samples for PHP and Python (they can easily be adapted to other languages as well):
<?php
$string = "aaaaa, bbb ,cccc, hhhh, fff,t";
$regex = '~(\w+)\s*(?:,\s*|$)~';
# look for word characters, followed by optional spaces
# and either a comma (plus any spaces after it) or the end of the string
$string = preg_replace_callback(
    $regex,
    function ($match) {
        # pad the captured token with spaces to 10 characters
        return str_pad($match[1], 10);
    },
    $string
);
echo $string;
# aaaaa     bbb       cccc      hhhh      fff       t
?>
See a demo on ideone.com.
import re

string = "aaaaa, bbb ,cccc, hhhh, fff,t"

def repl(match):
    # pad the captured token with spaces to 10 characters
    return match.group(1).ljust(10)

rx = r'(\w+)\s*(?:,\s*|$)'
string = re.sub(rx, repl, string)
print(string)
A demo on ideone.com as well.
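The question asks for exactly 10 characters per token, but `str_pad` and `ljust` only pad: a token longer than 10 characters is left intact and pushes the following columns out of line. If truncating long tokens is acceptable (an assumption, since the question does not say), Python's format specification `<10.10` pads and truncates in one step. This sketch splits on commas instead of using a regex; `fixed_width` is a hypothetical helper.

```python
def fixed_width(line: str, width: int = 10) -> str:
    # Split on commas, trim surrounding whitespace, then place each token
    # in a field of exactly `width` characters:
    # "<" = left-align, first width = minimum, ".width" = maximum (truncate).
    return "".join(f"{token.strip():<{width}.{width}}" for token in line.split(","))


print(fixed_width("aaaaa, bbb ,cccc, hhhh, fff,t") + "|")
print(fixed_width("averyveryverylongtoken, b") + "|")
```

Because every field is exactly `width` characters, the length of the result is always a multiple of 10.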