I'm doing analysis on a file and I want to mask some characters (while retaining their original byte count) before moving this file down the pipeline. For example, given file.txt:
Hello there Cory Klein
Have fun
Hello there Samantha Rodgers
Writing a regular expression to match the names and substituting them with XXXXX is pretty easy with sed:
$ sed -e "s/\(Hello there \).*/\1XXXXX/" file.txt
Hello there XXXXX
Have fun
Hello there XXXXX
But I would like to replace each character of the name with an X (keeping the space between first and last name), like so:
Hello there XXXX XXXXX
Have fun
Hello there XXXXXXXX XXXXXXX
How do I replace all characters matching a regular expression with another character?
Any standard Unix tool is OK: sed, awk, perl, etc. I'm sure I could write a simple Python script to accomplish this, but I'm curious whether it is possible with a regex alone, which would likely be more succinct. If so, I'd love to learn how, so I can apply the concept elsewhere in the future.
With sed, use an address so that lines that don't contain Hello there are left alone:
/Hello there/{...}
Then replace each non-whitespace character that comes after Hello there with an x, using the g flag so that every match on the line is substituted:
s/(^.*Hello there *)?[^[:space:]]/\1x/g
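The capture group, written back with \1, keeps Hello there and everything before it intact. Because the group is anchored with ^, it can only take part in the first match on a line; on every later match it is empty, so each remaining non-whitespace character is simply replaced with x.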
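To see why the address is needed, you can run the same substitution with and without it on the sample input. This is a small sketch; recreating file.txt inline with printf is only an assumption to keep it self-contained. Because the group is optional, it matches empty on a line like Have fun, so without the address every non-whitespace character on that line becomes x as well:

```shell
# Sample input from the question, recreated inline.
input='Hello there Cory Klein
Have fun
Hello there Samantha Rodgers'

# Without the address: the optional group matches empty on "Have fun",
# so that line is mangled to "xxxx xxx".
printf '%s\n' "$input" | sed -E 's/(^.*Hello there *)?[^[:space:]]/\1x/g'

echo '---'

# With the address: only lines containing "Hello there" are changed.
printf '%s\n' "$input" | sed -E '/Hello there/{s/(^.*Hello there *)?[^[:space:]]/\1x/g;}'
```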
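The whole command would be (-E enables extended regular expressions in both GNU and BSD sed, and older GNU sed spells it -r; the ; before } is required by BSD/macOS sed):
$ sed -E '/Hello there/{s/(^.*Hello there *)?[^[:space:]]/\1x/g;}' file.txt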
Hello there xxxx xxxxx
Have fun
Hello there xxxxxxxx xxxxxxx
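If you want the uppercase X from the question, or prefer awk (which the question also allows), the same idea can be sketched in awk. This is an alternative technique, not the sed method above: index() finds the prefix as a literal string, substr() splits the line there, and gsub() masks only the tail. The helper name mask_names and the inline sample input are illustrative assumptions, and "non-whitespace" is approximated here as anything other than a space or tab.

```shell
# Sample input from the question, recreated inline.
input='Hello there Cory Klein
Have fun
Hello there Samantha Rodgers'

# mask_names (hypothetical helper): on lines containing the literal prefix,
# replace every non-blank character after its first occurrence with X.
# Lines without the prefix are printed unchanged.
mask_names() {
    awk -v prefix='Hello there ' '
    {
        i = index($0, prefix)                # literal match, 0 if absent
        if (i > 0) {
            n = i + length(prefix)           # first position after the prefix
            keep = substr($0, 1, n - 1)      # prefix and everything before it
            rest = substr($0, n)             # the part to mask
            gsub(/[^ \t]/, "X", rest)        # one X per non-blank character
            $0 = keep rest
        }
        print
    }'
}

printf '%s\n' "$input" | mask_names
```

Since index() matches literally, changing the prefix needs no regex escaping. Note that it picks the first occurrence of the prefix, whereas the greedy .* in the sed version picks the last; this only matters if a line contains Hello there twice. Each masked character becomes exactly one X, so the byte count is preserved for single-byte text; with multibyte (e.g. UTF-8) names in a UTF-8 locale, the character count is kept but the byte count may not be.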