(I come from the Python world, so I apologise if some of the terminology I use jars with the norm.)
I have a String with a List of start/end indices to replace. Without getting too much into detail, consider this basic mockup:
String text = "my email is [email protected] and my number is (213)-XXX-XXXX";
List<Token> findings = SomeModule.someFnc(text);
And Token is defined as:
class Token {
    int start, end;
    String type;
}
This List represents the start and end positions of the sensitive data that I'm trying to redact.
Effectively, the API returns data that I iterate over to get:
[{ "start" : 12, "end" : 22, "type" : "EMAIL_ADDRESS" }, { "start" : 41, "end" : 54, "type" : "PHONE_NUMBER" }]
Using this data, my end goal is to redact the tokens in text specified by these Token objects to get this:
"my email is [EMAIL_ADDRESS] and my number is [PHONE_NUMBER]"
The thing that makes this question non-trivial is that the replacement substrings aren't always the same length as the substrings they're replacing.
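To make the index drift concrete, here is a small sketch. The sample string, the class name DriftDemo, and both method names are made up for illustration; only the Token shape comes from the mockup above, and end is assumed to be inclusive. The naive method reuses the original indices after the first replacement has already changed the length, while the shifted method tracks the accumulated length difference.

```java
import java.util.List;

class DriftDemo {
    // Same shape as the Token in the question; 'end' is assumed to be inclusive.
    static class Token {
        final int start, end;
        final String type;
        Token(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    // Replaces left to right using the ORIGINAL indices: wrong once lengths differ.
    static String naive(String text, List<Token> tokens) {
        StringBuilder sb = new StringBuilder(text);
        for (Token t : tokens) {
            sb.replace(t.start, t.end + 1, "[" + t.type + "]");
        }
        return sb.toString();
    }

    // Same loop, but shifts every index by the length change accumulated so far.
    // Assumes the tokens are sorted by start (ascending) and do not overlap.
    static String shifted(String text, List<Token> tokens) {
        StringBuilder sb = new StringBuilder(text);
        int offset = 0;
        for (Token t : tokens) {
            String replacement = "[" + t.type + "]";
            sb.replace(t.start + offset, t.end + 1 + offset, replacement);
            offset += replacement.length() - (t.end - t.start + 1);
        }
        return sb.toString();
    }
}
```

With the made-up input "mail a@b.io or call 555-0100" and tokens (5, 10, EMAIL_ADDRESS) and (20, 27, PHONE_NUMBER), naive yields "mail [EMAIL_ADDRESS][PHONE_NUMBER] 555-0100", whereas shifted yields "mail [EMAIL_ADDRESS] or call [PHONE_NUMBER]".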
My current plan of action is to build a StringBuilder from text, sort these tokens in descending order of start index, and then replace from the right end of the buffer.
But something tells me there should be a better way... is there?
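This approach works, provided the findings are already sorted by start index in ascending order and do not overlap (end is treated as inclusive):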
import java.util.ArrayList;
import java.util.List;

public class Test {
    public static void main(String[] args) {
        String text = "my email is [email protected] and my number is (213)-XXX-XXXX";
        List<Token> findings = new ArrayList<>();
        findings.add(new Token(12, 22, "EMAIL_ADDRESS"));
        findings.add(new Token(41, 54, "PHONE_NUMBER"));
        System.out.println(replace(text, findings));
    }

    public static String replace(String text, List<Token> findings) {
        int position = 0; // index of the first character not yet copied
        StringBuilder result = new StringBuilder();
        for (Token finding : findings) {
            // Copy the untouched text before the finding, then the placeholder.
            result.append(text.substring(position, finding.start));
            result.append('[').append(finding.type).append(']');
            position = finding.end + 1; // end is inclusive, so resume one past it
        }
        // Copy whatever remains after the last finding.
        return result.append(text.substring(position)).toString();
    }
}

class Token {
    int start, end;
    String type;

    Token(int start, int end, String type) {
        this.start = start;
        this.end = end;
        this.type = type;
    }
}
Output:
my email is [EMAIL_ADDRESS] and my number is [PHONE_NUMBER]
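If the findings might arrive unsorted or overlapping, a defensive variant of the same single-pass idea could look like the sketch below. The class name SortedRedactor, the sample string in the note, and the choice to silently skip a finding that overlaps an earlier one are all assumptions for illustration, not something the API in the question guarantees or requires.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SortedRedactor {
    // Same shape as the Token in the question; 'end' is assumed to be inclusive.
    static class Token {
        final int start, end;
        final String type;
        Token(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    static String redact(String text, List<Token> findings) {
        // Sort a copy so the caller's list is left untouched.
        List<Token> sorted = new ArrayList<>(findings);
        sorted.sort(Comparator.comparingInt(t -> t.start));

        StringBuilder result = new StringBuilder();
        int position = 0; // first index of 'text' not yet copied
        for (Token t : sorted) {
            if (t.start < position) {
                continue; // overlaps an earlier finding: skipped (an assumption)
            }
            result.append(text, position, t.start);
            result.append('[').append(t.type).append(']');
            position = t.end + 1;
        }
        return result.append(text, position, text.length()).toString();
    }
}
```

For example, redact("mail a@b.io or call 555-0100", ...) with the tokens (20, 27, PHONE_NUMBER) and (5, 10, EMAIL_ADDRESS), given in that order, returns "mail [EMAIL_ADDRESS] or call [PHONE_NUMBER]". Because the result is built in one forward pass, no characters are shifted inside a buffer.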
Ensure that all tokens are sorted by start index in ascending order:
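List<Token> tokens = new ArrayList<>(); // filled with the findings returned by the API
tokens.sort(Comparator.comparingInt(token -> token.start)); // Token has fields, not getters; needs java.util.Comparator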
Now replace the tokens starting from the end of the input text, so that each replacement leaves the indices of the tokens before it unchanged:
public String replace(String text, List<Token> tokens) {
    StringBuilder sb = new StringBuilder(text);
    for (int i = tokens.size() - 1; i >= 0; i--) {
        Token token = tokens.get(i);
        sb.replace(token.start, token.end + 1, "[" + token.type + "]");
    }
    return sb.toString();
}
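For anyone who wants to run this directly, here is a minimal self-contained sketch of the right-to-left replacement. The class name ReverseRedactor and the sample string are made up for illustration, and it sorts a copy in descending order of start rather than iterating an ascending list backwards. As before, end is assumed to be inclusive and the tokens are assumed not to overlap.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ReverseRedactor {
    // Same shape as the Token in the question; 'end' is assumed to be inclusive.
    static class Token {
        final int start, end;
        final String type;
        Token(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    static String redact(String text, List<Token> tokens) {
        // Sort a copy by start, largest first, so replacements run right to left.
        List<Token> sorted = new ArrayList<>(tokens);
        sorted.sort(Comparator.comparingInt((Token t) -> t.start).reversed());

        StringBuilder sb = new StringBuilder(text);
        for (Token t : sorted) {
            // StringBuilder.replace takes an exclusive end index, hence the + 1.
            sb.replace(t.start, t.end + 1, "[" + t.type + "]");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token(5, 10, "EMAIL_ADDRESS"));
        tokens.add(new Token(20, 27, "PHONE_NUMBER"));
        // Prints: mail [EMAIL_ADDRESS] or call [PHONE_NUMBER]
        System.out.println(redact("mail a@b.io or call 555-0100", tokens));
    }
}
```

Each replace call shifts the characters to its right inside the buffer, so for a large number of tokens the single forward pass from the other answer is likely to do less copying.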