I read in a text and want to remove all the punctuation from it. My first idea was:
data = readlines("text.txt")
data = lowercase.(data)
data = replace.(data, [','], [""])
data = replace.(data, ['.'], [""])
data = replace.(data, ['?'], [""])
data = replace.(data, [';'], [""])
data = replace.(data, ['!'], [""])
data = replace.(data, [':'], [""])
data = replace.(data, ['('], [""])
data = replace.(data, [')'], [""])
This gets annoying quite fast. I did not find a way to combine them all in one statement. With replace.(data, [".", ";"], ["", ""]) I get a DimensionMismatch.
Any ideas?
When broadcasting, if you do not want a collection (like an array or a tuple) to be iterated over, you should wrap it in an array (in the example I substitute only two characters, ',' and ';', but there can be more):
julia> data = ["a,b;c","x,y;z"]
2-element Array{String,1}:
"a,b;c"
"x,y;z"
julia> replace.(data, [[',',';']], "")
2-element Array{String,1}:
"abc"
"xyz"
The key part is [[',',';']], which wraps the array of characters to substitute in a one-element array, so broadcasting passes the whole inner array to every call instead of iterating over it.
Another approach would be to use a regular expression:
julia> replace.(data, r"[,;]", "")
2-element Array{String,1}:
"abc"
"xyz"
Now the substitution pattern r"[,;]" does not need to be wrapped.
If you care about performance, the first pattern with [[',',';']] is a bit faster, but a regular expression is more flexible, as it lets you match more complex patterns.
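In Julia 1.0 and later, replace takes a pattern => replacement pair, and broadcasting does not iterate over the pair, so no wrapping is needed. Now it would be: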
julia> replace.(data, [',',';'] => "")
2-element Array{String,1}:
"abc"
"xyz"
or
julia> replace.(data, r"[,;]" => "")
2-element Array{String,1}:
"abc"
"xyz"