
Replace special characters (dash)

Tags:

regex

r

I was attempting to replace what I thought was a standard dash using gsub. The code I was testing was:

gsub("-", "ABC", "reported – estimate")

This does nothing, though. I copied and pasted the dash into http://unicodelookup.com/#–/1 and it seems to be an en dash. That site provides the hex, decimal, and other codes for an en dash, and I've been trying to replace the en dash using those codes but am not having any luck. Suggestions?

(As a bonus, if you can tell me whether there is a function to identify special characters, that would be helpful.)

I'm not sure if SO's code formatting will change the dash format so here is the dash I'm using (–).

ZRoss asked Mar 01 '16 16:03


1 Answer

You can replace the en-dash by just specifying it in the regex pattern.

gsub("–", "ABC", "reported – estimate")

You can match all hyphens, en- and em-dashes with

gsub("[-–—]", "ABC", "reported – estimate — more - text")

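If pasting the literal dash into a script is fragile (for example, when the file encoding is uncertain), the same patterns can be written with Unicode escapes. This is only a sketch: the variable names are made up for illustration, and it assumes R is handling the strings as UTF-8.

```r
# \u2013 is the en dash and \u2014 is the em dash.
x <- "reported \u2013 estimate \u2014 more - text"

# Replace only the en dash.
en_only <- gsub("\u2013", "ABC", x)

# Replace hyphen-minus, en dash, and em dash.
# The "-" comes first in the bracket expression, so it is treated literally.
all_dashes <- gsub("[-\u2013\u2014]", "ABC", x)

print(en_only)
print(all_dashes)
```

The second call should leave no dash of any kind in the string, while the first only touches the en dash.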

To check whether a string contains non-ASCII characters, remove all the ASCII characters and look at what is left:

> s <- "plus ça change, plus c'est la même chose"
> gsub("[[:ascii:]]+", "", s, perl=TRUE)
[1] "çê"


You will either get an empty string (if the string consists only of ASCII characters), or - as here - the non-ASCII "special" characters that remain.
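To see which special characters they are, one option is to look at the code points directly with base R's utf8ToInt. The helper name non_ascii_code_points below is hypothetical (it is not a built-in function), and the sketch assumes a single UTF-8 string as input.

```r
# Hypothetical helper: list the code points above the ASCII range (> 127)
# found in a single UTF-8 string, formatted as "U+XXXX".
non_ascii_code_points <- function(s) {
  code_points <- utf8ToInt(s)                   # integer code point per character
  special <- code_points[code_points > 127]     # keep only non-ASCII ones
  sprintf("U+%04X", special)                    # character(0) if none are found
}

print(non_ascii_code_points("reported \u2013 estimate"))
print(non_ascii_code_points("reported - estimate"))
```

For the en dash this should report U+2013, which matches what a Unicode lookup site shows; a string with a plain hyphen should give an empty result.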

Wiktor Stribiżew answered Sep 28 '22 20:09