
Remove from a string all except selected characters

Tags: regex, r, gdata

I want to remove from a string all characters that are not digits, minus signs, or decimal points.

I imported data from Excel using read.xls, and the result includes some strange characters. I need to convert these values to numeric. I am not too familiar with regular expressions, so I need a simpler way to do the following:

excel_coords <- c(" 19.53380ݰ", " 20.02591°", "-155.91059°", "-155.8154°")
unwanted <- unique(unlist(strsplit(gsub("[0-9]|\\.|-", "", excel_coords), "")))
clean_coords <- gsub(do.call("paste", args = c(as.list(unwanted), sep="|")), 
                     replacement = "", x = excel_coords)

> clean_coords
[1] "19.53380"   "20.02591"   "-155.91059" "-155.8154" 

Bonus if somebody can tell me why these characters have appeared in some of my data (the degree signs are part of the original Excel worksheet, but the others are not).

J. Win. asked Nov 19 '25 00:11


2 Answers

Short and sweet, thanks to a comment by G. Grothendieck:

gsub("[^-.0-9]", "", excel_coords)

From http://stat.ethz.ch/R-manual/R-patched/library/base/html/regex.html: "A character class is a list of characters enclosed between [ and ] which matches any single character in that list; unless the first character of the list is the caret ^, when it matches any character not in the list."
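
As a minimal sketch of how the negated character class fits into the full task, the cleaned strings can be passed straight to as.numeric. The stray characters are written as \u escapes so the snippet does not depend on file encoding; U+0770 for the first one is an assumption about what the pasted character is.

```r
# Stray characters as \u escapes; U+0770 for the first one is an assumption.
excel_coords <- c(" 19.53380\u0770", " 20.02591\u00b0",
                  "-155.91059\u00b0", "-155.8154\u00b0")

# [^-.0-9] matches any single character that is NOT "-", "." or a digit.
# Inside the brackets "." is literal, and "-" is literal because it
# comes first after the "^".
clean_coords <- gsub("[^-.0-9]", "", excel_coords)

# Convert the cleaned strings to numeric values.
coords <- as.numeric(clean_coords)
```

Note that the pattern only filters characters; it does not check that what remains is a well-formed number. A value with a misplaced minus sign or two decimal points would pass through and as.numeric would return NA with a warning.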

J. Win. answered Nov 20 '25 16:11


This can also be done with strsplit, sapply, and paste, by keeping the wanted characters rather than removing the unwanted ones:

excel_coords <- c(" 19.53380ݰ", " 20.02591°", "-155.91059°", "-155.8154°")
correct_chars <- c(0:9, "-", ".")
sapply(strsplit(excel_coords, ""),
       function(x) paste(x[x %in% correct_chars], collapse = ""))

[1] "19.53380"   "20.02591"   "-155.91059" "-155.8154" 
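
The same %in% test can be inverted to list the characters that are being dropped, which may be a starting point for the bonus question about where they come from. This is only a diagnostic sketch: it reports code points and does not identify the cause. The \u escapes (in particular U+0770) are assumptions about the pasted characters.

```r
# Stray characters as \u escapes; U+0770 for the first one is an assumption.
excel_coords <- c(" 19.53380\u0770", " 20.02591\u00b0",
                  "-155.91059\u00b0", "-155.8154\u00b0")
correct_chars <- c(0:9, "-", ".")

# Split every string into single characters and keep the unwanted ones.
chars <- unlist(strsplit(excel_coords, ""))
dropped <- unique(chars[!chars %in% correct_chars])

# Look up the Unicode code point of each dropped character.
code_points <- vapply(dropped, utf8ToInt, integer(1))
labels <- unname(sprintf("U+%04X", code_points))
labels
```

Under these assumptions the dropped characters are a space, the degree sign, and one unexpected character, so the code points can then be compared against the original worksheet.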

Sacha Epskamp answered Nov 20 '25 16:11


