I have a variable that is mostly numeric values, however there is occasionally a character snuck-in. Therefor, the variable is stored as a character in SAS. I would like to be able to determine if individual values in this character variable are numeric. After knowing which values are numeric and which are character, I would like to create a new (numeric) variable, where the numeric values of the character variable are stored as numbers and the character values are stored as missing.
Here is an example.
char_var --> num_var
a .
1 1
2.34 2.34
## .
cat .
I was hoping do be able to do it something like below, but am missing a function to help me:
if char_var=is.numeric(char_var) then num_var=char_var;
else num_var=.;
The is.numeric would be a function that would determine this for me.
Any advice is appreciated.
On a side note, in R I would try (maybe incorrectly) to progam this as follows:
mydata$type<-is.numeric(mydata$char_var)
if (mydata$type==1) {mydata$num_var=mydata$char_var} else {mydata$num_var=NA}
mydata$num_var<-as.numeric(mydata$num_var)
Using this dataset for all examples:
data have;
input xchar $;
datalines;
1
123
123.54
1234.43
123.1.4
124A
234.1A
1234E5
12.34E7
;;;;
run;
There are a lot of different ways of handling this, depending on your needs.
First off, you can let SAS handle it for you; Bob provides such a solution. I consider this pretty bad form, though; while in his program it is obvious you're doing it on purpose, in longer code it's not always obvious, and thus it can confuse other programmers and/or cause data errors. I'd not allow code that involved intentional implicit conversion to make it to production without good reason.
You can use the input statement to do the conversion, which is a better version of SAS handling things. In that case, you can easily suppress the warning messages about invalid conversions. The leading ?? suppresses conversion errors; a single ? would cause a line to be printed to the log for invalid conversions but less obtrusively than no ?.
data want_qmark;
set have;
xnum = input(xchar,??BEST12.);
run;
You could check the field explicitly; this depends a lot on your data. The following would work for simple numeric fields, but fails on the last two (with scientific notation), and on the line with two decimals.
if missing(compress(xnum,'.','d')) then xnum=input(xchar,best12.);
You could also write your own isnumeric function. The following uses a perl regular expression that is fairly robust (but not perfect, and does not include lots of valid numeric varieties such as comma or percents; you could add those in if your data suggested them).
options cmplib=work.funcs;
proc fcmp outlib=work.funcs.test;
function isnumeric(value $);
prx = prxparse('/^-?\d+\.?\d*(e\d+)?$/io');
rc = prxmatch(prx,trimn(value));
return(rc);
endsub;
quit;
data want_fcmp;
set have;
if isnumeric(xchar) then xnum=input(xchar,BEST12.);
run;
For most uses the simple input with ?? is probably sufficient.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With