Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to import a CSV file with delimiter as ";" and decimal separator as "," into SAS?

I`ve got (and will receive in the future) many CSV files that use the semicolon as delimiter and the comma as decimal separator. So far I could not find out how to import these files into SAS using proc import -- or in any other automated fashion without the need for messing around with the variable names manually.

Create some sample data:

%let filename = %sysfunc(pathname(work))\sap.csv;

data _null_;
  file "&filename";
  put 'a;b';
  put '12345,11;67890,66';
run;

The import code:

proc import out = sap01 
datafile= "&filename"
dbms = dlm; 
delimiter = ";";
GETNAMES = YES; 
run;

After the import a value for the variable "AMOUNT" such as 350,58 (which corresponds to 350.58 in the US format) would look like 35,058 (meaning thirtyfivethousand...) in SAS (and after re-export to the German EXCEL it would look like 35.058,00). A simple but dirty workaround would be the following:

data sap02; set sap01;
AMOUNT = AMOUNT/100;
format AMOUNT best15.2;
run;

I wonder if there is a simple way to define the decimal separator for the CVS-import (similar to the specification of the delimiter). ..or any other "cleaner" solution compared to my workaround. Many thanks in advance!

like image 504
Joz Avatar asked Feb 05 '15 15:02

Joz


1 Answers

You technically should use dbms=dlm not dbms=csv, though it does figure things out. CSV means "Comma separated values", while DLM means "delimited", which is correct here.

I don't think there's a direct way to make SAS read in with the comma via PROC IMPORT. You need to tell SAS to use the NUMXw.d informat when reading in the data, and I don't see a way to force that setting in SAS. (There's an option for output with a comma, NLDECSEPARATOR, but I don't think that works here.)

Your best bet is either to write data step code yourself, or to run the PROC IMPORT, go to the log, and copy/paste the read in code into your program; then for each of the read-in records add :NUMX10. or whatever the appropriate maximum width of the field is. It will end up looking something like this:

data want;
  infile "whatever.txt" dlm=';' lrecl=32767 missover;
  input
    firstnumvar :NUMX10.
    secondnumvar :NUMX10.
    thirdnumvar :NUMX10.
    fourthnumvar :NUMX10.
    charvar :$15.
    charvar2 :$15.
  ;
run;

It will also generate lots of informat and format code; you can alternately convert the informats to NUMX10. instead of BEST. instead of adding the informat to the read-in. You can also just remove the informats, unless you have date fields.

data want;
  infile "whatever.txt" dlm=';' lrecl=32767 missover;
  informat firstnumvar secondnumvar thirdnumvar fourthnumvar NUMX10.;
  informat charvar $15.;
  format  firstnumvar secondnumvar thirdnumvar fourthnumvar BEST12.;
  format charvar $15.;
  input
    firstnumvar
    secondnumvar
    thirdnumvar
    fourthnumvar
    charvar $
  ;
run;
like image 78
Joe Avatar answered Sep 23 '22 00:09

Joe