My data are structured as follows (these are just sample data as the original data are secret)
id | crime | location | crimedate
------------------------------
1 | Theft | public | 2019-01-04
1 | Theft | public | 2019-02-06
1 | Theft | public | 2019-02-20
1 | Theft | private | 2019-03-10
1 | Theft | private | 2019-03-21
1 | Theft | public | 2019-03-01
1 | Theft | private | 2019-03-14
1 | Theft | public | 2019-06-15
1 | Murder | private | 2019-01-04
1 | Murder | private | 2019-10-20
1 | Murder | private | 2019-11-18
1 | Murder | private | 2019-01-01
1 | Assault | private | 2019-03-19
1 | Assault | private | 2019-01-21
1 | Assault | public | 2019-04-11
1 | Assault | public | 2019-01-10
… | … | … | …
My goal is to create a lineplot (time series plot) showing how the numbers of the three crimes have changed over the year. Therefore on the x-axis I would like to show the monthes (1-12) and on the y-axis the number of crimes in each month. There should be two lines (one for each location).
I started with this code:
DATA new;
SET old;
month=month(datepart(crimedate));
RUN;
PROC sgplot DATA=new;
series x=month y=no_of_crimes / group=location;
run;
But I have no idea, how I can aggregate the number of crimes per month. Could anyone please give me a hint? I have been looking in the internet for a solution, but usually the examples just use data that are already aggregated.
The SG routines will aggregate Y axis values for a VBAR
or HBAR
statement. The same aggregate information displayed in a SERIES
statement would have to be from a apriori aggregate computation, easily done with Proc SUMMARY
.
Additionally, to plot the counts for each crime in a separate visual, you would want a BY CRIME
statement, or Proc SGPANEL
with PANELBY crime
.
The crime datetime value does not have to be down transformed to a date value, you can use the appropriate datetime
format in the procedures and they will auto-aggregate based on the formatted value.
Example with some simulated crime data:
data have;
do precinct = 1 to 10;
do date = '01jan2018'd to '31dec2018'd;
do seq = 1 to 20*ranuni(123);
length crime $10 location $8;
crime = scan('theft,assault,robbery,dnd', ceil(4*ranuni(123)));
location = scan ('public,private', ceil(2*ranuni(123)));
crime_dt = dhms(date,0,0,floor('24:00't*ranuni(123)));
output;
end;
end;
end;
drop date;
format crime_dt datetime19.;
run;
* shorter graphs for SO answer;
ods graphics / height=300px;
proc sgplot data=have;
title "VBAR all crimes combined by location";
vbar crime_dt
/ group=location
groupdisplay=cluster
;
format crime_dt dtmonyy7.;
run;
proc sgpanel data=have;
title "VBAR crime * location";
panelby crime;
vbar crime_dt
/ group=location
groupdisplay=cluster
;
format crime_dt dtmonyy7.;
run;
proc summary data=have noprint;
class crime_dt crime location;
format crime_dt dtmonyy7.;
output out=freqs;
run;
proc sgplot data=freqs;
title "SERIES all crimes,summary _FREQ_ * location";
where _type_ = 5;
series x=crime_dt y=_freq_ / group=location;
xaxis type=discrete;
run;
proc sgpanel data=freqs;
title "SERIES all crimes,summary _FREQ_ * crime * location";
where _type_ = 7;
panelby crime;
series x=crime_dt y=_freq_ / group=location;
rowaxis min=0;
colaxis type=discrete;
run;
If you want to group by location without definition by type of crime:
proc sql noprint;
create table new as
select id,location
, month(crimedate) as month,count(crime) as crime_n
from old
group by id,location,CALCULATED month;
quit;
proc sgplot data=new;
series x=month y=crime_n /group=location;
run;
The result:
To show different series by type of crime you could use sgpanel
:
proc sql noprint;
create table new as
select id,crime,location, month(crimedate) as month,count(crime) as crime_n
from old
group by id,crime,location,CALCULATED month;
quit;
proc sgpanel DATA=new;
panelby location;
series x=month y=crime_n /group=crime;
run;
The result is:
One more variant of perfoming this data:
proc sql noprint;
create table new as
select id,crime,location, month(crimedate) as month,count(crime) as crime_n
from old
group by id,crime,location,CALCULATED month;
quit;
proc sgpanel DATA=new;
panelby crime;
series x=month y=crime_n /group=location GROUPDISPLAY=cluster;
run;
The result is:
Of course, you can specify this plots how you want.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With