Summarise data into monthly counts by year

I'm not used to working with time series data in R, and I'm a bit stuck with this. I have a data frame of event references and the date each event was recorded. The data runs over a period of 7 years, and I want to summarise it into the number of events per month over that 7-year period and plot the result with ggplot2.

I can't seem to get the date conversions to work together so that I end up with a count and a date I can feed to ggplot2's scale_x_date() function.

Here's an example of the data:

df <- structure(list(Ref = structure(c(127L, 33L, 232L, 392L, 490L, 
242L, 437L, 346L, 443L, 560L, 598L, 568L, 103L, 262L, 463L, 17L, 
114L, 276L, 361L, 422L), .Label = c("01090013", "0109005", "0109006", 
"0109007", "0109009", "0109010", "0109011", "0109012", "0109014", 
"0109016", "0109022", "0110001", "0110004", "0110007", "0110009", 
"0110011", "0111001", "0111002", "0111012", "0111016", "0111017", 
"0112001", "0112003", "0112008", "0112010", "015004", "015006", 
"015008", "015010", "015013", "016002", "016003", "016004", "016005", 
"016006", "016008", "016009", "016010", "016011", "016013", "016014", 
"016016", "017001", "018001", "018004", "018005", "018007", "018008", 
"018009", "020626", "0209024", "0209025", "0209026", "0209027", 
"0209029", "0209031", "0209035", "0209037", "02100020", "0210017", 
"0210018", "0210023", "0210026", "0210030", "0211018", "0211019", 
"0211020", "0211022", "0211024", "0211025", "0211026", "0212018", 
"0212021", "0212025", "0212027", "025018", "025021", "025022", 
"025023", "025024", "025025", "025026", "025030", "026019", "026020", 
"026021", "026023", "026025", "026027", "026030", "026032", "0270010", 
"027010", "027012", "027013", "027014", "027016", "027017", "0309038", 
"0309039", "0309041", "0309046", "0309050", "0309052", "0309053", 
"0310035", "0310037", "0310041", "0310043", "0310044", "0311028", 
"0311032", "0311035", "0311038", "0312031", "0312036", "0312037", 
"0312043", "0312045", "0312047", "0312056", "0312058", "0312059", 
"0312062", "035033", "035034", "035036", "035037", "035038", 
"035040", "035041", "035042", "035043", "035045", "035049", "036036", 
"036038", "036039", "036041", "036042", "036044", "036045", "036046", 
"036047", "036048", "036050", "036051", "037021", "037026", "037029", 
"038026", "038032", "038034", "038035", "038036", "0409056", 
"0409057", "0409062", "0410046", "0410049", "0410050", "0410051", 
"0410054", "0410055", "0410056", "0410057", "0410058", "0410060", 
"0410062", "0410064", "0411047", "0411051", "0411052", "0411055", 
"0412070", "0412074", "0412075", "0412076", "045054", "045056", 
"045058", "045063", "045064", "045065", "045072", "046054", "046055", 
"046058", "046060", "047035", "047036", "047037", "047038", "047041", 
"047042", "047044", "047045", "047046", "048040", "048043", "048044", 
"048045", "048048", "048050", "048051", "0509073", "0509080", 
"0510066", "0510067", "0510082", "0511062", "0511065", "0511068", 
"0511069", "0511072", "0512084", "0512088", "0512089", "0512091", 
"055073", "055075", "055080", "055086", "055089", "055091", "055093", 
"055094", "055095", "056064", "056066", "056067", "056068", "056070", 
"056071", "056073", "056074", "057049", "057052", "057053", "057054", 
"057058", "057059", "057060", "057061", "057063", "057065", "057066", 
"057067", "057068", "057069", "058053", "058055", "058056", "058059", 
"058062", "058064", "0609082", "0609086", "0609088", "0609089", 
"0609090", "0609093", "0609095", "0609096", "0609097", "0609098", 
"0609103", "0610086", "0610089", "0610095", "0610096", "0610098", 
"0611073", "0611074", "0611080", "0611081", "0612109", "0612115", 
"065096", "065099", "065103", "065105", "065106", "065109", "065114", 
"066075", "066076", "066077", "066078", "066081", "066083", "067080", 
"067081", "067084", "068065", "068070", "068074", "0709106", 
"0709108", "0709113", "0709115", "0709116", "0709117", "0709120", 
"0710104", "0710105", "0710107", "0710108", "0710110", "0710115", 
"0710116", "0710117", "0710123", "0711083", "0711084", "0711085", 
"0711086", "0711087", "0711088", "0711092", "0712122", "0712126", 
"0712127", "0712128", "0712129", "075118", "075119", "075123", 
"075124", "075125", "075126", "075127", "075130", "075132", "075133", 
"076084", "076087", "076088", "076090", "076092", "076093", "076094", 
"077103", "077105", "078079", "078080", "078081", "078082", "078085", 
"078086", "0809126", "0809134", "0809137", "0809141", "0809143", 
"0810125", "0810137", "0811099", "0811101", "0811106", "0811108", 
"0811112", "0811113", "0811114", "0812142", "0812145", "0812150", 
"0812152", "0814143", "085139", "085143", "085145", "085148", 
"085149", "085150", "085154", "085156", "085160", "085163", "086098", 
"086099", "086100", "086101", "086102", "086104", "086107", "086108", 
"086109", "086110", "086111", "086112", "086114", "086115", "087106", 
"087107", "087109", "087112", "088094", "088096", "088097", "088098", 
"0909145", "0909155", "0909158", "0910145", "0910146", "0910147", 
"0910149", "0910150", "0910153", "0910154", "0911116", "0911117", 
"0911120", "0911121", "0911122", "0911123", "0911124", "0911130", 
"0911131", "0912161", "0912163", "0912168", "0912171", "0912172", 
"095166", "095167", "095170", "095171", "095172", "095178", "095180", 
"096116", "096117", "096121", "097120", "097124", "097125", "097126", 
"097132", "097133", "097136", "098110", "098115", "098116", "098119", 
"100006825", "100006830", "1009160", "1009161", "1009162", "1009164", 
"1009165", "1009166", "1009169", "1009170", "1009172", "1009173", 
"1009174", "1010160", "1010162", "1010163", "1010164", "1010166", 
"1010168", "1011133-A", "1011134", "1011140", "1011142", "1012179", 
"1012184", "1012185", "1012194", "105185", "105186", "105187", 
"105188", "105189", "105191", "105192", "105196", "105197", "105198", 
"105199", "105201", "105202", "105207", "105208", "105211", "106127", 
"106130", "106131", "107138", "107140", "107143", "107147", "107148", 
"107149", "107153", "107155", "107156", "108122", "108123", "108127", 
"108129", "108130", "108131", "108132", "108134", "108135", "108136", 
"1109175", "1109176", "1109180", "1109182", "1110173", "1110176", 
"1110177", "1110178", "1110185", "1110186", "1111145", "1111150", 
"1111151", "1112196", "1112197", "1112201", "1112202", "1112206", 
"1112208", "1112209", "1112212", "1112218", "1112220", "1112223", 
"1112225", "1112226", "1112227", "115215", "115216", "115217", 
"115218", "115219", "115223", "115225", "115226", "116139", "116143", 
"116144", "116145", "117161", "117162", "117164", "117165", "117168", 
"117175", "117180", "118139", "118140", "118143", "118147", "118148", 
"118150", "118152", "118154", "118157", "118160", "118161", "1209188", 
"1209189", "1209191", "1209193", "1209199", "1210191", "1210193", 
"1211157", "1211158", "1211168", "1211169", "1211170", "1211171", 
"1211173", "1212233", "1212235", "1212240", "125231", "125238", 
"125241", "126147", "126149", "127182", "127183", "127186", "127187", 
"127192", "127194", "128165", "128168", "128169", "128171", "128172", 
"128175", "128176", "128177", "128182", "128183", "128184", "128186", 
"128189", "128193"), class = "factor"), Date = structure(c(12846, 
13154, 13284, 13391, 13434, 13655, 13766, 14067, 14119, 14183, 
14209, 14211, 14322, 14412, 14897, 14960, 15049, 15155, 15201, 
15597), class = "Date")), .Names = c("Ref", "Date"), row.names = c(NA, 
-20L), class = "data.frame")

This is driving me crazy!

Thanks H

asked Mar 03 '13 00:03 by Hassantm


2 Answers

I believe you are looking for this:

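library(plyr)     # for ddply()
library(ggplot2)
df <- transform(df, month = format(Date, "%m"), year = format(Date, "%Y"))

# one row per month/year combination; nrow gives the event count in column V1
counts <- ddply(df, .(month, year), nrow)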

Then to plot the counts:

# make a new monthly date
counts <- transform(counts, new_date = as.Date(paste(year,month,'01',sep="-")))

# now plot
ggplot(counts,aes(x=new_date,y=V1)) + geom_point() + scale_x_date()
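One thing to watch: ddply only returns month/year combinations that actually contain events, so months with no events are missing from counts rather than plotted as zero. Below is a minimal base-R sketch that fills those gaps. It assumes a plain Date vector as input; `monthly_counts` is a hypothetical helper name and the sample dates are made up for illustration.

```r
# Hypothetical helper: count events per calendar month in base R,
# including months that have zero events.
monthly_counts <- function(dates) {
  # Collapse every date to the first day of its month
  month_start <- as.Date(format(dates, "%Y-%m-01"))
  # Every month from the first to the last observed month
  all_months <- seq(min(month_start), max(month_start), by = "month")
  # Tabulate against the full set of months so empty months get a 0 count
  n <- table(factor(as.character(month_start),
                    levels = as.character(all_months)))
  data.frame(new_date = all_months, V1 = as.vector(n))
}

# Small illustrative input (not the question's data)
dates <- as.Date(c("2005-03-02", "2005-03-20", "2005-05-11"))
counts <- monthly_counts(dates)
# counts$V1 is c(2, 0, 1): April 2005 appears with a zero count
```

The column names new_date and V1 were chosen to match the ggplot() call above, so the result can be plotted the same way. Calling monthly_counts(df$Date) should work on the question's data.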
answered Sep 27 '22 23:09 by Gary Weissman


The xts package is very handy for time series manipulation.

First I create the xts object:

library(xts)
dat.xts <- xts(df$Ref, order.by = as.POSIXct(df$Date))

Then I use apply.monthly to get the count by month, and plot it as an xts object:

count.month <- apply.monthly(dat.xts,FUN=length)
plot(count.month, type='b')
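Note that apply.monthly() splits the series at xts's monthly endpoints and returns one value per calendar month that contains data. Each value is indexed by the timestamp of the last observation in that month, so the x values are not first-of-month dates, and months with no events do not appear. The hypothetical base-R helper below mimics that grouping on a plain Date vector, purely to illustrate the semantics. It is a sketch under that assumption and not part of xts.

```r
# Hypothetical base-R mimic of apply.monthly(x, FUN = length):
# one row per calendar month that has data, stamped with that month's
# last observation date (as the xts result is), plus the event count.
mimic_apply_monthly_length <- function(dates) {
  if (length(dates) == 0)
    return(data.frame(index = as.Date(character(0)), count = integer(0)))
  dates <- sort(dates)                # xts keeps its index in time order
  ym <- format(dates, "%Y-%m")        # calendar-month key for each event
  # Position of the last event in each run of equal months (cf. endpoints())
  ends <- which(c(ym[-1] != ym[-length(ym)], TRUE))
  starts <- c(1L, head(ends, -1) + 1L)
  data.frame(index = dates[ends], count = ends - starts + 1L)
}

dates <- as.Date(c("2005-03-20", "2005-03-02", "2005-05-11"))
res <- mimic_apply_monthly_length(dates)
# res$index is 2005-03-20 and 2005-05-11; res$count is c(2, 1)
```

This is why the xts result usually needs its index mapped to month starts (or zero-filled) before it lines up neatly on a monthly axis.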


If you want to use ggplot2, you can transform the result to a data.frame.

as.data.frame(count.month)
answered Sep 27 '22 22:09 by agstudy