I have dataframe with company-entry-date and retire-date. From this dataframe I wanna count number of workers at the end of each month.
-Dataframe has company-entry-date and retire-date.
ID entry-date retire-date
===========================
1 2008-05-06 2010-03-03
2 2008-06-04 2009-02-25
I wanna count like this
Time NumberOfWorkers
===============================
2008-05-31 1
2008-06-30 2
2008-07-31 2
......
2009-02-28 1
.......
2010-03-31 0
Can you help me?
lubridate package has functions to easily extract parts of date objects; For instance
library(lubridate)
month("2011/11/07") # 11
year("2011/10/07") # 2011
Edit: JoeArtisan pointed out that the zoo package can be used to create a year-month variable (see his comment), which would be an interesting alternative. The code could easily be adapted to work with such a viariable. (end Edit)
So you could calculate new columns, one that would contain years, and one other containing months. You end up with a new dataframe (keeping only those new columns with made-up dates)
entry.year entry.month retire.year retire.month
2008 12 2010 06
2007 05 2009 05
2009 04 2011 11
2010 03 2012 01
Let's use even more fake data and see where it leads us...
dat <- read.csv(text=
"entry.year,entry.month,retire.year,retire.month
2008,12,2010,06
2007,05,2009,05
2009,04,2011,11
2010,03,2012,01
2008,12,2011,01
2007,05,2010,06
2009,03,2011,01
2010,03,2012,01
2008,11,2009,07
2007,05,2011,11
2008,04,2010,02
2010,03,2011,06
2008,09,2012,01
2007,05,2010,06
2009,03,2011,01
2010,03,2012,01")
Then you could use the following to get frequencies:
entries <- as.data.frame(table(dat[,1:2]))
retires <- as.data.frame(table(dat[,3:4]))
colnames(entries) = c("Year", "Month", "Freq.entries")
colnames(retires) = c("Year", "Month", "Freq.retires")
And merge those tables
final <- merge(entries, retires, by = c("Year", "Month"), all=TRUE)
# Replace NA values with 0
final[is.na(final)] <- 0
# Calculate the monthly fluctuation in employees
final$difference <- final$Freq.entries - final$Freq.retires
# And a cumulative sum
final$cumul <- cumsum(final$difference)
Results
> final
Year Month Freq.entries Freq.retires difference cumul
1 2007 3 0 0 0 0
2 2007 4 0 0 0 0
3 2007 5 4 0 4 4
4 2007 9 0 0 0 4
5 2007 11 0 0 0 4
6 2007 12 0 0 0 4
7 2008 3 0 0 0 4
8 2008 4 1 0 1 5
9 2008 5 0 0 0 5
10 2008 9 1 0 1 6
11 2008 11 1 0 1 7
12 2008 12 2 0 2 9
13 2009 3 2 0 2 11
14 2009 4 1 0 1 12
15 2009 5 0 1 -1 11
16 2009 9 0 0 0 11
17 2009 11 0 0 0 11
18 2009 12 0 0 0 11
19 2009 1 0 0 0 11
20 2009 2 0 0 0 11
21 2009 6 0 0 0 11
22 2009 7 0 1 -1 10
23 2010 3 4 0 4 14
24 2010 4 0 0 0 14
25 2010 5 0 0 0 14
26 2010 9 0 0 0 14
27 2010 11 0 0 0 14
28 2010 12 0 0 0 14
29 2010 1 0 0 0 14
30 2010 2 0 1 -1 13
31 2010 6 0 3 -3 10
32 2010 7 0 0 0 10
33 2011 5 0 0 0 10
34 2011 11 0 2 -2 8
35 2011 1 0 3 -3 5
36 2011 2 0 0 0 5
37 2011 6 0 1 -1 4
38 2011 7 0 0 0 4
39 2012 5 0 0 0 4
40 2012 11 0 0 0 4
41 2012 1 0 4 -4 0
42 2012 2 0 0 0 0
43 2012 6 0 0 0 0
44 2012 7 0 0 0 0
Then you could also, if relevant, go further and calculate aggregate stats for chucks of time (4 or 6 months, or whole years) pretty easily. For instance,
by(data = final$Freq.entries, INDICES = final$Year, FUN = sum, na.rm=TRUE)
by(data = final$Freq.retires, INDICES = final$Year, FUN = sum, na.rm=TRUE)
And so on and so forth... Lots of fun, no?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With