Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count number of workers at end of each month with entry-date and retired-date [closed]

Tags:

dataframe

r

I have dataframe with company-entry-date and retire-date. From this dataframe I wanna count number of workers at the end of each month.

-Dataframe has company-entry-date and retire-date.

ID entry-date retire-date
===========================
1  2008-05-06 2010-03-03
2  2008-06-04 2009-02-25

I wanna count like this
Time        NumberOfWorkers
===============================
2008-05-31  1
2008-06-30  2
2008-07-31  2
......
2009-02-28  1
.......
2010-03-31  0

Can you help me?

like image 909
raccoonjin Avatar asked Jan 02 '26 04:01

raccoonjin


1 Answers

lubridate package has functions to easily extract parts of date objects; For instance

library(lubridate)
month("2011/11/07")  # 11
year("2011/10/07")   # 2011

Edit: JoeArtisan pointed out that the zoo package can be used to create a year-month variable (see his comment), which would be an interesting alternative. The code could easily be adapted to work with such a viariable. (end Edit)

So you could calculate new columns, one that would contain years, and one other containing months. You end up with a new dataframe (keeping only those new columns with made-up dates)

entry.year entry.month retire.year retire.month
2008       12           2010        06
2007       05           2009        05
2009       04           2011        11
2010       03           2012        01

Let's use even more fake data and see where it leads us...

dat <- read.csv(text=
"entry.year,entry.month,retire.year,retire.month
2008,12,2010,06
2007,05,2009,05
2009,04,2011,11
2010,03,2012,01
2008,12,2011,01
2007,05,2010,06
2009,03,2011,01
2010,03,2012,01
2008,11,2009,07
2007,05,2011,11
2008,04,2010,02
2010,03,2011,06
2008,09,2012,01
2007,05,2010,06
2009,03,2011,01
2010,03,2012,01")

Then you could use the following to get frequencies:

entries <- as.data.frame(table(dat[,1:2]))
retires <- as.data.frame(table(dat[,3:4]))
colnames(entries) = c("Year", "Month", "Freq.entries")
colnames(retires) = c("Year", "Month", "Freq.retires")

And merge those tables

final <- merge(entries, retires, by = c("Year", "Month"), all=TRUE)

# Replace NA values with 0
final[is.na(final)] <- 0

# Calculate the monthly fluctuation in employees
final$difference <- final$Freq.entries - final$Freq.retires

# And a cumulative sum
final$cumul <- cumsum(final$difference)

Results

> final
   Year Month Freq.entries Freq.retires difference cumul
1  2007     3            0            0          0     0
2  2007     4            0            0          0     0
3  2007     5            4            0          4     4
4  2007     9            0            0          0     4
5  2007    11            0            0          0     4
6  2007    12            0            0          0     4
7  2008     3            0            0          0     4
8  2008     4            1            0          1     5
9  2008     5            0            0          0     5
10 2008     9            1            0          1     6
11 2008    11            1            0          1     7
12 2008    12            2            0          2     9
13 2009     3            2            0          2    11
14 2009     4            1            0          1    12
15 2009     5            0            1         -1    11
16 2009     9            0            0          0    11
17 2009    11            0            0          0    11
18 2009    12            0            0          0    11
19 2009     1            0            0          0    11
20 2009     2            0            0          0    11
21 2009     6            0            0          0    11
22 2009     7            0            1         -1    10
23 2010     3            4            0          4    14
24 2010     4            0            0          0    14
25 2010     5            0            0          0    14
26 2010     9            0            0          0    14
27 2010    11            0            0          0    14
28 2010    12            0            0          0    14
29 2010     1            0            0          0    14
30 2010     2            0            1         -1    13
31 2010     6            0            3         -3    10
32 2010     7            0            0          0    10
33 2011     5            0            0          0    10
34 2011    11            0            2         -2     8
35 2011     1            0            3         -3     5
36 2011     2            0            0          0     5
37 2011     6            0            1         -1     4
38 2011     7            0            0          0     4
39 2012     5            0            0          0     4
40 2012    11            0            0          0     4
41 2012     1            0            4         -4     0
42 2012     2            0            0          0     0
43 2012     6            0            0          0     0
44 2012     7            0            0          0     0

Then you could also, if relevant, go further and calculate aggregate stats for chucks of time (4 or 6 months, or whole years) pretty easily. For instance,

by(data = final$Freq.entries, INDICES = final$Year, FUN = sum, na.rm=TRUE) 
by(data = final$Freq.retires, INDICES = final$Year, FUN = sum, na.rm=TRUE)

And so on and so forth... Lots of fun, no?

like image 196
Dominic Comtois Avatar answered Jan 03 '26 20:01

Dominic Comtois



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!