 

Missing Rows from Dataset in R

Tags:

r

I have spent the better part of yesterday afternoon and this morning trying to gain some insight into my issue. If you can point me to any resources, that would be great!

I have a data frame in R, imported from an Oracle table, which I call Loss_Data. The data are shown below.

    Loss_Yr Dev_Lag Claim_Amnt
1   2007    1   300
2   2007    2   10
3   2007    3   250
4   2007    5   5
5   2008    1   450
6   2008    2   80
7   2008    4   3
8   2009    1   175
9   2009    3   20
10  2010    1   95
11  2010    2   40
12  2011    1   130

However, I need it to look like the following: there must be a row for every possible Loss_Yr and Dev_Lag combination, with a Claim_Amnt of 0 for each row that has to be added. See the added rows.

    Loss_Yr Dev_Lag Claim_Amnt
1   2007    1   300
2   2007    2   10
3   2007    3   250
4   2007    4   0    <- added
5   2007    5   5
6   2008    1   450
7   2008    2   80
8   2008    3   0    <- added
9   2008    4   3
10  2009    1   175
11  2009    2   0    <- added
12  2009    3   20
13  2010    1   95
14  2010    2   40
15  2011    1   130
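The desired output suggests that "every possible combination" means, for each Loss_Yr, every Dev_Lag from 1 up to the largest lag recorded for that year (a development triangle with 15 rows), rather than the full 5 x 5 grid. A minimal base-R sketch, assuming that interpretation, that lists exactly which rows are missing. Loss_Data is rebuilt here from the table values instead of being read from Oracle, and the names `expected`, `have_key`, `want_key` and `missing_rows` are only illustrative:

```r
# Rebuild Loss_Data from the values in the first table (the real data comes from Oracle)
Loss_Data <- data.frame(
  Loss_Yr    = c(2007, 2007, 2007, 2007, 2008, 2008, 2008, 2009, 2009, 2010, 2010, 2011),
  Dev_Lag    = c(1, 2, 3, 5, 1, 2, 4, 1, 3, 1, 2, 1),
  Claim_Amnt = c(300, 10, 250, 5, 450, 80, 3, 175, 20, 95, 40, 130)
)

# Assumed rule: each year needs every lag from 1 to its largest observed lag
yrs <- sort(unique(Loss_Data$Loss_Yr))
expected <- do.call(rbind, lapply(yrs, function(yr) {
  last_lag <- max(Loss_Data$Dev_Lag[Loss_Data$Loss_Yr == yr])
  data.frame(Loss_Yr = yr, Dev_Lag = seq_len(last_lag))
}))

# Compare year/lag keys: expected rows whose key is absent from the data are the gaps
have_key <- paste(Loss_Data$Loss_Yr, Loss_Data$Dev_Lag)
want_key <- paste(expected$Loss_Yr, expected$Dev_Lag)
missing_rows <- expected[!(want_key %in% have_key), ]
print(missing_rows)  # 2007/4, 2008/3 and 2009/2
```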

At first I was thinking of creating a "dummy" table with all possible combinations and then merging it with my existing data, keeping the records from my initial table, Loss_Data.

However, I am trying to build a process, and this method wouldn't be very flexible.

Any ideas on how to tackle this?!

asked Jan 11 '13 at 21:01 by Cara Wyrostek




1 Answer

The approach you describe is the right idea. Maybe you're over-complicating the implementation?

d <- read.table(text="Loss_Yr Dev_Lag Claim_Amnt
1   2007    1   300
2   2007    2   10
3   2007    3   250
4   2007    5   5
5   2008    1   450
6   2008    2   80
7   2008    4   3
8   2009    1   175
9   2009    3   20
10  2010    1   95
11  2010    2   40
12  2011    1   130", header=TRUE, row.names=1)

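# Cross every observed Loss_Yr with every observed Dev_Lag, then outer-join
# (all=TRUE) onto d; combinations absent from d get NA for Claim_Amnt
filled <- merge(d, 
                with(d, expand.grid(Loss_Yr=unique(Loss_Yr), Dev_Lag=unique(Dev_Lag))), 
                all=TRUE)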
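Note that merge(..., all=TRUE) fills the new rows with NA rather than 0, and that expand.grid crosses every year with every lag (25 rows for this data), whereas the desired output stops at each year's last recorded lag (15 rows). A hedged sketch of one way to close both gaps while keeping the merge/expand.grid approach, wrapped in a function so it can be reused in a process. The name `fill_triangle` and the "lags 1 to the largest observed lag per year" rule are assumptions inferred from the desired output, not part of the answer above:

```r
# Hypothetical helper: fill in missing Loss_Yr/Dev_Lag rows with 0, keeping
# only lags up to the largest lag observed for each year (assumed rule)
fill_triangle <- function(d) {
  # Outer-join d onto every observed year x observed lag combination
  grid <- with(d, expand.grid(Loss_Yr = unique(Loss_Yr), Dev_Lag = unique(Dev_Lag)))
  filled <- merge(d, grid, all = TRUE)

  # Combinations absent from d come back as NA; the question wants 0
  filled$Claim_Amnt[is.na(filled$Claim_Amnt)] <- 0

  # Largest observed lag per year, joined on so each row can be compared to it
  lim <- aggregate(Dev_Lag ~ Loss_Yr, data = d, FUN = max)
  names(lim)[2] <- "Max_Lag"
  filled <- merge(filled, lim, by = "Loss_Yr")

  # Drop lags beyond each year's last observed lag, then sort and renumber
  out <- filled[filled$Dev_Lag <= filled$Max_Lag, c("Loss_Yr", "Dev_Lag", "Claim_Amnt")]
  out <- out[order(out$Loss_Yr, out$Dev_Lag), ]
  rownames(out) <- NULL
  out
}

d <- data.frame(
  Loss_Yr    = c(2007, 2007, 2007, 2007, 2008, 2008, 2008, 2009, 2009, 2010, 2010, 2011),
  Dev_Lag    = c(1, 2, 3, 5, 1, 2, 4, 1, 3, 1, 2, 1),
  Claim_Amnt = c(300, 10, 250, 5, 450, 80, 3, 175, 20, 95, 40, 130)
)
result <- fill_triangle(d)
print(result)  # 15 rows, with 0 in the three gaps
```

If some lag never appears in any year, unique(Dev_Lag) will not generate it; building the grid's lag column from seq_len(max(Dev_Lag)) would cover that case. If the full rectangular grid with zeros is what you actually want, drop the trimming step.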
answered Sep 20 '22 at 01:09 by Matthew Plourde