
Getting an odd value when trying to replace NA in R

Tags: r, na

Why am I getting "4" for agenew in row 23 of the second data frame shown below after I execute the following statement? It seems like I should be getting "0" instead.

agenew[is.na(agenew)] <- 4 * sibsp + 3 * parch

This is the dataframe before executing the statement.

    age sibsp agenew parch
1  34.5     0     69     0
2  47.0     1     98     0
3  62.0     0    124     0
4  27.0     0     54     0
5  22.0     1     48     1
6  14.0     0     28     0
7  30.0     0     60     0
8  26.0     1     56     1
9  18.0     0     36     0
10 21.0     2     50     0
11   NA     0     NA     0
12 46.0     0     92     0
13 23.0     1     50     0
14 63.0     1    130     0
15 47.0     1     98     0
16 24.0     1     52     0
17 35.0     0     70     0
18 21.0     0     42     0
19 27.0     1     58     0
20 45.0     0     90     0
21 55.0     1    114     0
22  9.0     0     18     1
23   NA     0     NA     0

This is the dataframe after executing the statement.

> newdf
    age sibsp agenew parch
1  34.5     0     69     0
2  47.0     1     98     0
3  62.0     0    124     0
4  27.0     0     54     0
5  22.0     1     48     1
6  14.0     0     28     0
7  30.0     0     60     0
8  26.0     1     56     1
9  18.0     0     36     0
10 21.0     2     50     0
11   NA     0      0     0
12 46.0     0     92     0
13 23.0     1     50     0
14 63.0     1    130     0
15 47.0     1     98     0
16 24.0     1     52     0
17 35.0     0     70     0
18 21.0     0     42     0
19 27.0     1     58     0
20 45.0     0     90     0
21 55.0     1    114     0
22  9.0     0     18     1
23   NA     0      4     0
asked Jun 29 '26 16:06 by Warren Chrusciel


1 Answer

Let n be the number of rows in your data.frame and m (where m < n) the number of rows where agenew is NA. Doing

agenew[is.na(agenew)] <- 4 * sibsp + 3 * parch

is wrong because the left-hand side has length m while the right-hand side has length n. R fills the m NA positions with the first m elements of the right-hand side, in order. So the "4" that replaces agenew on row 23 (the second position where agenew is NA) is the value of 4 * sibsp + 3 * parch on the second row of your data.frame, not the 23rd.
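To make the mismatch concrete, here is a minimal sketch that rebuilds the three columns from the question as plain vectors (values copied from the data frame shown there; the names rhs, na_rows and buggy are only illustrative) and compares what the assignment consumes with what was intended:

```r
# Columns copied from the question's data frame (23 rows).
sibsp  <- c(0, 1, 0, 0, 1, 0, 0, 1, 0, 2, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0)
parch  <- c(0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0)
agenew <- c(69, 98, 124, 54, 48, 28, 60, 56, 36, 50, NA, 92,
            50, 130, 98, 52, 70, 42, 58, 90, 114, 18, NA)

rhs     <- 4 * sibsp + 3 * parch   # length n = 23
na_rows <- which(is.na(agenew))    # rows 11 and 23, so m = 2

# Reproduce the original assignment on a copy; the warning about
# mismatched lengths is silenced only to keep the output clean.
buggy <- agenew
suppressWarnings(buggy[is.na(buggy)] <- rhs)

buggy[na_rows]   # 0 4  -> rhs[1] and rhs[2] were used
rhs[na_rows]     # 0 0  -> what was actually wanted
```

Without suppressWarnings, the assignment also reports that the number of items to replace is not a multiple of the replacement length, which is a useful hint that the two sides do not line up.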

What you meant to do is:

agenew[is.na(agenew)] <- (4 * sibsp + 3 * parch)[is.na(agenew)]

but there are more elegant ways to do this, for example:

agenew <- ifelse(is.na(agenew), 4 * sibsp + 3 * parch, agenew)

Here, all vectors have length n, so each element of the result is taken from the matching row.
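As a quick check that the two corrections agree, the sketch below runs both on a small toy example (hypothetical values, not the question's data; fix_index and fix_ifelse are illustrative names):

```r
# Toy vectors (hypothetical): NAs at positions 2 and 4.
sibsp  <- c(1, 0, 2, 1)
parch  <- c(0, 1, 0, 2)
agenew <- c(10, NA, 30, NA)

# Fix 1: subset the right-hand side with the same logical index.
fix_index <- agenew
fix_index[is.na(fix_index)] <- (4 * sibsp + 3 * parch)[is.na(fix_index)]

# Fix 2: ifelse works element by element on vectors of equal length.
fix_ifelse <- ifelse(is.na(agenew), 4 * sibsp + 3 * parch, agenew)

fix_index                          # 10  3 30 10
identical(fix_index, fix_ifelse)   # TRUE
```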

Note: As you did in your question, I am skipping the part where all of these statements should be evaluated within your data.frame (see with, within, transform, etc.), e.g.:

df <- transform(df, agenew = ifelse(is.na(agenew), 4 * sibsp + 3 * parch, agenew))
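For completeness, a sketch of the same replacement done inside a data.frame, once with transform and once with with() plus indexed assignment. The data frame here is a hypothetical toy, and df_transform, df_indexed and na_rows are illustrative names only:

```r
# Toy data frame (hypothetical values): agenew is NA in rows 2 and 4.
df <- data.frame(sibsp  = c(1, 0, 2, 1),
                 parch  = c(0, 1, 0, 2),
                 agenew = c(10, NA, 30, NA))

# Variant 1: transform evaluates the expression inside the data frame.
df_transform <- transform(
  df,
  agenew = ifelse(is.na(agenew), 4 * sibsp + 3 * parch, agenew)
)

# Variant 2: compute the full-length vector with with(), then pick
# only the rows that need filling on both sides of the assignment.
df_indexed <- df
na_rows <- is.na(df_indexed$agenew)
df_indexed$agenew[na_rows] <- with(df_indexed, 4 * sibsp + 3 * parch)[na_rows]

df_transform$agenew   # 10  3 30 10
```

Both variants keep the two sides of the assignment aligned row by row, which is the point of the fix.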
answered Jul 02 '26 04:07 by flodel