I have the following data frame:
freq.a freq.b
1 NULL 0.055
2 0.030 0.055
3 0.060 0.161
4 0.303 0.111
5 0.393 0.111
6 0.121 0.388
7 0.090 0.111
And I would like to replace the NULL with an actual 0. However, executing df.m[is.null(df.m)] <- 0 doesn't change anything in the data frame.
MWE as follows (sorry for the length):
library(plyr)
df.a <- c(5, 4, 5, 7, 3, 5, 6, 5, 5, 4, 5, 5, 4, 5, 4, 7, 2, 4, 4, 5, 3, 6, 5, 6, 4, 4, 5, 4, 5, 5, 6, 7, 4)
df.b <- c(1, 3, 4, 6, 2, 7, 7, 4, 3, 6, 6, 3, 6, 6, 5, 6, 6, 5)
df.a.count <- count(df.a)
df.b.count <- count(df.b)
#normalize the data
df.a.count$freq <- lapply(df.a.count$freq, function(X) X/length(df.a))
df.b.count$freq <- lapply(df.b.count$freq, function(X) X/length(df.b))
df.m <- merge(df.a.count, df.b.count, by ='x', all=TRUE)[2:3]
names(df.m) <- c('freq.a', 'freq.b')
#replace the NULL's with 0
df.m[is.null(df.m)] <- 0
You shouldn't use lapply. Use sapply instead. This will produce NAs instead of NULLs. You can then do:
df.m[is.na(df.m)] <- 0
Explanation:
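lapply returns a list instead of a vector, so freq becomes a list column, and lists can hold NULL values; merge then fills the missing entry of that column with NULL. sapply returns the same values as a plain vector, so the missing entry is filled with NA instead of NULL.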
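As a rough sketch of the fix described above: the division is already vectorized in R, so neither lapply nor sapply is strictly needed. The helper count_values() below is a hypothetical base R stand-in for plyr::count(), introduced only so that the example runs without extra packages.

```r
# Same input vectors as in the question
df.a <- c(5, 4, 5, 7, 3, 5, 6, 5, 5, 4, 5, 5, 4, 5, 4, 7, 2, 4, 4, 5, 3,
          6, 5, 6, 4, 4, 5, 4, 5, 5, 6, 7, 4)
df.b <- c(1, 3, 4, 6, 2, 7, 7, 4, 3, 6, 6, 3, 6, 6, 5, 6, 6, 5)

# Hypothetical stand-in for plyr::count(): one row per distinct value
count_values <- function(v) {
  tab <- table(v)
  data.frame(x = as.numeric(names(tab)), freq = as.vector(tab))
}

df.a.count <- count_values(df.a)
df.b.count <- count_values(df.b)

# Vectorized division keeps freq an ordinary numeric column
df.a.count$freq <- df.a.count$freq / length(df.a)
df.b.count$freq <- df.b.count$freq / length(df.b)

df.m <- merge(df.a.count, df.b.count, by = "x", all = TRUE)[2:3]
names(df.m) <- c("freq.a", "freq.b")

# merge() filled the unmatched row with NA, which is.na() can locate
df.m[is.na(df.m)] <- 0
```

With numeric columns, is.na(df.m) returns a logical matrix of the same shape as the data frame, which is why the assignment reaches the individual cells.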
The reason is the use of lapply, which returns a list. This is easy to see by looking at the structure of the dataset, i.e. str(df.m).
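To illustrate why df.m[is.null(df.m)] <- 0 has no effect: is.null() tests the object as a whole and returns a single FALSE for a data frame, so nothing is selected. The small data frame d below is made up for illustration. It mimics a list column with one NULL entry and shows an element-wise check as one possible workaround if the list column is kept.

```r
# Toy data frame (hypothetical) with a list column containing a NULL
d <- data.frame(id = 1:3)
d$freq <- list(NULL, 0.03, 0.06)

is.list(d$freq)                 # TRUE: the column is a list, not a vector
is_null_whole <- is.null(d)     # FALSE: the data frame itself is not NULL

# Element-wise test: one logical value per list element
null_idx <- vapply(d$freq, is.null, logical(1))

# Replace the NULL entries, then flatten the list into a numeric vector
d$freq[null_idx] <- 0
d$freq <- unlist(d$freq)
```

Running str(d) before and after the last two lines shows the column changing from a list to a numeric vector.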
We can also do this using base R alone. Get all the unique elements from both vectors ('lvls'), convert both datasets to factor with the levels specified as 'lvls', get the frequency count (table) and the proportion (prop.table), cbind the output, and round if necessary.
lvls <- sort(union(unique(df.a), unique(df.b)))
round(cbind(prop.table(table(factor(df.a, levels = lvls))),
prop.table(table(factor(df.b, levels = lvls)))), 3)
# [,1] [,2]
#1 0.000 0.056
#2 0.030 0.056
#3 0.061 0.167
#4 0.303 0.111
#5 0.394 0.111
#6 0.121 0.389
#7 0.091 0.111
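The zero in the first row comes from fixing the factor levels up front. As a minimal sketch with made-up values (v and lvls_demo are not from the question), table() reports a count of 0 for any level that never occurs, whereas without the levels that value would be missing from the result entirely:

```r
v <- c(2, 3, 3)        # toy data: the value 1 never occurs
lvls_demo <- 1:3       # levels we want reported regardless

counts_plain <- table(v)                             # only values 2 and 3 appear
counts <- table(factor(v, levels = lvls_demo))       # 1 is kept with a count of 0
props <- prop.table(counts)                          # proportions sum to 1
```

Because both columns are built on the same levels, they line up row by row and cbind can combine them without a merge.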