Replace NULL in a dataframe

Tags: r

I have the following data frame:

  freq.a freq.b              
1 NULL   0.055               
2 0.030  0.055              
3 0.060  0.161                    
4 0.303  0.111                   
5 0.393  0.111                   
6 0.121  0.388                   
7 0.090  0.111

I would like to replace the NULL with an actual 0. However, executing df.m[is.null(df.m)] <- 0 doesn't change anything in the data frame.

MWE as follows (sorry for the length):

library(plyr)
df.a <- c(5, 4, 5, 7, 3, 5, 6, 5, 5, 4, 5, 5, 4, 5, 4, 7, 2, 4, 4, 5, 3, 6, 5, 6, 4, 4, 5, 4, 5, 5, 6, 7, 4)
df.b <- c(1, 3, 4, 6, 2, 7, 7, 4, 3, 6, 6, 3, 6, 6, 5, 6, 6, 5)
df.a.count <- count(df.a)
df.b.count <- count(df.b)

#normalize the data
df.a.count$freq <- lapply(df.a.count$freq, function(X) X/length(df.a))
df.b.count$freq <- lapply(df.b.count$freq, function(X) X/length(df.b))
df.m <- merge(df.a.count, df.b.count, by ='x', all=TRUE)[2:3]
names(df.m) <- c('freq.a', 'freq.b')

#replace the NULL's with 0
df.m[is.null(df.m)] <- 0
raumkundschafter asked Nov 25 '16 11:11



2 Answers

You shouldn't use lapply here; use sapply instead. The merged data frame will then contain NAs instead of NULLs, and you can replace them with:

df.m[is.na(df.m)] <- 0

Explanation:

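lapply returns a list rather than a vector, so freq becomes a list column, and a list can hold NULL elements. That is what the unmatched row ends up containing after the merge. sapply returns the same values as a plain numeric vector; a vector cannot hold NULL, so the unmatched row gets NA instead.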
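
As a rough sketch of the whole fix, the question's pipeline could be rerun with sapply as below. To keep the example free of the plyr dependency, count_values is a hypothetical base-R stand-in for plyr::count() (it is not part of the original code); everything else follows the question's MWE.

```r
# Sample data from the question
df.a <- c(5, 4, 5, 7, 3, 5, 6, 5, 5, 4, 5, 5, 4, 5, 4, 7, 2, 4, 4, 5, 3, 6, 5, 6, 4, 4, 5, 4, 5, 5, 6, 7, 4)
df.b <- c(1, 3, 4, 6, 2, 7, 7, 4, 3, 6, 6, 3, 6, 6, 5, 6, 6, 5)

# Hypothetical stand-in for plyr::count(): one row per distinct value,
# with columns x (the value) and freq (how often it occurs)
count_values <- function(v) {
  tab <- table(v)
  data.frame(x = as.numeric(names(tab)), freq = as.vector(tab))
}

df.a.count <- count_values(df.a)
df.b.count <- count_values(df.b)

# sapply returns a plain numeric vector, so freq stays an atomic column
df.a.count$freq <- sapply(df.a.count$freq, function(X) X / length(df.a))
df.b.count$freq <- sapply(df.b.count$freq, function(X) X / length(df.b))

df.m <- merge(df.a.count, df.b.count, by = "x", all = TRUE)[2:3]
names(df.m) <- c("freq.a", "freq.b")

# merge(all = TRUE) fills the unmatched row with NA, which is.na() can find
df.m[is.na(df.m)] <- 0
print(df.m)
```

Since division in R is vectorized, df.a.count$freq / length(df.a) should give the same numeric vector without any apply call at all.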

Carles Mitjans answered Oct 07 '22 00:10



The reason is the use of lapply, which returns a list. This is easy to see by looking at the structure of the dataset, i.e. str(df.m).
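
A small sketch of that diagnosis, using a toy data frame d (a made-up stand-in for the question's df.m) and a toy list freq. It also shows why the is.null() attempt has no effect: is.null() tests the object as a whole and returns a single value, so it never flags individual cells. The element-wise replacement at the end is one possible workaround for a list that already contains NULLs, not something taken from the answers here.

```r
# Toy data frame with a list column, built the same way as in the question
d <- data.frame(x = 1:3)
d$freq <- lapply(1:3, function(X) X / 3)

str(d)               # the freq column is reported as a list
is.list(d$freq)      # TRUE
is.null(d)           # FALSE: one answer for the whole data frame
length(is.null(d))   # 1, so d[is.null(d)] selects nothing

# For a list that already holds NULLs, test each element separately
freq <- list(NULL, 0.03, 0.06)
is_missing <- vapply(freq, is.null, logical(1))
freq[is_missing] <- 0
freq <- unlist(freq)  # back to a plain numeric vector
```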

We can also do this using base R alone. Get all the unique elements from both vectors ('lvls'), convert both vectors to factors with 'lvls' as the levels, get the frequency counts (table) and the proportions (prop.table), cbind the output, and round if necessary.

lvls <- sort(union(unique(df.a), unique(df.b)))
round(cbind(prop.table(table(factor(df.a, levels = lvls))),
            prop.table(table(factor(df.b, levels = lvls)))), 3)
#  [,1]  [,2]
#1 0.000 0.056
#2 0.030 0.056
#3 0.061 0.167
#4 0.303 0.111
#5 0.394 0.111
#6 0.121 0.389
#7 0.091 0.111
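
If a data frame with the question's column names is wanted instead of a matrix, the same idea could be wrapped roughly as follows. rel_freq is a hypothetical helper name introduced only for this sketch, and the x column is an added convenience for keeping the level next to its frequencies.

```r
# Sample data from the question
df.a <- c(5, 4, 5, 7, 3, 5, 6, 5, 5, 4, 5, 5, 4, 5, 4, 7, 2, 4, 4, 5, 3, 6, 5, 6, 4, 4, 5, 4, 5, 5, 6, 7, 4)
df.b <- c(1, 3, 4, 6, 2, 7, 7, 4, 3, 6, 6, 3, 6, 6, 5, 6, 6, 5)

# All values that occur in either vector
lvls <- sort(union(df.a, df.b))

# Hypothetical helper: relative frequency of every level in v.
# Levels that never occur in v get a count of 0, so no NULL or NA appears.
rel_freq <- function(v, levels) {
  as.vector(prop.table(table(factor(v, levels = levels))))
}

df.m <- data.frame(x = lvls,
                   freq.a = rel_freq(df.a, lvls),
                   freq.b = rel_freq(df.b, lvls))
print(round(df.m, 3))
```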
akrun answered Oct 07 '22 01:10
