Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

NA values using sqldf

Tags:

sql

r

sqldf

If I try to get an average of c(NA, NA, 3, 4, 5, 6, 7, 8, 9, 10) using AVG from SQL, I get a value of 5.2, instead of the expected 6.5.

# prepare data and write to file
write.table(data.frame(col1 = c(NA, NA, 3:10)),
        "my.na.txt", row.names = FALSE)

mean(c(NA, NA, 3:10), na.rm = TRUE) # 6.5

my.na <- read.csv.sql("my.na.txt", sep = " ",
        sql = "SELECT AVG(col1) FROM file") # 5.2

# this is identical to
sum(3:10)/10

unlink("my.na.txt") # remove file

Which leads me to believe that sql(df) treats NA values as zero. Is it possible to ignore (exclude) NA values in an SQL call as it can be done using na.rm argument (in R)?

like image 871
Roman Luštrik Avatar asked Jan 17 '23 19:01

Roman Luštrik


2 Answers

Modify your query to ignore the NA values:

SELECT AVG(col1)
FROM file
WHERE col1 IS NOT \"NA\"
like image 192
mathematical.coffee Avatar answered Jan 25 '23 11:01

mathematical.coffee


The problem is that the read.csv.sql function does not recognize the missing values, and converts them to zero, instead of NULL. This does not happen if you first load the data into a data.frame, and only then call sqldf.

d <- read.csv("my.na.txt")
sqldf("SELECT AVG(col1) FROM d") # 6.5
like image 24
Vincent Zoonekynd Avatar answered Jan 25 '23 11:01

Vincent Zoonekynd