create hash value for each row of data in dataframe in R

Question

I am exploring how to compare two data frames in R more efficiently, and I came up with the idea of hashing.

My plan is to create a hash for each row in two data frames that have the same columns, using digest() from the digest package. I expect the hash to be the same for any two identical rows.

I tried to give each row a unique hash using the code below:

library(digest)

# Hash each row in turn and report progress as "current/total"
for (loop.ssi in seq_len(nrow(ssi.10q3.v1)))
    {ssi.10q3.v1[loop.ssi,"hash"] <- digest(as.character(ssi.10q3.v1[loop.ssi,]))
     print(paste(loop.ssi,nrow(ssi.10q3.v1),sep="/"))
     flush.console()
    }

But this is very slow.

Is my approach to comparing data frames correct? If so, are there any suggestions for speeding up the code above? Thanks.

UPDATE

I have updated the code as below:

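library(plyr)
library(digest)

# One uid per row, so that ddply treats every row as its own group
ssi.10q3.v1[,"uid"] <- seq_len(nrow(ssi.10q3.v1))

ssi.10q3.v1.hash <- ddply(ssi.10q3.v1,
                          c("uid"),
                          function(df)
                             {df[,"uid"]<- NULL   # keep uid out of the hash
                              hash <- digest(as.character(df))
                              data.frame(hash=hash)
                             },
                          .progress="text")

I generated the uid column myself so that each row is unique and forms its own group.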

mdsumner · Accepted Answer

I know this answer doesn't match the title of the question, but if you just want to see when rows are different you can do it directly:

rowSums(df2 == df1) == ncol(df1)
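As a small illustration of that comparison, here is a sketch with two made-up data frames. The names df1 and df2 follow the line above, but the data are hypothetical, and both frames are assumed to have the same columns in the same order.

```r
# Hypothetical example data: same dimensions, same column order.
df1 <- data.frame(id = 1:3, name = c("a", "b", "c"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = c(1L, 2L, 4L), name = c("a", "x", "c"),
                  stringsAsFactors = FALSE)

# df2 == df1 gives a logical matrix with one cell per value;
# a row is identical when all of its cells are TRUE.
same_row <- rowSums(df2 == df1) == ncol(df1)

# Row 1 matches; rows 2 and 3 each differ in one column.
print(same_row)

# Positions of the rows that differ.
print(which(!same_row))
```

The comparison is done column by column on whole vectors, so there is no per-row loop in R code.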

Assuming both data.frames have the same dimensions, that will evaluate to FALSE for every row that is not identical (a row containing NA yields NA rather than TRUE or FALSE). If you need to test rownames as well, that can be managed separately and combined with the test of contents, and similarly for colnames (and attributes, and strict tests on column types).

rowSums(df2 == df1) == ncol(df1) & rownames(df2) == rownames(df1)
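If per-row hashes are still wanted, as in the title, one possible way to avoid the row-by-row data frame indexing is to build a single string per row with a vectorised paste() and hash only those strings. This is a rough sketch rather than a tested benchmark: row_hashes is a hypothetical helper name, the "\r" separator and the md5 algorithm are arbitrary choices, and the example data are made up. It assumes the separator never occurs in the data and that a missing value and the literal string "NA" may be treated alike.

```r
library(digest)

# Hypothetical helper: returns one hash per row of df.
row_hashes <- function(df, sep = "\r") {
  # Convert each column to character so factors hash by label, not by code.
  cols <- lapply(df, as.character)
  # Vectorised paste builds one key string per row; unname() avoids a
  # column called "sep" clashing with paste()'s own argument.
  keys <- do.call(paste, c(unname(cols), sep = sep))
  # serialize = FALSE hashes the string itself, not R's serialized object.
  vapply(keys, digest, character(1),
         algo = "md5", serialize = FALSE, USE.NAMES = FALSE)
}

# Made-up example data with identical columns.
df1 <- data.frame(id = 1:3, name = c("a", "b", "c"))
df2 <- data.frame(id = c(1L, 2L, 4L), name = c("a", "x", "c"))

h1 <- row_hashes(df1)
h2 <- row_hashes(df2)

same_position <- h1 == h2     # row i of df1 against row i of df2
found_in_df1  <- h2 %in% h1   # does each df2 row occur anywhere in df1?
print(same_position)
print(found_in_df1)
```

Unlike the positional comparison, the hashes can also be matched with %in% or merge(), so the two data frames do not need the same row order or the same number of rows.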

Tags: database, r, hash

lokheart
