I am running an R program that computes the similarity between product descriptions. The input to the program is a file with one column containing the list of product descriptions, each on a separate row.
I have another file that contains the list of product titles, each on a separate row.
Using the dist function, I have computed the similarity between the product descriptions and stored it in dist.mat as a matrix.
Next, I want to attach the product titles to the similarities I have computed. So I read the product titles into Names and then run:
dist.mat <- data.frame(dist.mat, row.names=Names[,1])
colnames(dist.mat) <- (row.names(dist.mat))
and then I get an error: Error in data.frame(dist.mat, row.names = Names[, 1]) : row names supplied are of the wrong length
I am not really sure how to fix it. I read the question "Invalid 'row.names' length", but I could not fix the error using Sample$ or as.character.
I am using: lsa_0.73, SnowballC_0.5.1, tm_0.5-10
Here is an actual example: Product Desc file:
Product Title File:
Output Example
It would be great if someone could help.
As the error message says, the row names supplied are not the same length as what they are being assigned to. If the names are added as an extra column instead, the data frame ends up with one more column than there are names, so the column names have to skip that column. I guess this can be fixed with:
colnames(dist.mat)[-ncol(dist.mat)] <- row.names(dist.mat)
Instead of having the row names column as the last one, it may be better to have it as the first column:
dist.mat1 <- data.frame(rn = Names[, 1], dist.mat)  # Names[, 1] is a plain vector, so row.names() of it would be NULL
colnames(dist.mat1)[-1] <- row.names(dist.mat)
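Before assigning names, it can help to check that the number of titles matches the number of rows of the matrix. The following is only a minimal sketch: `m` and `titles` are made-up stand-ins for dist.mat and Names[, 1], not the data from the question.

```r
# Hypothetical stand-ins for dist.mat and Names[, 1]
m <- matrix(c( 0, 27, 24,
              27,  0, 18,
              24, 18,  0), nrow = 3, byrow = TRUE)
titles <- c("Whiskeyglass", "glass", "rose")

# The error in the question is raised when these two numbers differ
lengths_match <- length(titles) == nrow(m)

# Supplying one title too many triggers an error, captured here as a string
res <- tryCatch(
  data.frame(m, row.names = c(titles, "extra")),
  error = function(e) conditionMessage(e)
)

# With matching lengths the names can be attached directly
named <- data.frame(m, row.names = titles)
colnames(named) <- row.names(named)
```

If `lengths_match` is FALSE for the real data, the mismatch has to be resolved first, for example by checking whether the title file was read with a header row.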
A distance matrix of class dist computed for a vector of length n is displayed as a lower-triangular matrix with n - 1 rows and n - 1 columns, that is, one row and one column fewer than the vector length.
library(stringdist)
desc <- c("This glass can be used to drink whiskey",
"This is a stainless steel glass",
"This is a red rose")
Names <- c("Whiskeyglass", "glass", "rose")
dist.mat1 <- stringdistmatrix(desc)
dist.mat1
#    1  2
# 2 27
# 3 24 18
However, a dist object does not have dimensions and therefore row and column names cannot be assigned to it.
dim(dist.mat1)
# NULL
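Although a dist object has no dim, it does store its size and, optionally, labels as attributes. As a hedged aside, the sketch below uses only base R (stats::dist on made-up numeric data as a stand-in for stringdistmatrix) and assumes that setting the "Labels" attribute is acceptable for your use case; as.matrix() then uses those labels as dimnames.

```r
# Made-up numeric data; dist() stands in for stringdistmatrix() here
x <- matrix(c(1, 2, 4), ncol = 1)
d <- dist(x)

d_dim  <- dim(d)            # NULL: a dist object is a vector with attributes
d_size <- attr(d, "Size")   # number of original observations

# Assumption: names are carried via the "Labels" attribute of the dist object
attr(d, "Labels") <- c("Whiskeyglass", "glass", "rose")

# as.matrix() picks the labels up as row and column names
m <- as.matrix(d)
dimnames(m)
```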
Trying to name the rows and columns of a dist object results in an error.
row.names(dist.mat1) <- colnames(dist.mat1) <- Names
Error in as.data.frame.default(x[[i]], optional = TRUE) : cannot coerce class "dist" to a data.frame
To obtain the result you expect, a dist object first needs to be converted to a matrix. This adds the zeros along a diagonal and thus also a row and a column.
# Convert a dist object to a full matrix; a plain matrix is used as-is
if (inherits(dist.mat1, "dist")) {
  dist.mat2 <- as.matrix(dist.mat1)
} else {
  dist.mat2 <- dist.mat1
}
rownames(dist.mat2) <- colnames(dist.mat2) <- Names
dist.mat2
#              Whiskeyglass glass rose
# Whiskeyglass            0    27   24
# glass                  27     0   18
# rose                   24    18    0
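The same conversion can be tried without installing stringdist. This is only a sketch under the assumption that base R's utils::adist (generalized Levenshtein distance) is an acceptable stand-in; its distances are not claimed to match the stringdist values shown above.

```r
desc <- c("This glass can be used to drink whiskey",
          "This is a stainless steel glass",
          "This is a red rose")
Names <- c("Whiskeyglass", "glass", "rose")

# adist() returns a full matrix; as.dist() keeps only the lower triangle
d <- as.dist(utils::adist(desc))

# Converting back restores the zero diagonal, giving one row and column per name
m <- as.matrix(d)
dimnames(m) <- list(Names, Names)

# Sanity checks on the named result
isSymmetric(m)
all(diag(m) == 0)
```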
If your dist.mat looks like dist.mat1 above, but its class is matrix, then you need to select which Names belong where.
row.names(dist.mat) <- Names[-1] # removing the first name for rows
colnames(dist.mat) <- Names[-length(Names)] # removing the last name for columns
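As a small illustration of that last case, the sketch below builds a hypothetical 2 x 2 matrix `tri` that mirrors the lower triangle printed earlier and names it with the shifted title vectors. The matrix is constructed by hand for the example and is not output from the question's program.

```r
Names <- c("Whiskeyglass", "glass", "rose")

# Hypothetical (n - 1) x (n - 1) lower-triangular matrix, filled column-wise
tri <- matrix(c(27, 24, NA, 18), nrow = 2)

# Rows start at the second name, columns stop before the last name
rownames(tri) <- Names[-1]
colnames(tri) <- Names[-length(Names)]

tri["rose", "glass"]          # 18
tri["glass", "Whiskeyglass"]  # 27
```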