
row names supplied are of the wrong length in R

Tags:

r

I am running an R program that computes the similarity between product descriptions. The input to the program is a file with one column containing the list of product descriptions, each on a separate row.

I have another file that contains the list of product titles, each on a separate row.

Using the dist function, I computed the similarity between the product descriptions and stored the result in dist.mat as a matrix.

Next, I want to attach the product titles to the similarities I computed. So I read the product titles into Names and then ran:

dist.mat <- data.frame(dist.mat, row.names=Names[,1])  
colnames(dist.mat) <- (row.names(dist.mat))

and then I get an error: Error in data.frame(dist.mat, row.names = Names[, 1]) : row names supplied are of the wrong length

I'm not really sure how to fix it. I read "Invalid 'row.names' length", but I can't fix the error using Sample$ or as.character.

I am using: lsa_0.73, SnowballC_0.5.1, tm_0.5-10

Here is an actual example.

Product description file:

  • This glass can be used to drink whiskey
  • This is a stainless steel glass
  • This is a red rose

Product Title File:

  • Whiskeyglass
  • glass
  • rose

Output Example

Would be great if someone can help

user5712288 asked Nov 29 '25 14:11


2 Answers

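As the error message says, the row names supplied do not have the same length as the number of rows in dist.mat; data.frame() needs exactly one name per row. Separately, if the names are stored as an extra column rather than as row names, there will be one more column than there are names. So, I guess the column names can be fixed with: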

 colnames(dist.mat)[-ncol(dist.mat)] <- row.names(dist.mat)

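Instead of having the names column as the last one, it may be better to have it as the first column. Note that row.names() returns NULL for a plain vector, so the titles are taken from Names[, 1] directly (this still requires one title per row of dist.mat):

dist.mat1 <- data.frame(rn = Names[, 1], dist.mat)
colnames(dist.mat1)[-1] <- as.character(Names[, 1])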
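
The length requirement behind the error can be checked directly. The following is a minimal sketch in base R, assuming toy stand-ins for the asker's data (a 2 x 2 matrix and a one-column data frame of three titles, neither taken from the real files); it captures the error message instead of stopping the script.

```r
# Toy stand-ins (assumptions, not the asker's real data)
dist.mat <- matrix(c(0, 27, 27, 0), nrow = 2)
Names <- data.frame(title = c("Whiskeyglass", "glass", "rose"))

# The two lengths that data.frame() compares
nrow(dist.mat)       # 2
length(Names[, 1])   # 3

# Capture the error message rather than aborting
msg <- tryCatch(
  data.frame(dist.mat, row.names = Names[, 1]),
  error = function(e) conditionMessage(e)
)
msg
```

Comparing nrow(dist.mat) with length(Names[, 1]) before calling data.frame() is probably the quickest way to see which of the two objects has the unexpected size.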

akrun answered Dec 01 '25 05:12


A distance matrix of class dist computed from a vector of length n is printed as a lower-triangular matrix with n - 1 rows and n - 1 columns, that is, one row and one column fewer than the vector length.

library(stringdist)

desc <- c("This glass can be used to drink whiskey",
   "This is a stainless steel glass",
   "This is a red rose")

Names <- c("Whiskeyglass", "glass", "rose")

dist.mat1 <- stringdistmatrix(desc)
dist.mat1
#    1  2
# 2 27   
# 3 24 18

However, a dist object does not have dimensions and therefore row and column names cannot be assigned to it.

dim(dist.mat1)
# NULL

Trying to name the rows and columns of a dist object results in an error.

row.names(dist.mat1) <- colnames(dist.mat1) <- Names

Error in as.data.frame.default(x[[i]], optional = TRUE) : cannot coerce class "dist" to a data.frame

To obtain the result you expect, a dist object first needs to be converted to a matrix. This adds the zeros along the diagonal and thus also a row and a column.

# inherits() is used because class() can return more than one value
# (a matrix has class c("matrix", "array") in R >= 4.0)
if (inherits(dist.mat1, "dist")) {
    dist.mat2 <- as.matrix(dist.mat1)   # full n x n matrix with a zero diagonal
} else {
    dist.mat2 <- dist.mat1              # already a matrix, use as is
}
row.names(dist.mat2) <- colnames(dist.mat2) <- Names

dist.mat2
#              Whiskeyglass glass rose
# Whiskeyglass            0    27   24
# glass                  27     0   18
# rose                   24    18    0
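
If a data frame is wanted, as in the question, the same idea carries over once the object is a full square matrix. The sketch below uses only base R and assumes the distances printed above; the dist object is rebuilt with as.dist() rather than computed with stringdist, and the stopifnot() guard is an addition for illustration.

```r
Names <- c("Whiskeyglass", "glass", "rose")

# Distances copied from the printed output above; as.dist() rebuilds a dist object
d <- as.dist(matrix(c( 0, 27, 24,
                      27,  0, 18,
                      24, 18,  0), nrow = 3))

# as.matrix() restores the zero diagonal, giving an n x n matrix
m <- as.matrix(d)

# Guard against the "wrong length" error before building the data frame
stopifnot(nrow(m) == length(Names))

df <- data.frame(m, row.names = Names, check.names = FALSE)
colnames(df) <- Names
df
```

Here check.names = FALSE only stops data.frame() from rewriting the temporary column names; the final names come from the colnames() assignment.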

If your dist.mat looks like dist.mat1 above, but its class is matrix, then you need to select which Names belong where.

row.names(dist.mat) <- Names[-1]             # removing the first name for rows
colnames(dist.mat) <- Names[-length(Names)]  # removing the last name for columns
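
Choosing between the two cases can be sketched as a small helper that compares the matrix dimensions with the number of names. The function name name_dist_matrix is hypothetical and not part of any package, and the toy matrix simply mirrors the layout of dist.mat1 printed earlier.

```r
# Hypothetical helper (an illustration, not from any package):
# pick the naming scheme by comparing dimensions with the number of names
name_dist_matrix <- function(m, nms) {
  n <- length(nms)
  if (nrow(m) == n && ncol(m) == n) {
    # full square matrix: every name is used on both axes
    dimnames(m) <- list(nms, nms)
  } else if (nrow(m) == n - 1 && ncol(m) == n - 1) {
    # triangular layout: rows start at item 2, columns stop at item n - 1
    dimnames(m) <- list(nms[-1], nms[-n])
  } else {
    stop("matrix is ", nrow(m), " x ", ncol(m),
         " but ", n, " names were supplied")
  }
  m
}

Names <- c("Whiskeyglass", "glass", "rose")
tri <- matrix(c(27, 24, NA, 18), nrow = 2)  # same layout as the printed dist.mat1
res <- name_dist_matrix(tri, Names)
res
```

Any other combination of sizes stops with a message that reports both, which should point to whichever input file has a missing or extra row.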

nya answered Dec 01 '25 05:12


