I am building a community website that needs to calculate the similarity between any two users. Each user is described by the following attributes: age, skin type (oily, dry), hair type (long, short, medium), lifestyle (active outdoor lover, TV junky), and others. Can anyone tell me how to approach this problem or point me to some resources?


Ways to calculate similarity

1 Answer

Another way is to compute (in R) all the pairwise dissimilarities (distances) between the observations in the data set with the daisy function from the cluster package. The original variables may be of mixed types. Nominal, ordinal, and (a)symmetric binary data are handled with the general dissimilarity coefficient of Gower (Gower, J. C. (1971) A general coefficient of similarity and some of its properties, Biometrics 27, 857–874). For more, check out this on page 47. If x contains any columns of these data types, Gower's coefficient will be used as the metric.
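To make the coefficient less of a black box, here is a minimal base-R sketch of what it computes for a single pair of rows. It is a simplification, not the cluster package's implementation: it assumes no missing values, equal weights, at least two distinct values in every numeric column, and only numeric and nominal columns. The data frame users and the helper gower_pair are hypothetical names made up for this illustration.

```r
# Hypothetical data: three users with one numeric and two nominal attributes.
users <- data.frame(
  age  = c(20, 30, 40),
  skin = factor(c("oily", "dry", "oily")),
  hair = factor(c("long", "long", "short"))
)

# Gower dissimilarity between rows i and j of a data frame (illustrative helper).
# Numeric columns contribute |difference| / range of that column; factor columns
# contribute 0 when the levels match and 1 when they differ. The result is the
# unweighted mean of the per-column contributions, so it lies in [0, 1].
gower_pair <- function(df, i, j) {
  per_column <- vapply(names(df), function(col) {
    v <- df[[col]]
    if (is.numeric(v)) {
      abs(v[i] - v[j]) / diff(range(v))
    } else {
      as.numeric(v[i] != v[j])
    }
  }, numeric(1))
  mean(per_column)
}

d12 <- gower_pair(users, 1, 2)  # (10/20 + 1 + 0) / 3 = 0.5
d13 <- gower_pair(users, 1, 3)  # (20/20 + 0 + 1) / 3 = 2/3
```

Because every column contributes a value between 0 and 1, age does not dominate the result just because it is measured on a larger scale than the categorical attributes.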

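For example, keep age numeric and store the attributes in a data frame (cbind would coerce the factors to their integer codes, and daisy would then compute plain Euclidean distances on those codes instead of Gower's coefficient):

# age is numeric; the remaining attributes are nominal factors
x1 <- c(10, 12, 25, 14, 29)
x2 <- factor(c("oily", "dry", "dry", "dry", "oily"))
x3 <- factor(c("medium", "short", "medium", "medium", "long"))
x4 <- factor(c("active outdoor lover", "TV junky", "TV junky", "active outdoor lover", "TV junky"))
# a data frame keeps each column's type
x <- data.frame(x1, x2, x3, x4)

library(cluster)
daisy(x, metric = "gower")

you'll get:

Dissimilarities :
          1         2         3         4
2 0.7763158                              
3 0.6973684 0.4210526                    
4 0.3026316 0.5263158 0.3947368          
5 0.7500000 0.7236842 0.5526316 0.9473684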
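The question asks for similarity rather than distance. A minimal sketch of one way to get there, assuming the five users are stored in a data frame and Gower's coefficient is requested explicitly: since Gower dissimilarities lie between 0 and 1, similarity can be taken as 1 minus the dissimilarity. The column names and the variables users, sim and nearest are hypothetical, chosen only for this illustration.

```r
library(cluster)

# Five users stored as a data frame so that age stays numeric
# and the other attributes stay nominal factors.
users <- data.frame(
  age       = c(10, 12, 25, 14, 29),
  skin      = factor(c("oily", "dry", "dry", "dry", "oily")),
  hair      = factor(c("medium", "short", "medium", "medium", "long")),
  lifestyle = factor(c("active outdoor lover", "TV junky", "TV junky",
                       "active outdoor lover", "TV junky"))
)

d   <- daisy(users, metric = "gower")  # Gower dissimilarities in [0, 1]
sim <- 1 - as.matrix(d)                # full symmetric similarity matrix

# Most similar other user for each user: mask the diagonal (self-similarity),
# then take the position of the row-wise maximum.
diag(sim) <- NA
nearest <- apply(sim, 1, which.max)
```

Here nearest[1] is 4, meaning user 4 is the closest match for user 1. For a large site the full matrix grows quadratically with the number of users, so in practice you would probably compute similarities only for the pairs you need.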

If you are interested in a method for dimensionality reduction of categorical data (which is also a way to arrange variables into homogeneous clusters), check this.

186

answered Nov 07 '22 18:11

George Dontas

Tags: statistics, similarity, data-mining, social-networking, pattern-recognition

MarySheen
