Any workaround for clustering mixed data types and rendering a 3D scatter plot in R?

I am trying to see how data points are distributed within labeled groups in a 3D plot, and how similar the groups are to each other in 3D space. To do so, I used the scatterplot3d package from CRAN to draw a 3D scatter plot, but I didn't get the correct plot for my data.

reproducible data

Here is the reproducible data that I used.

    > dput(head(phenDat,30))
structure(list(SampleID = c("Tarca_001_P1A01", "Tarca_013_P1B01", 
"Tarca_025_P1C01", "Tarca_037_P1D01", "Tarca_049_P1E01", "Tarca_061_P1F01", 
"Tarca_051_P1E03", "Tarca_063_P1F03", "Tarca_075_P1G03", "Tarca_087_P1H03", 
"Tarca_004_P1A04", "Tarca_064_P1F04", "Tarca_076_P1G04", "Tarca_088_P1H04", 
"Tarca_005_P1A05", "Tarca_017_P1B05", "Tarca_054_P1E06", "Tarca_066_P1F06", 
"Tarca_078_P1G06", "Tarca_090_P1H06", "Tarca_007_P1A07", "Tarca_019_P1B07", 
"Tarca_031_P1C07", "Tarca_079_P1G07", "Tarca_091_P1H07", "Tarca_008_P1A08", 
"Tarca_020_P1B08", "Tarca_022_P1B10", "Tarca_034_P1C10", "Tarca_046_P1D10"
), GA = c(11, 15.3, 21.7, 26.7, 31.3, 32.1, 19.7, 23.6, 27.6, 
30.6, 32.6, 12.6, 18.6, 25.6, 30.6, 36.4, 24.9, 28.9, 36.6, 19.9, 
26.1, 30.1, 36.7, 13.6, 17.6, 22.6, 24.7, 13.3, 19.7, 24.7), 
    Batch = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 
    6L, 6L, 6L), Set = c("PRB_HTA", "PRB_HTA", "PRB_HTA", "PRB_HTA", 
    "PRB_HTA", "PRB_HTA", "PRB_HTA", "PRB_HTA", "PRB_HTA", "PRB_HTA", 
    "PRB_HTA", "PRB_HTA", "PRB_HTA", "PRB_HTA", "PRB_HTA", "PRB_HTA", 
    "PRB_HTA", "PRB_HTA", "PRB_HTA", "PRB_HTA", "PRB_HTA", "PRB_HTA", 
    "PRB_HTA", "PRB_HTA", "PRB_HTA", "PRB_HTA", "PRB_HTA", "PRB_HTA", 
    "PRB_HTA", "PRB_HTA"), Train = c(1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Platform = c("HTA20", 
    "HTA20", "HTA20", "HTA20", "HTA20", "HTA20", "HTA20", "HTA20", 
    "HTA20", "HTA20", "HTA20", "HTA20", "HTA20", "HTA20", "HTA20", 
    "HTA20", "HTA20", "HTA20", "HTA20", "HTA20", "GSE113966", "GSE113966", 
    "GSE113966", "GSE113966", "GSE113966", "GSE113966", "GSE113966", "GSE113966", "GSE113966", 
    "GSE113966")), row.names = c(NA, 30L), class = "data.frame")

my attempt:

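    # complete-linkage hierarchical clustering on a distance object
    hclustfunc <- function(x) hclust(x, method = "complete")
    # correlation-based distance between rows, scaled to [0, 1]
    distfunc <- function(x) as.dist((1 - cor(t(x))) / 2)
    d <- distfunc(persons_df)
    fit <- hclustfunc(d)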

my updated attempt:

    library(rgl)
    library(car)
    scatter3d(x = PC1, y = PC2, z = PC3,
              surface = FALSE,
              groups = as.factor(clusters),
              surface.col = cluster.colors, col = cluster.colors,
              xlab = "PC1", ylab = "PC2", zlab = "PC3")

Basically, I want to color the data points (i.e., rows) by some 'group' attribute so I can see which batch (or group) each one belongs to. I just want to see how similar the data points are to each other when they are grouped by age category, batch, and platform.

I am thinking of using k-means, PCA, or another method that gives me components I can visualize in a 3D plot, but it is not clear to me how to do this in R.

desired plot:

I want to get a 3D plot something like this:

[image: desired 3D plot]

Can anyone point out how I could make this happen? Is there any way to cluster my data and visualize it in a 3D plot in R? Any thoughts? Thanks.

Update: the simplest approach might be enough:

I don't want an overly complicated solution to begin with. I just want to group the data points (i.e., rows) by batch, platform, and age category (I used findInterval(persons_df$ages, c(10,20,30,40,50))). Is there any way to do this in R?

asked Jul 11 '19 21:07

Jerry07

1 Answer

Edit - added k-modes approach for mixed data clustering.

You might also consider plotly for 3D plotting. Here's an example with your data, where I've defined a group for every existing combination of Batch, Platform, and 10-year age bucket. plotly assigns each group a different color, and you can click a group in the legend to toggle it (double-clicking isolates it). You'd need to modify this for much bigger data; for instance, you could remove Platform from the grouping, since it's already mapped to z.

    library(plotly)
    library(dplyr)
    library(RColorBrewer)

    age_group <- 10  # width of each age bucket

    phenDat %>%
      # label each row with its Batch / Platform / age-bucket combination
      mutate(group = paste(Batch, Platform, "age",
                           floor(GA / age_group) * age_group, "-",
                           floor(GA / age_group) * age_group + age_group - 1)) %>%
      plot_ly(x = ~GA, y = ~Set, z = ~Platform, color = ~group) %>%
      add_markers(marker = list(size = 2,
                                color = colorRampPalette(brewer.pal(11, "Spectral"))(10))) %>%
      layout(scene = list(xaxis = list(title = "GA"),
                          yaxis = list(title = "Set"),
                          zaxis = list(title = "Platform")))
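If the floor() arithmetic in the grouping above feels opaque, a roughly equivalent key can be built with base R's cut() and interaction(). This is only a sketch: toy is a small made-up stand-in for phenDat, and the break points from 10 to 40 are an assumption about the range of GA.

```r
# Toy stand-in for phenDat (illustrative values, not the asker's data)
toy <- data.frame(
  GA       = c(11, 15.3, 21.7, 26.7, 31.3, 36.4),
  Batch    = c(1L, 1L, 2L, 2L, 3L, 3L),
  Platform = c("HTA20", "HTA20", "HTA20",
               "GSE113966", "GSE113966", "GSE113966")
)

# Bin GA into 10-wide, left-closed intervals: [10,20), [20,30), [30,40)
toy$age_bucket <- cut(toy$GA, breaks = seq(10, 40, by = 10), right = FALSE)

# One factor level per combination of Batch, Platform and age bucket
# that actually occurs in the data (drop = TRUE discards empty combinations)
toy$group <- interaction(toy$Batch, toy$Platform, toy$age_bucket,
                         sep = " / ", drop = TRUE)

print(table(toy$group))
```

The resulting group column can be mapped to color in plot_ly() the same way as the pasted label above; a factor also keeps the legend order stable.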

As for clustering given the mixed data, here's an approach using the klaR package's kmodes function, which seems to create plausible results here:

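    # kmodes() starts from randomly chosen rows, so fix the seed for repeatable clusters
    set.seed(42)

    phenDat %>%
      # attach the k-modes cluster label (as character, so plotly treats it as discrete)
      bind_cols(cluster = klaR::kmodes(phenDat, 6)[["cluster"]] %>% as.character) %>%
      plot_ly(x = ~GA, y = ~Set, z = ~Platform, color = ~cluster) %>%
      add_markers(marker = list(size = 5)) %>%
      layout(scene = list(xaxis = list(title = "GA"),
                          yaxis = list(title = "Set"),
                          zaxis = list(title = "Platform")))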
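Another common route for mixed numeric and categorical columns, swapped in here in place of k-modes, is a Gower dissimilarity followed by hierarchical clustering, with classical multidimensional scaling to get three plotting coordinates. A minimal sketch, assuming the cluster package is installed; toy is a made-up stand-in for phenDat, and k = 3 clusters is an arbitrary choice.

```r
library(cluster)  # provides daisy() for Gower dissimilarities

# Toy stand-in for phenDat (illustrative values, not the asker's data);
# categorical columns are stored as factors so daisy() treats them as nominal
toy <- data.frame(
  GA       = c(11, 15.3, 21.7, 26.7, 31.3, 36.4, 12.6, 30.6),
  Batch    = factor(c(1, 1, 1, 2, 2, 3, 3, 3)),
  Platform = factor(c("HTA20", "HTA20", "HTA20", "HTA20",
                      "GSE113966", "GSE113966", "GSE113966", "GSE113966"))
)

# Gower dissimilarity handles numeric and nominal columns together
d <- daisy(toy, metric = "gower")

# Complete-linkage hierarchical clustering, cut into k = 3 groups (assumed k)
fit <- hclust(d, method = "complete")
clusters <- factor(cutree(fit, k = 3))

# Classical MDS: up to three coordinates that approximate the dissimilarities
coords <- as.data.frame(cmdscale(d, k = 3))
coords$cluster <- clusters

print(head(coords))
```

The coordinate columns of coords (named V1, V2, V3 by as.data.frame) could then be passed to plot_ly(x = ~V1, y = ~V2, z = ~V3, color = ~cluster) followed by add_markers(), in place of GA, Set, and Platform.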

answered Oct 16 '22 07:10

Jon Spring

Tags: r, dplyr, ggplot2, cluster-analysis
