Methods for Clustering Mixed-Type Data
Released Mar 16, 2019 by Alexander Foss
Implements methods for clustering mixed-type data, specifically combinations of continuous and nominal data. Special attention is paid to the often-overlooked problem of equitably balancing the contribution of the continuous and categorical variables. This package implements KAMILA clustering, a novel method for clustering mixed-type data in the spirit of k-means clustering. It does not require dummy coding of variables, and is efficient enough to scale to rather large data sets. Also implemented is Modha-Spangler clustering, which uses a brute-force strategy to maximize the cluster separation simultaneously in the continuous and categorical variables. For more information, see Foss, Markatou, Ray, & Heching (2016).
This package can be included as a dependency from a Java or Scala project by including
the following in your project's pom.xml file. See the Renjin documentation for more
about embedding Renjin in JVM-based projects.
<dependencies>
  <dependency>
    <groupId>org.renjin.cran</groupId>
    <artifactId>kamila</artifactId>
    <version>0.1.1.3-b1</version>
  </dependency>
</dependencies>
<repositories>
  <repository>
    <id>bedatadriven</id>
    <name>bedatadriven public repo</name>
    <url>https://nexus.bedatadriven.com/content/groups/public/</url>
  </repository>
</repositories>
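With the dependency above on the classpath, the package can presumably be loaded from Java through the standard javax.script API. The following is a minimal sketch, not an official example. It assumes that the Renjin script engine itself is also on the classpath (it is not part of the snippet above), that it registers under the name "Renjin", and that Renjin loads packages by their Maven coordinates in the form library('groupId:artifactId'). The class name KamilaLoader and the helper libraryCall are hypothetical names introduced for illustration.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class KamilaLoader {

    // Builds the R call that loads a package by its Maven coordinates.
    // Assumption: Renjin resolves library('groupId:artifactId').
    static String libraryCall(String groupId, String artifactId) {
        return "library('" + groupId + ":" + artifactId + "')";
    }

    public static void main(String[] args) throws ScriptException {
        // Look up Renjin through the standard JSR-223 discovery mechanism.
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("Renjin");
        if (engine == null) {
            // No engine registered under that name: Renjin is not on the classpath.
            System.err.println("Renjin script engine not found on the classpath");
            return;
        }
        // Load the kamila package inside the embedded R session.
        engine.eval(libraryCall("org.renjin.cran", "kamila"));
    }
}
```

If the engine lookup returns null, the sketch exits without evaluating anything, which makes a missing Renjin dependency easy to spot before any clustering code runs.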
If you're using Renjin from the command line, you load this library by invoking library('org.renjin.cran:kamila').
This package was last tested against Renjin 0.9.2725 on Mar 26, 2019.