cassiopy.mixture.SkewTUniformMixture
The SkewTUniformMixture class models a mixture of skew-t distributions with a uniform background. It provides various methods for working with skew-t distributed data, including generating samples and calculating densities.
- class cassiopy.mixture.SkewTUniformMixture(n_cluster: int, init_method='gmm', parametre=None, n_init_gmm=6)
Bases:
object
A mixture model for clustering using the skewed Student’s t-distribution.
- Parameters:
- n_cluster : int
Number of clusters.
- init_method : str, default='gmm'
Initialization method. Options: 'gmm', 'kmeans', 'params'.
- parametre : dict, optional
Dictionary of initial parameters, used when init_method='params'.
- n_init_gmm : int, default=6
Number of initializations for GMM.
Notes
For more information, refer to the Skew-t Uniform Mixture Models documentation.
Examples
>>> import numpy as np
>>> from cassiopy.mixture import SkewTUniformMixture
>>> X = np.array([[5, 3], [5, 7], [5, 1], [20, 3], [20, 7], [20, 1]])
>>> model = SkewTUniformMixture(n_cluster=2, init_method='kmeans')
>>> model.fit(X, max_iter=100, tol=1e-4)
>>> model.mean
array([[ 5.        ,  1.40735413],
       [20.00000058,  0.66644041]])
>>> model.predict_proba(np.array([[0, 0], [22, 5]]))
array([[0.5, 0.5],
       [0.5, 0.5]])
>>> model.save('model.h5')
>>> model.load('Models_folder/model.h5')
>>> model.predict_cluster(np.array([[0, 0], [22, 5]]))
array([0., 0.])
- Attributes:
- mean : ndarray
Cluster means.
- sigma : ndarray
Cluster standard deviations.
- dl : ndarray
Degrees of freedom for each cluster.
- lamb : ndarray
Skewness parameters for each cluster.
- alpha : ndarray
Cluster weights.
- tik : ndarray
Posterior probabilities.
- logli : list
Log-likelihood values during training.
Methods
ARI(y_true, y_pred): Compute the Adjusted Rand Index (ARI).
BIC(X): Calculate the Bayesian Information Criterion (BIC) for the model.
HARTIGAN(X): Calculate Hartigan’s index for the model.
IUS(X, y_pred[, bins]): Calculate the Indice d’Uniformité de Shannon (IUS) for the model.
KL(X[, bins]): Calculate the Kullback-Leibler divergence for the model.
L2(X[, bins, penalty_weight]): Calculate the L2 distance between the empirical distribution of data points in the uniform cluster and a uniform distribution.
ST(X, mu, s, nu, la): Compute the posterior probabilities for the SkewT mixture model.
chi2(X[, bins]): Calculate the Chi-squared statistic for the model.
confusion_matrix(y_true, y_pred): Calculate the confusion matrix.
fit(X[, tol, max_iter, init_method, ...]): Fit the SkewT mixture model to the data.
initialisation_gmm(X): Initialize the parameters for the Gaussian Mixture Model (GMM).
initialisation_kmeans(X): Initialize the parameters for the SkewMM algorithm using the K-means initialization method.
initialisation_random(X): Initialize the parameters randomly for the SkewMM algorithm.
load(filename): Load matrices from a given file.
predict(X): Predict the cluster labels for the data.
predict_proba(X): Predict the posterior probabilities for the data.
save(filename): Save the model to a file.
AIC
logt_expr
- ARI(y_true, y_pred)
Compute the Adjusted Rand Index (ARI).
- Parameters:
- y_true (array-like): The true labels.
- y_pred (array-like): The predicted labels.
- Returns:
- ari (float): The Adjusted Rand Index (ARI) score.
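As an illustration of what the ARI measures, here is a minimal, dependency-free sketch of the standard Adjusted Rand Index formula built from pair counts. The function name `adjusted_rand_index` is hypothetical and this is not taken from the cassiopy source; it only shows the usual definition.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(y_true, y_pred):
    """Standard ARI from pair counts (illustrative sketch, not the cassiopy code)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total_pairs = comb(len(y_true), 2)
    if total_pairs == 0:
        # Fewer than two points: no pairs to compare, treat as agreement.
        return 1.0
    # Number of point pairs that share a contingency cell, a true class,
    # or a predicted cluster.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(y_true, y_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(y_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(y_pred).values())
    expected = sum_rows * sum_cols / total_pairs
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        # Degenerate case, e.g. a single cluster in both labelings.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

The score does not depend on how clusters are numbered: relabeling the predicted clusters leaves it unchanged, which is why it suits comparing a clustering against ground-truth classes.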
- BIC(X)
Calculate the Bayesian Information Criterion (BIC) for the model.
- Parameters:
- X (array-like): The input data.
- Returns:
- bic (float): The BIC value.
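For reference, the textbook BIC can be sketched as below. The helper `bic` and its arguments are hypothetical. The sign convention (lower is better) is an assumption, and the parameter count is left to the caller because this page does not say how the class counts free parameters.

```python
from math import log


def bic(log_likelihood, n_params, n_samples):
    """Textbook BIC = k * ln(n) - 2 * ln(L); assumed convention: lower is better."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    # The penalty grows with the number of free parameters and the sample size.
    return n_params * log(n_samples) - 2.0 * log_likelihood
```

Under this convention, a model with one more free parameter must raise the log-likelihood by more than ln(n)/2 to obtain a lower BIC.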
- HARTIGAN(X)
Calculate Hartigan’s index for the model. This index measures the compactness of clusters by summing the squared distances of points from their cluster centers.
- Parameters:
- X : array-like, shape (n_samples, n_features)
The input data.
- Returns:
- W : float
The value of Hartigan’s index.
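The sum described above can be sketched in a few lines. The helper `within_cluster_ss` is hypothetical and takes labels and centers explicitly. How the class handles points assigned to the uniform cluster is not stated here, so this sketch simply sums over whatever labels it is given.

```python
def within_cluster_ss(X, labels, centers):
    """Sum of squared Euclidean distances from each point to its cluster center."""
    total = 0.0
    for point, label in zip(X, labels):
        center = centers[label]
        # Squared distance between the point and the center of its cluster.
        total += sum((p - c) ** 2 for p, c in zip(point, center))
    return total
```

Smaller values indicate tighter clusters; the value is zero only when every point coincides with its center.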
- IUS(X, y_pred, bins=3)
Calculate the Indice d’Uniformité de Shannon (IUS, Shannon uniformity index) for the model. This index measures the uniformity of the distribution of data points across clusters.
- Parameters:
- X : ndarray of shape (n_samples, n_features)
The input data.
- y_pred : array-like
The predicted labels.
- bins : int, default=3
Number of bins for the histogram.
- Returns:
- ius : float
The IUS value.
- KL(X, bins=3)
Calculate the Kullback-Leibler divergence for the model. This index measures the divergence between the empirical distribution of data points in the uniform cluster and a uniform distribution.
- Parameters:
- X : ndarray of shape (n_samples, n_features)
The input data.
- bins : int, default=3
Number of bins for the histogram.
- Returns:
- kl_div : float
The Kullback-Leibler divergence value.
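To make the idea concrete, the sketch below computes the KL divergence between a histogram and a uniform distribution for one-dimensional data. The helpers `histogram_probs` and `kl_to_uniform` are hypothetical. How the class bins multi-dimensional data and selects the points of the uniform cluster is not described on this page, so the equal-width 1-D binning is an assumption.

```python
from math import log


def histogram_probs(values, bins):
    """Equal-width histogram of 1-D values, normalised to probabilities."""
    if bins < 1:
        raise ValueError("bins must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must span a non-zero range")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clip so that the maximum value falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def kl_to_uniform(values, bins=3):
    """KL(P || U) where P is the histogram and U is uniform over the bins."""
    q = 1.0 / bins
    # Empty bins contribute zero, following the convention 0 * log(0) = 0.
    return sum(p * log(p / q) for p in histogram_probs(values, bins) if p > 0)
```

A value of zero means the histogram is exactly flat; larger values mean the points are further from uniformly spread.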
- L2(X, bins=10, penalty_weight=0.1)
Calculate the L2 distance between the empirical distribution of data points in the uniform cluster and a uniform distribution.
- Parameters:
- X : ndarray of shape (n_samples, n_features)
The input data.
- bins : int, default=10
Number of bins for the histogram.
- penalty_weight : float, default=0.1
Weight for the penalty term based on cluster sizes.
- Returns:
- l2_distance : float
The L2 distance value.
- ST(X, mu, s, nu, la)
Compute the posterior probabilities for the SkewT mixture model.
- Parameters:
- X : ndarray
Input data.
- w : ndarray
Cluster weights.
- mu : ndarray
Cluster means.
- s : ndarray
Cluster standard deviations.
- nu : ndarray
Degrees of freedom for each cluster.
- la : ndarray
Skewness parameters for each cluster.
- Returns:
- Z : ndarray
Posterior probabilities for each cluster.
- chi2(X, bins=3)
Calculate the Chi-squared statistic for the model. This index measures the goodness of fit between the empirical distribution of data points in the uniform cluster and a uniform distribution.
- Parameters:
- X : ndarray of shape (n_samples, n_features)
The input data.
- bins : int, default=3
Number of bins for the histogram.
- Returns:
- chi2_stat : float
The Chi-squared statistic value.
- p_value : float
The p-value associated with the Chi-squared statistic.
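The statistic itself follows the usual Pearson form. The sketch below assumes the bin counts are already available and that the expected count is the same in every bin. The helper `chi2_uniform_statistic` is hypothetical. The p-value is left out because it needs a chi-squared survival function, for example `scipy.stats.chi2.sf`.

```python
def chi2_uniform_statistic(counts):
    """Pearson chi-squared statistic of bin counts against equal expected counts."""
    n = sum(counts)
    k = len(counts)
    if n == 0 or k < 2:
        raise ValueError("need at least two bins and one observation")
    expected = n / k  # under uniformity every bin expects n / k points
    stat = sum((c - expected) ** 2 / expected for c in counts)
    # Degrees of freedom for a fully specified uniform null: bins - 1.
    return stat, k - 1
```

A statistic near zero is consistent with uniformly spread points; a large statistic relative to the degrees of freedom suggests the uniform cluster has absorbed structured data.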
- confusion_matrix(y_true, y_pred)
Calculate the confusion matrix.
- Parameters:
- y_true : array-like
The true labels.
- y_pred : array-like
The predicted labels.
- Returns:
- matrix : array-like
The confusion matrix. The last cluster corresponds to the uniform cluster.
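A confusion matrix of this kind can be sketched as a simple count table. The helper `count_confusions` is hypothetical, and the orientation (rows for true labels, columns for predicted labels) is an assumption, since this page does not state which axis is which. Labels are taken to be integers from 0 to n_classes - 1, with the highest label standing for the uniform cluster.

```python
def count_confusions(y_true, y_pred, n_classes):
    """Count table with true labels on rows and predicted labels on columns (assumed)."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix
```

With two skew-t clusters plus the uniform cluster, `n_classes` would be 3, and the last row and column describe points attributed to the uniform background.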
- fit(X, tol=1e-06, max_iter=200, init_method='gmm', parametre=None, verbose=0)
Fit the SkewT mixture model to the data.
- Parameters:
- X : array-like, shape (n_samples, n_features)
The input data.
- tol : float, default=1e-6
Tolerance for convergence.
- max_iter : int, default=200
Maximum number of iterations for the EM algorithm.
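The interplay of `tol` and `max_iter` can be pictured with a generic stopping loop. This page does not say exactly how the tolerance is applied, so the criterion below (absolute change in log-likelihood between consecutive iterations) is an assumption. Both `iterate_until_converged` and `step` are hypothetical names.

```python
def iterate_until_converged(step, tol=1e-6, max_iter=200):
    """Call step() until the returned log-likelihood stabilises or max_iter is hit.

    step is a hypothetical callable that performs one EM update and returns
    the current log-likelihood.
    """
    history = []
    for _ in range(max_iter):
        history.append(step())
        # Assumed criterion: absolute change between consecutive iterations.
        if len(history) > 1 and abs(history[-1] - history[-2]) < tol:
            break
    return history
```

The returned list plays the same role as the `logli` attribute described above: one log-likelihood value per iteration, which is useful for checking whether a fit stopped on convergence or on the iteration cap.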
- initialisation_gmm(X)
Initialize the parameters for the Gaussian Mixture Model (GMM).
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input data matrix.
- Returns:
- w : ndarray of shape (n_clusters,)
Cluster weights.
- mu : ndarray of shape (n_clusters, n_features)
Cluster means.
- s : ndarray of shape (n_clusters, n_features)
Cluster standard deviations.
- nu : ndarray of shape (n_clusters, n_features)
Degrees of freedom for each cluster.
- la : ndarray of shape (n_clusters, n_features)
Skewness parameters for each cluster.
- initialisation_kmeans(X)
Initialize the parameters for the SkewMM algorithm using the K-means initialization method.
- Parameters:
- X : array-like of shape (n_samples, n_features)
The input data matrix.
- default_n_init : int, default='auto'
The number of times the K-means algorithm will be run with different centroid seeds.
- Returns:
- dict : dict
A dictionary containing the initialized parameters:
- w : ndarray of shape (n_clusters,)
Cluster weights.
- mu : ndarray of shape (n_clusters, n_features)
Cluster means.
- s : ndarray of shape (n_clusters, n_features)
Cluster standard deviations.
- nu : ndarray of shape (n_clusters, n_features)
Degrees of freedom for each cluster.
- la : ndarray of shape (n_clusters, n_features)
Skewness parameters for each cluster.
- initialisation_random(X)
Initialize the parameters randomly for the SkewMM algorithm.
- Parameters:
- X : array-like of shape (n_samples, n_features)
Input data matrix.
- Returns:
- w : ndarray of shape (n_clusters,)
Cluster weights.
- mu : ndarray of shape (n_clusters, n_features)
Cluster means.
- s : ndarray of shape (n_clusters, n_features)
Cluster standard deviations.
- nu : ndarray of shape (n_clusters, n_features)
Degrees of freedom for each cluster.
- la : ndarray of shape (n_clusters, n_features)
Skewness parameters for each cluster.
- load(filename: str)
Load matrices from a given file.
- Parameters:
- filename : str
The path to the file containing the matrices.
- predict(X)
Predict the cluster labels for the data.
- Parameters:
- X (array-like): The input data.
- Returns:
- labels (array-like): The predicted cluster labels.
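A common way to turn posterior probabilities into hard labels is to pick the most probable cluster for each sample. Whether `predict` does exactly this is an assumption, not something this page confirms. The helper `labels_from_posteriors` is hypothetical.

```python
def labels_from_posteriors(posteriors):
    """Return, for each row, the index of the largest posterior probability."""
    # max() over indices returns the first maximal index, so ties go to the
    # lowest-numbered cluster.
    return [max(range(len(row)), key=row.__getitem__) for row in posteriors]
```

With this rule, a sample whose posteriors are all equal is assigned to cluster 0, and a sample whose largest posterior sits in the last column is attributed to the uniform cluster if that column represents it.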