Upcoming Features in corrselect 2.1.0
This vignette introduces features that will be available in version 2.1.0 of the corrselect package. These enhancements aim to provide more flexibility and alternative strategies for variable subset selection.
Spectral Method (Prototype)
A new selection strategy based on spectral clustering is currently in development. It applies normalized spectral clustering to the correlation matrix to identify sets of weakly correlated variables.
Rationale
Unlike local or exhaustive search algorithms, spectral clustering provides a global approximation that can rapidly identify candidate subsets with minimal internal association.
Overview of Steps
The algorithm follows these steps:
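- Similarity matrix from absolute correlations: $W = |C|$ (element-wise), where $C$ is the correlation matrix
- Degree vector: $d_i = \sum_j W_{ij}$, with $D = \operatorname{diag}(d)$
- Normalized Laplacian: $L = I - D^{-1/2} W D^{-1/2}$
- Eigendecomposition of $L$
- k-means clustering in the reduced eigenvector space
- Validation of each cluster based on the correlation threshold and forced variables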
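The steps above can be sketched in base R. This is a minimal, hypothetical sketch, not the corrselect implementation: the function name spectral_sketch, the row normalization of the eigenvector embedding (the Ng-Jordan-Weiss variant), the fixed seed, and the exact validation rule (every pairwise |r| below the threshold, every forced variable present) are all assumptions made for illustration.

```r
# Hypothetical helper: a minimal sketch of normalized spectral clustering on a
# correlation matrix. NOT the corrselect implementation; details are assumed.
spectral_sketch <- function(cmat, k, threshold, force_in = character(0),
                            seed = 1) {
  n <- ncol(cmat)
  stopifnot(k >= 1, k <= n)
  # Step 1: similarity matrix from absolute correlations
  W <- abs(cmat)
  # Step 2: degree vector (row sums of W)
  d <- rowSums(W)
  # Step 3: normalized Laplacian L = I - D^{-1/2} W D^{-1/2}
  D_inv_sqrt <- diag(1 / sqrt(d), nrow = n)
  L <- diag(n) - D_inv_sqrt %*% W %*% D_inv_sqrt
  # Step 4: eigendecomposition; eigen() sorts eigenvalues in decreasing order,
  # so the eigenvectors of the k smallest eigenvalues are the last k columns
  e <- eigen(L, symmetric = TRUE)
  U <- e$vectors[, (n - k + 1):n, drop = FALSE]
  # Assumed: scale each row to unit length before clustering
  U <- U / sqrt(rowSums(U^2))
  # Step 5: k-means on the rows of the reduced eigenvector matrix
  set.seed(seed)  # note: this also resets the global RNG state
  cl <- kmeans(U, centers = k, nstart = 10)$cluster
  names(cl) <- colnames(cmat)
  # Step 6 (assumed rule): keep clusters whose pairwise |r| are all below the
  # threshold and that contain every forced variable
  clusters <- split(colnames(cmat), cl)
  valid <- Filter(function(vars) {
    sub <- abs(cmat[vars, vars, drop = FALSE])
    diag(sub) <- 0
    all(sub < threshold) && all(force_in %in% vars)
  }, clusters)
  list(cluster = cl, valid = valid)
}

# Usage on a built-in dataset
cmat <- cor(mtcars)
res <- spectral_sketch(cmat, k = 3, threshold = 0.5)
res$cluster  # cluster label for each variable
res$valid    # clusters that pass the validation step (may be empty)
```

Because $W$ weights strongly correlated pairs most heavily, the clusters it produces tend to collect mutually correlated variables; in this sketch, the validation step is what discards any cluster that exceeds the threshold.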
Customizing the Number of Clusters
You can pass an integer k to override the default number of clusters:
cmat <- cor(mtcars)  # example correlation matrix; any correlation matrix works
res <- corrselect::MatSelect(cmat, threshold = 0.5, method = "spectral", k = 4)
Note that this method is still under testing and might change before release.
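The vignette does not say how the default number of clusters is chosen. One common heuristic in spectral clustering is the eigengap: sort the eigenvalues of the normalized Laplacian in ascending order and take k at the largest gap between consecutive eigenvalues. The sketch below illustrates that heuristic with a hypothetical helper, eigengap_k; it is an assumption for illustration, not necessarily what corrselect does.

```r
# Hypothetical helper: choose k with the eigengap heuristic. This is an
# assumption; corrselect's actual default for k is not documented here.
eigengap_k <- function(cmat, k_max = ncol(cmat) - 1) {
  n <- ncol(cmat)
  W <- abs(cmat)
  D_inv_sqrt <- diag(1 / sqrt(rowSums(W)), nrow = n)
  L <- diag(n) - D_inv_sqrt %*% W %*% D_inv_sqrt
  # Eigenvalues of the normalized Laplacian, sorted ascending
  lambda <- sort(eigen(L, symmetric = TRUE, only.values = TRUE)$values)
  # gaps[i] = lambda[i + 1] - lambda[i]; k is the index of the largest gap
  gaps <- diff(lambda[1:(k_max + 1)])
  which.max(gaps)
}

k <- eigengap_k(cor(mtcars))
# The result could then be passed explicitly, e.g.
# res <- corrselect::MatSelect(cor(mtcars), threshold = 0.5, method = "spectral", k = k)
```

On a correlation matrix made of two uncorrelated blocks, the Laplacian has two zero eigenvalues followed by a jump, so this heuristic returns 2.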
Availability
This feature will be available in version 2.1.0. If you’re interested in testing it early, you can install the development version from GitHub:
# install.packages("devtools")
devtools::install_github("gcol33/corrselect")
I welcome feedback and suggestions via GitHub issues or direct contact.