API
SGtSNEpi.jl exports the following functions
SGtSNEpi.pointcloud2graph
— Functionpointcloud2graph( X::AbstractMatrix, u = 10, k = 3*u; knn_type, rescale_type = :perplexity )
Convert a point-cloud data set $X$ (coordinates) of size $N \times D$ to a similarity graph, using perplexity equalization, same as conventional t-SNE.
Special options for point-cloud data embedding
u=10
: either perplexity or value of λ (ifrescale_type
` is set to :lambda)k=3*u
: number of nearest neighbors (for kNN formation)knn_type=( size(A,1) < 10_000 ) || !USING_FLANN ? :exact : :flann
: Exact or approximate kNNrescale_type=:perplexity: Which rescaling operation to use to transform distances into edge weights. Either
:perplexityor
:lambda. Default is
:perlexity`.
SGtSNEpi.sgtsnepi
— Methodsgtsnepi( A::AbstractMatrix )
sgtsnepi( G::AbstractGraph )
Call SG-t-SNE-Π on the input graph, given as either a sparse adjacency matrix $A$ or a graph object $G$. Alternatively, the input can be a point-cloud data set $X$ (coordinates) of size $N \times D$, i.e.,
sgtsnepi( X::AbstractMatrix )
Optional arguments
d=2
: number of dimensions (embedding space)λ=10
: SG-t-SNE scaling factorversion=SGtSNEpi.NUCONV_BL
: the version of the algorithm for computing repulsive terms. Options areSGtSNEpi.NUCONV_BL
(default): band-limited, approximated via non-uniform convolutionSGtSNEpi.NUCONV
: approximated via non-uniform convolution (higher resolution thanSGtSNEpi.NUCONV_BL
, slower execution time)SGtSNEpi.EXACT
: no approximation; quadratic complexity, use only with small datasets
More options (for experts)
max_iter=1000
: number of iterationsearly_exag=250
: number of early exageration iterationsalpha=12
: exaggeration strength (applicable for firstearly_exag
iterations)Y0=nothing
: initial distribution in embedding space (randomly generated ifnothing
). You should set this parameter to generate reproducible results.eta=200.0
: learning parameterdrop_leaf=false
: remove edges connecting to leaf nodesflag_unweighted_to_weighted=true
: iftrue
, convert unweighted graph to weighted graph by computing local neighborhood weights
Advanced options (performance-related)
np=0
: number of threads (set to 0 to use all available cores)h=1.0
: grid side lengthlist_grid_size = filter( x -> x == nextprod( (2, 3, 5), x ), 16:512 )
: the list of allowed grid size along each dimension. Affects FFT performance; most efficient if the size is a product of small primes.profile=false
: disable/enable profiling. If enabled the function return a 3-tuple:(Y, t, g)
, whereY
is the embedding coordinates,t
are the execution times of each module per iteration (size6 x max_iter
) andg
contains the grid size, the embedding domain size (maximum(Y) - minimum(Y)
), and the scaling factors_k
for the band-limited version, per dimension (size3 x max_iter
).
Notes
Isolated nodes are placed randomly on the top-right corner of the embedding space
The function tries to automatically detect whether the input matrix represents an adjacency matrix or data coordinates. In ambiquous cases, such as a square matrix of data coordinates, the user may specify the type using the optional argument
type
:graph
: the input is an adjacency matrix:coord
: the input is the data coordinates
Examples
julia> using Graphs
julia> G = circular_ladder_graph( 500 )
{1000, 1500} undirected simple Int64 graph
julia> Y = sgtsnepi( G; np = 4, early_exag = 100, max_iter = 250 );
Number of vertices: 1000
Embedding dimensions: 2
Rescaling parameter λ: 10
Early exag. multiplier α: 12
Maximum iterations: 250
Early exag. iterations: 100
Learning rate: 200
Box side length h: 1
Drop edges originating from leaf nodes? 0
Number of processes: 4
1000 out of 1000 nodes already stochastic
m = 1000 | n = 1000 | nnz = 3000
WARNING: Randomizing initial points; non-reproducible results
Setting-up parallel (double-precision) FFTW: 4
Iteration 1: error is 96.9204
Iteration 50: error is 84.9181 (50 iterations in 0.039296 seconds)
Iteration 100: error is 4.32754 (50 iterations in 0.038005 seconds)
Iteration 150: error is 2.54655 (50 iterations in 0.066491 seconds)
Iteration 200: error is 1.90124 (50 iterations in 0.159556 seconds)
Iteration 249: error is 1.65057 (50 iterations in 0.213149 seconds)
--- Time spent in each module ---
Attractive forces: 0.006199 sec [1.24082%] | Repulsive forces: 0.49339 sec [98.7592%]
SGtSNEpi.neighbor_recall
— Methodneighbor_recall(X, Y[; k = 10])
Histogram of the stochastic k-neighbors recall values at all vertices.
\[{\rm recall}(v) = \sum_j \mathbf{P}_{j|i} b_{ij},\]
where $\mathbf{B}_k = [b_{ij}]$ is the adjacency matrix of the kNN graph of the embedded points $\mathbf{Y}$ in Euclidean distance, and $\mathbf{P}_{j|i}$ is the neighborhood probability matrix in the high-dimensional space $\mathbf{X}$.
See Fig.4 in https://arxiv.org/pdf/1906.05582.pdf for more details.
SGtSNEpi.show_embedding
— Functionshow_embedding( Y [, L] )
Visualization of 2D embedding coordinates $Y$ using Makie. If the $n \times 1$ vector $L$ of vertex memberships is provided, points are colored accroding to the labels.
Requirements
This function is provided only if the user already has Makie installed in their Julia environment. Otherwise, the function will not be defined.
Notes
You need to install and select a backend for Makie. See instructions by Makie for more details. For example, to show figures in an interactive window, issue
] add GLMakie
using GLMakie
GLMakie.activate!()
Optional arguments (for experts)
A=nothing
: adjacency matrix; if provided, edges are showncmap
: the colormap to use (default to distinguishable colors)res=(800,800)
: figure resolutionlwd_in=0.5
: line width for internal edgeslwd_out=0.3
: line width for external edgesedge_alpha=0.2
: the alpha channel for the edgesclr_in=nothing
: set color for all intra-cluster edges (if nothing, color bycmap
)clr_out=colorant"#aabbbbbb"
: the color of inter-cluster edgesmrk_size=4
: marker sizesize_label:24
: legend label size