API

SGtSNEpi.jl exports the following functions

SGtSNEpi.pointcloud2graphFunction
pointcloud2graph( X::AbstractMatrix, u = 10, k = 3*u; knn_type, rescale_type = :perplexity )

Convert a point-cloud data set $X$ (coordinates) of size $N \times D$ to a similarity graph, using perplexity equalization, same as conventional t-SNE.

Special options for point-cloud data embedding

  • u=10: either perplexity or value of λ (if rescale_type` is set to :lambda)
  • k=3*u: number of nearest neighbors (for kNN formation)
  • knn_type=( size(A,1) < 10_000 ) || !USING_FLANN ? :exact : :flann: Exact or approximate kNN
  • rescale_type=:perplexity: Which rescaling operation to use to transform distances into edge weights. Either:perplexityor:lambda. Default is:perlexity`.
source
SGtSNEpi.sgtsnepiMethod
sgtsnepi( A::AbstractMatrix )
sgtsnepi( G::AbstractGraph )

Call SG-t-SNE-Π on the input graph, given as either a sparse adjacency matrix $A$ or a graph object $G$. Alternatively, the input can be a point-cloud data set $X$ (coordinates) of size $N \times D$, i.e.,

sgtsnepi( X::AbstractMatrix )

Optional arguments

  • d=2: number of dimensions (embedding space)
  • λ=10: SG-t-SNE scaling factor
  • version=SGtSNEpi.NUCONV_BL: the version of the algorithm for computing repulsive terms. Options are
    • SGtSNEpi.NUCONV_BL (default): band-limited, approximated via non-uniform convolution
    • SGtSNEpi.NUCONV: approximated via non-uniform convolution (higher resolution than SGtSNEpi.NUCONV_BL, slower execution time)
    • SGtSNEpi.EXACT: no approximation; quadratic complexity, use only with small datasets

More options (for experts)

  • max_iter=1000: number of iterations
  • early_exag=250: number of early exageration iterations
  • alpha=12: exaggeration strength (applicable for first early_exag iterations)
  • Y0=nothing: initial distribution in embedding space (randomly generated if nothing). You should set this parameter to generate reproducible results.
  • eta=200.0: learning parameter
  • drop_leaf=false: remove edges connecting to leaf nodes
  • flag_unweighted_to_weighted=true: if true, convert unweighted graph to weighted graph by computing local neighborhood weights

Advanced options (performance-related)

  • np=0: number of threads (set to 0 to use all available cores)
  • h=1.0: grid side length
  • list_grid_size = filter( x -> x == nextprod( (2, 3, 5), x ), 16:512 ): the list of allowed grid size along each dimension. Affects FFT performance; most efficient if the size is a product of small primes.
  • profile=false: disable/enable profiling. If enabled the function return a 3-tuple: (Y, t, g), where Y is the embedding coordinates, t are the execution times of each module per iteration (size 6 x max_iter) and g contains the grid size, the embedding domain size (maximum(Y) - minimum(Y)), and the scaling factor s_k for the band-limited version, per dimension (size 3 x max_iter).

Notes

  • Isolated nodes are placed randomly on the top-right corner of the embedding space

  • The function tries to automatically detect whether the input matrix represents an adjacency matrix or data coordinates. In ambiquous cases, such as a square matrix of data coordinates, the user may specify the type using the optional argument type

    • :graph: the input is an adjacency matrix
    • :coord: the input is the data coordinates

Examples

julia> using Graphs

julia> G = circular_ladder_graph( 500 )
{1000, 1500} undirected simple Int64 graph

julia> Y = sgtsnepi( G; np = 4, early_exag = 100, max_iter = 250 );
Number of vertices: 1000
Embedding dimensions: 2
Rescaling parameter λ: 10
Early exag. multiplier α: 12
Maximum iterations: 250
Early exag. iterations: 100
Learning rate: 200
Box side length h: 1
Drop edges originating from leaf nodes? 0
Number of processes: 4
1000 out of 1000 nodes already stochastic
m = 1000 | n = 1000 | nnz = 3000
WARNING: Randomizing initial points; non-reproducible results
Setting-up parallel (double-precision) FFTW: 4
Iteration 1: error is 96.9204
Iteration 50: error is 84.9181 (50 iterations in 0.039296 seconds)
Iteration 100: error is 4.32754 (50 iterations in 0.038005 seconds)
Iteration 150: error is 2.54655 (50 iterations in 0.066491 seconds)
Iteration 200: error is 1.90124 (50 iterations in 0.159556 seconds)
Iteration 249: error is 1.65057 (50 iterations in 0.213149 seconds)
 --- Time spent in each module ---

 Attractive forces: 0.006199 sec [1.24082%] |  Repulsive forces: 0.49339 sec [98.7592%]
source
SGtSNEpi.neighbor_recallMethod
neighbor_recall(X, Y[; k = 10])

Histogram of the stochastic k-neighbors recall values at all vertices.

\[{\rm recall}(v) = \sum_j \mathbf{P}_{j|i} b_{ij},\]

where $\mathbf{B}_k = [b_{ij}]$ is the adjacency matrix of the kNN graph of the embedded points $\mathbf{Y}$ in Euclidean distance, and $\mathbf{P}_{j|i}$ is the neighborhood probability matrix in the high-dimensional space $\mathbf{X}$.

See Fig.4 in https://arxiv.org/pdf/1906.05582.pdf for more details.

source
SGtSNEpi.show_embeddingFunction
show_embedding( Y [, L] )

Visualization of 2D embedding coordinates $Y$ using Makie. If the $n \times 1$ vector $L$ of vertex memberships is provided, points are colored accroding to the labels.

Requirements

This function is provided only if the user already has Makie installed in their Julia environment. Otherwise, the function will not be defined.

Notes

You need to install and select a backend for Makie. See instructions by Makie for more details. For example, to show figures in an interactive window, issue

] add GLMakie
using GLMakie
GLMakie.activate!()

Optional arguments (for experts)

  • A=nothing: adjacency matrix; if provided, edges are shown
  • cmap: the colormap to use (default to distinguishable colors)
  • res=(800,800): figure resolution
  • lwd_in=0.5: line width for internal edges
  • lwd_out=0.3: line width for external edges
  • edge_alpha=0.2: the alpha channel for the edges
  • clr_in=nothing: set color for all intra-cluster edges (if nothing, color by cmap)
  • clr_out=colorant"#aabbbbbb": the color of inter-cluster edges
  • mrk_size=4: marker size
  • size_label:24: legend label size
source