API

SGtSNEpi.jl exports the following functions

SGtSNEpi.neighbor_recall
SGtSNEpi.pointcloud2graph
SGtSNEpi.sgtsnepi
SGtSNEpi.show_embedding

SGtSNEpi.pointcloud2graph — Function

pointcloud2graph( X::AbstractMatrix, u = 10, k = 3*u; knn_type, rescale_type = :perplexity )

Convert a point-cloud data set $X$ (coordinates) of size $N \times D$ to a similarity graph, using perplexity equalization, same as conventional t-SNE.

Special options for point-cloud data embedding

u=10: either perplexity or value of λ (if rescale_type` is set to :lambda)
k=3*u: number of nearest neighbors (for kNN formation)
knn_type=( size(A,1) < 10_000 ) || !USING_FLANN ? :exact : :flann: Exact or approximate kNN
rescale_type=:perplexity: Which rescaling operation to use to transform distances into edge weights. Either:perplexityor:lambda. Default is:perlexity`.

source

SGtSNEpi.sgtsnepi — Method

sgtsnepi( A::AbstractMatrix )
sgtsnepi( G::AbstractGraph )

Call SG-t-SNE-Π on the input graph, given as either a sparse adjacency matrix $A$ or a graph object $G$. Alternatively, the input can be a point-cloud data set $X$ (coordinates) of size $N \times D$, i.e.,

sgtsnepi( X::AbstractMatrix )

Optional arguments

d=2: number of dimensions (embedding space)
λ=10: SG-t-SNE scaling factor
version=SGtSNEpi.NUCONV_BL: the version of the algorithm for computing repulsive terms. Options are
- SGtSNEpi.NUCONV_BL (default): band-limited, approximated via non-uniform convolution
- SGtSNEpi.NUCONV: approximated via non-uniform convolution (higher resolution than SGtSNEpi.NUCONV_BL, slower execution time)
- SGtSNEpi.EXACT: no approximation; quadratic complexity, use only with small datasets

More options (for experts)

max_iter=1000: number of iterations
early_exag=250: number of early exageration iterations
alpha=12: exaggeration strength (applicable for first early_exag iterations)
Y0=nothing: initial distribution in embedding space (randomly generated if nothing). You should set this parameter to generate reproducible results.
eta=200.0: learning parameter
drop_leaf=false: remove edges connecting to leaf nodes
flag_unweighted_to_weighted=true: if true, convert unweighted graph to weighted graph by computing local neighborhood weights

Advanced options (performance-related)

np=0: number of threads (set to 0 to use all available cores)
h=1.0: grid side length
list_grid_size = filter( x -> x == nextprod( (2, 3, 5), x ), 16:512 ): the list of allowed grid size along each dimension. Affects FFT performance; most efficient if the size is a product of small primes.
profile=false: disable/enable profiling. If enabled the function return a 3-tuple: (Y, t, g), where Y is the embedding coordinates, t are the execution times of each module per iteration (size 6 x max_iter) and g contains the grid size, the embedding domain size (maximum(Y) - minimum(Y)), and the scaling factor s_k for the band-limited version, per dimension (size 3 x max_iter).

Notes

Isolated nodes are placed randomly on the top-right corner of the embedding space
The function tries to automatically detect whether the input matrix represents an adjacency matrix or data coordinates. In ambiquous cases, such as a square matrix of data coordinates, the user may specify the type using the optional argument type
- :graph: the input is an adjacency matrix
- :coord: the input is the data coordinates

Examples

julia> using Graphs

julia> G = circular_ladder_graph( 500 )
{1000, 1500} undirected simple Int64 graph

julia> Y = sgtsnepi( G; np = 4, early_exag = 100, max_iter = 250 );
Number of vertices: 1000
Embedding dimensions: 2
Rescaling parameter λ: 10
Early exag. multiplier α: 12
Maximum iterations: 250
Early exag. iterations: 100
Learning rate: 200
Box side length h: 1
Drop edges originating from leaf nodes? 0
Number of processes: 4
1000 out of 1000 nodes already stochastic
m = 1000 | n = 1000 | nnz = 3000
WARNING: Randomizing initial points; non-reproducible results
Setting-up parallel (double-precision) FFTW: 4
Iteration 1: error is 96.9204
Iteration 50: error is 84.9181 (50 iterations in 0.039296 seconds)
Iteration 100: error is 4.32754 (50 iterations in 0.038005 seconds)
Iteration 150: error is 2.54655 (50 iterations in 0.066491 seconds)
Iteration 200: error is 1.90124 (50 iterations in 0.159556 seconds)
Iteration 249: error is 1.65057 (50 iterations in 0.213149 seconds)
 --- Time spent in each module ---

 Attractive forces: 0.006199 sec [1.24082%] |  Repulsive forces: 0.49339 sec [98.7592%]

source

SGtSNEpi.neighbor_recall — Method

neighbor_recall(X, Y[; k = 10])

Histogram of the stochastic k-neighbors recall values at all vertices.

\[{\rm recall}(v) = \sum_j \mathbf{P}_{j|i} b_{ij},\]

where $\mathbf{B}_k = [b_{ij}]$ is the adjacency matrix of the kNN graph of the embedded points $\mathbf{Y}$ in Euclidean distance, and $\mathbf{P}_{j|i}$ is the neighborhood probability matrix in the high-dimensional space $\mathbf{X}$.

See Fig.4 in https://arxiv.org/pdf/1906.05582.pdf for more details.

source

SGtSNEpi.show_embedding — Function

show_embedding( Y [, L] )

Visualization of 2D embedding coordinates $Y$ using Makie. If the $n \times 1$ vector $L$ of vertex memberships is provided, points are colored accroding to the labels.

Requirements

This function is provided only if the user already has Makie installed in their Julia environment. Otherwise, the function will not be defined.

Notes

You need to install and select a backend for Makie. See instructions by Makie for more details. For example, to show figures in an interactive window, issue

] add GLMakie
using GLMakie
GLMakie.activate!()

Optional arguments (for experts)

A=nothing: adjacency matrix; if provided, edges are shown
cmap: the colormap to use (default to distinguishable colors)
res=(800,800): figure resolution
lwd_in=0.5: line width for internal edges
lwd_out=0.3: line width for external edges
edge_alpha=0.2: the alpha channel for the edges
clr_in=nothing: set color for all intra-cluster edges (if nothing, color by cmap)
clr_out=colorant"#aabbbbbb": the color of inter-cluster edges
mrk_size=4: marker size
size_label:24: legend label size

source