Point-cloud data embedding

We provide a case study, using the MNIST dataset, to get you started with SG-t-SNE-Π. We assume you have Julia and SGtSNEpi installed already.

Prerequisites

You need to install the following packages for this demo:

using Pkg
Pkg.add(["MLDatasets", "ImageFeatures", "Random", "Images"])

Embedding handwritten digits (MNIST data)

The MNIST dataset comprises $60{,}000$ training and $10{,}000$ testing images of handwritten digits. We embed all $70{,}000$ images.

First, we download the dataset:

using SGtSNEpi, MLDatasets, ImageFeatures, Random, Images

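# Load the training split: images and labels
X, L = MNIST(Float64, split=:train)[:];
# Append the test split along the image (third) and label (first) dimensions
X = cat( X, MNIST(Float64, split=:test).features ; dims = 3 );
L = cat( L, MNIST(Float64, split=:test).targets ; dims = 1 );

L = Int.( vec( L ) );  # make sure the labels are an integer vector

n = size( X, 3 );      # total number of images

# Transpose each image (swap its first two dimensions)
X = permutedims( X, [2, 1, 3] );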
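The concatenation and permutation steps can be checked on a toy array. The sketch below uses made-up 2×3 "images" (hypothetical data, not MNIST) to show that cat with dims = 3 stacks images along the third axis and that permutedims with [2, 1, 3] transposes every image while keeping the image order.

```julia
# Toy stand-ins for the train and test image stacks (hypothetical data)
train = reshape(collect(1.0:12.0), 2, 3, 2)    # two 2×3 "images"
test  = reshape(collect(13.0:18.0), 2, 3, 1)   # one 2×3 "image"

# Stack along the third (image index) dimension
X = cat(train, test; dims = 3)

# Swap the first two dimensions: every slice is transposed
Xp = permutedims(X, [2, 1, 3])

size(X), size(Xp)
```

The same reasoning suggests that, after the permutation, the MNIST array still holds one image per index along the third dimension, which is what the later loops rely on.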

We visualize some of the digits that appear in the dataset:

mosaicview( Gray.(X[:,:,1:600]), ncol=30, rowmajor=true )

We transform the pixel values to Histogram of Oriented Gradients (HOG) descriptors:

F = zeros( n, 324 );

for img = 1:n
  F[img,:] = create_descriptor( X[:,:,img], HOG(; cell_size = 7) )
end
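The descriptor length of 324 used to preallocate F follows from the HOG layout. The sketch below reproduces the arithmetic, assuming that the remaining HOG parameters keep the values ImageFeatures is believed to use by default (9 orientations, blocks of 2×2 cells, block stride of 1 cell); the helper hog_length is hypothetical and not part of any package.

```julia
# Length of a HOG descriptor for a square image.
# The default values are assumptions about ImageFeatures' HOG defaults.
function hog_length(img_size; cell_size, orientations = 9,
                    block_size = 2, block_stride = 1)
    cells  = img_size ÷ cell_size                       # cells per side
    blocks = (cells - block_size) ÷ block_stride + 1    # blocks per side
    return blocks^2 * block_size^2 * orientations
end

hog_length(28; cell_size = 7)   # 3^2 blocks * 2^2 cells * 9 orientations = 324
```

If a different cell size is chosen, the second dimension of F has to change accordingly.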

We randomly initialize the coordinates in the 2D embedding space; fixing the random seed is crucial for reproducible results:

Random.seed!(0);
Y0 = 0.01 * randn( n, 2 );
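A small check of why the seed matters, using only the Random standard library: reseeding before each draw yields identical initial coordinates, whereas a draw without reseeding continues the random stream and differs. The names Y0a, Y0b, and Y0c are illustrative. Note that the generated numbers can differ between Julia releases, so exact reproducibility is only expected within one Julia version.

```julia
using Random

Random.seed!(0); Y0a = 0.01 * randn(5, 2)
Random.seed!(0); Y0b = 0.01 * randn(5, 2)   # same seed: same coordinates
Y0c = 0.01 * randn(5, 2)                    # no reseeding: different coordinates

Y0a == Y0b, Y0a == Y0c
```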

We use SG-t-SNE-Π to embed the data in a 2D space:

A = pointcloud2graph( F )
Y = sgtsnepi(A; Y0 = Y0);
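Here, pointcloud2graph converts the feature matrix into a sparse graph that sgtsnepi then embeds. To give a feel for what such a graph looks like, the sketch below builds a brute-force k-nearest-neighbor graph for a handful of points. It is an illustration only: knn_graph_sketch is a hypothetical helper, it stores raw Euclidean distances as edge weights, and it is not the construction or weighting used by SGtSNEpi.

```julia
using SparseArrays, LinearAlgebra

# Brute-force k-nearest-neighbor graph of the rows of F (illustration only)
function knn_graph_sketch(F::AbstractMatrix, k::Integer)
    n = size(F, 1)
    I = Int[]; J = Int[]; V = Float64[]
    for j in 1:n
        # Euclidean distances from point j to all points
        d = [norm(F[i, :] - F[j, :]) for i in 1:n]
        d[j] = Inf                           # exclude the point itself
        for i in partialsortperm(d, 1:k)     # indices of the k closest points
            push!(I, i); push!(J, j); push!(V, d[i])
        end
    end
    return sparse(I, J, V, n, n)             # column j holds the neighbors of point j
end

F_toy = [0.0 0.0; 1.0 0.0; 0.0 2.0; 5.0 5.0]   # four points in the plane
G = knn_graph_sketch(F_toy, 2)
```

Each column of G has exactly k stored entries, so the graph stays sparse even for large n; the quadratic cost of the brute-force search is acceptable only for toy inputs like this one.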

Visualization

To reproduce the next steps, we need to install the following packages:

Pkg.add(["CairoMakie", "Colors", "Makie"])

If Makie was not installed when SGtSNEpi was loaded, you need to restart Julia and repeat the previous steps.

We visualize the $70{,}000$ digits in the 2D embedding space, colored by their class. For this purpose, we use the routine show_embedding.

show_embedding( Y, L; res = (2000, 2000) )