Skip to content

Graphlet-based network properties

Setup

You need to install the following packages for this demo.

pip install pyfglt networkx pandas numpy

We’ll create three graphs as running examples in this tutorial.

import numpy as np
import pandas as pd
import networkx as nx

G1 = nx.barabasi_albert_graph(500, 7, seed = 0)
G2 = nx.barabasi_albert_graph(1000, 8, seed = 1)
G3 = nx.watts_strogatz_graph(750, 7, 0.02, seed = 0)

We then compute the graphlet frequencies for each network.

import pyfglt.fglt as fg

F1 = fg.compute(G1)
F2 = fg.compute(G2)
F3 = fg.compute(G3)

Relative graphlet frequency (RGF) distance

The Relative Graphlet Frequency of an orbit is the frequency of that orbit (summing across all vertices) divided by the sum of all orbit frequencies in the graph.

RGF Distance between two graphs \(G\) and \(H\) for orbits \(\{1,\ldots,16\}\) can be defined as:

\[ d_{\text{RGF}}(G,H) \;=\; \sum_{k=1}^{16} \left\lvert \frac{F_G(k)}{\sum_{j=1}^{16} F_G(j)} \;-\; \frac{F_H(k)}{\sum_{j=1}^{16} F_H(j)} \right\rvert \]

where \(F_G(k)\) is the sum of orbit \(k\) counts across all vertices in graph \(G\).

rgf_dist = np.zeros((3,3))
for i,Fa in enumerate( [F1, F2, F3] ):
    for j,Fb in enumerate( [F1, F2, F3] ):
        rgf_dist[i,j] = fg.compute_rgf_distance(Fa, Fb)
G1 G2 G3
G1 0 0.108194 0.8892
G2 0.108194 0 0.939122
G3 0.8892 0.939122 0

Graphlet degree distribution (GDD) agreement

The Graphlet Degree of a vertex for a specific orbit is the number of times that vertex touches (or participates in) that orbit.
Given the orbit counts in the DataFrame, each column is effectively the graphlet degree for each vertex across an orbit.

To measure GDD Agreement between two graphs, one approach (inspired by GraphCrunch2) is:

  1. For each orbit \(i\) (from 1 to 16):
    • Build the distribution (histogram) of the graphlet degrees \(d^G_i\) across all vertices in graph \(G\).
    • Same for graph \(H\).
  2. Compare these two distributions using an overlap measure. For discrete distributions \(P_i^G\) and \(P_i^H\): $$ \text{Overlap}(P_i^G, P_i^H) = \sum_k \min\bigl(P_i^G(k), P_i^H(k)\bigr) $$ where \(P_i^G(k)\) is the fraction of vertices in \(G\) that have graphlet degree \(k\) for orbit \(i\).
  3. Average this overlap across all orbits: $$ \text{GDD-Agreement}(G,H) = \frac{1}{16} \sum_{i=1}^{16} \text{Overlap}(P_i^G, P_i^H). $$

This yields a value in \([0, 1]\), where 1 means perfect agreement of distributions.

gdd_agree = np.zeros((3,3))
for i,Fa in enumerate( [F1, F2, F3] ):
    for j,Fb in enumerate( [F1, F2, F3] ):
        gdd_agree[i,j] = fg.compute_gdd_agreement(Fa, Fb)
G1 G2 G3
G1 1 0.523313 0.123542
G2 0.523313 1 0.115167
G3 0.123542 0.115167 1

Graphlet Correlation Matrix (GCM)

The Graphlet Correlation Matrix (GCM) captures correlations (usually Spearman or Pearson) between different orbits across the vertices. Essentially:

\[ GCM(G) \quad=\quad \Bigl(\rho_{i,j}\Bigr)_{1 \le i,j \le 16} \]

where \(\rho_{i,j}\) is the correlation (e.g., Spearman) between the i-th orbit degrees and the j-th orbit degrees across all vertices in graph \(G\).

gcm1 = fg.compute_graphlet_correlation_matrix(F1, method='spearman')
gcm2 = fg.compute_graphlet_correlation_matrix(F2, method='spearman')
gcm3 = fg.compute_graphlet_correlation_matrix(F3, method='spearman')

GCM example

We now have the correlation matrix (GCM) for each graph. The GCM for graph G1 is shown below

[1] degree [2] 2-path [3] bifork [4] 3-cycle [5] 3-path, end [6] 3-path, interior [7] claw, leaf [8] claw, root [9] paw, handle [10] paw, base [11] paw, center [12] 4-cycle [13] diamond, off-cord [14] diamond, on-cord [15] 4-clique
[1] degree 1 0.797489 0.99255 0.721824 0.850319 0.957717 0.59164 0.987567 0.591097 0.662774 0.866555 0.865433 0.590136 0.683471 0.537678
[2] 2-path 0.797489 1 0.75209 0.858184 0.980674 0.917015 0.922261 0.738192 0.906362 0.886047 0.879049 0.938975 0.859091 0.817937 0.66163
[3] bifork 0.99255 0.75209 1 0.64882 0.810571 0.940231 0.537775 0.9985 0.539182 0.588904 0.813029 0.842782 0.519062 0.618533 0.502792
[4] 3-cycle 0.721824 0.858184 0.64882 1 0.850729 0.76346 0.782988 0.627428 0.773249 0.962727 0.951642 0.766913 0.893846 0.952637 0.712615
[5] 3-path, end 0.850319 0.980674 0.810571 0.850729 1 0.944489 0.855183 0.797396 0.855218 0.851695 0.898804 0.969354 0.817382 0.804315 0.638269
[6] 3-path, interior 0.957717 0.917015 0.940231 0.76346 0.944489 1 0.747828 0.933315 0.738758 0.739955 0.884068 0.952221 0.686035 0.719056 0.564356
[7] claw, leaf 0.59164 0.922261 0.537775 0.782988 0.855183 0.747828 1 0.523431 0.983409 0.87542 0.740177 0.788411 0.888326 0.76362 0.672914
[8] claw, root 0.987567 0.738192 0.9985 0.627428 0.797396 0.933315 0.523431 1 0.525663 0.568635 0.795132 0.833967 0.500084 0.601525 0.488454
[9] paw, handle 0.591097 0.906362 0.539182 0.773249 0.855218 0.738758 0.983409 0.525663 1 0.865189 0.732816 0.788529 0.882963 0.757707 0.667036
[10] paw, base 0.662774 0.886047 0.588904 0.962727 0.851695 0.739955 0.87542 0.568635 0.865189 1 0.894069 0.753967 0.957287 0.935547 0.729429
[11] paw, center 0.866555 0.879049 0.813029 0.951642 0.898804 0.884068 0.740177 0.795132 0.732816 0.894069 1 0.852875 0.813447 0.878809 0.645709
[12] 4-cycle 0.865433 0.938975 0.842782 0.766913 0.969354 0.952221 0.788411 0.833967 0.788529 0.753967 0.852875 1 0.709944 0.721873 0.570544
[13] diamond, off-cord 0.590136 0.859091 0.519062 0.893846 0.817382 0.686035 0.888326 0.500084 0.882963 0.957287 0.813447 0.709944 1 0.877049 0.731786
[14] diamond, on-cord 0.683471 0.817937 0.618533 0.952637 0.804315 0.719056 0.76362 0.601525 0.757707 0.935547 0.878809 0.721873 0.877049 1 0.688793
[15] 4-clique 0.537678 0.66163 0.502792 0.712615 0.638269 0.564356 0.672914 0.488454 0.667036 0.729429 0.645709 0.570544 0.731786 0.688793 1

If we want to measure how similar or different the GCMs are between two graphs, we can compute a matrix distance (e.g., Frobenius norm of the difference, or sum of absolute differences, etc.):

dist_gcm = np.zeros((3,3))
for i,Fa in enumerate( [gcm1, gcm2, gcm3] ):
    for j,Fb in enumerate( [gcm1, gcm2, gcm3] ):
        dist_gcm[i,j] = fg.gcm_distance(Fa, Fb)
G1 G2 G3
G1 0 4.33726 131.925
G2 4.33726 0 130.265
G3 131.925 130.265 0

Summary

  • RGF Distance: Captures how different the relative orbit frequencies are between two graphs.
  • GDD Agreement: Measures how similarly the two graphs’ vertices distribute their orbit degrees.
  • GCM: Captures how orbits co-vary (correlate) within a single graph, and can also be compared across graphs.

These measures are widely used in graphlet-based network comparison (e.g., Netdis, GraphCrunch2, etc.). Modify these examples to suit your exact needs (e.g., weighting orbits differently, using different correlation measures, etc.).