Graphlet-based network properties
Setup
You need to install the following packages for this demo.
We’ll create three graphs as running examples in this tutorial.
import numpy as np
import pandas as pd
import networkx as nx
G1 = nx.barabasi_albert_graph(500, 7, seed = 0)
G2 = nx.barabasi_albert_graph(1000, 8, seed = 1)
G3 = nx.watts_strogatz_graph(750, 7, 0.02, seed = 0)
We then compute the graphlet frequencies for each network.
Relative graphlet frequency (RGF) distance
The Relative Graphlet Frequency of an orbit is the frequency of that orbit (summing across all vertices) divided by the sum of all orbit frequencies in the graph.
RGF Distance between two graphs \(G\) and \(H\) for orbits \(\{1,\ldots,16\}\) can be defined as:
where \(F_G(k)\) is the sum of orbit \(k\) counts across all vertices in graph \(G\).
rgf_dist = np.zeros((3,3))
for i,Fa in enumerate( [F1, F2, F3] ):
for j,Fb in enumerate( [F1, F2, F3] ):
rgf_dist[i,j] = fg.compute_rgf_distance(Fa, Fb)
G1 | G2 | G3 | |
---|---|---|---|
G1 | 0 | 0.108194 | 0.8892 |
G2 | 0.108194 | 0 | 0.939122 |
G3 | 0.8892 | 0.939122 | 0 |
Graphlet degree distribution (GDD) agreement
The Graphlet Degree of a vertex for a specific orbit is the number
of times that vertex touches (or participates in) that orbit.
Given the orbit counts in the DataFrame, each column is effectively the
graphlet degree for each vertex across an orbit.
To measure GDD Agreement between two graphs, one approach (inspired by GraphCrunch2) is:
- For each orbit \(i\) (from 1 to 16):
- Build the distribution (histogram) of the graphlet degrees \(d^G_i\) across all vertices in graph \(G\).
- Same for graph \(H\).
- Compare these two distributions using an overlap measure. For discrete distributions \(P_i^G\) and \(P_i^H\): $$ \text{Overlap}(P_i^G, P_i^H) = \sum_k \min\bigl(P_i^G(k), P_i^H(k)\bigr) $$ where \(P_i^G(k)\) is the fraction of vertices in \(G\) that have graphlet degree \(k\) for orbit \(i\).
- Average this overlap across all orbits: $$ \text{GDD-Agreement}(G,H) = \frac{1}{16} \sum_{i=1}^{16} \text{Overlap}(P_i^G, P_i^H). $$
This yields a value in \([0, 1]\), where 1 means perfect agreement of distributions.
gdd_agree = np.zeros((3,3))
for i,Fa in enumerate( [F1, F2, F3] ):
for j,Fb in enumerate( [F1, F2, F3] ):
gdd_agree[i,j] = fg.compute_gdd_agreement(Fa, Fb)
G1 | G2 | G3 | |
---|---|---|---|
G1 | 1 | 0.523313 | 0.123542 |
G2 | 0.523313 | 1 | 0.115167 |
G3 | 0.123542 | 0.115167 | 1 |
Graphlet Correlation Matrix (GCM)
The Graphlet Correlation Matrix (GCM) captures correlations (usually Spearman or Pearson) between different orbits across the vertices. Essentially:
where \(\rho_{i,j}\) is the correlation (e.g., Spearman) between the i-th orbit degrees and the j-th orbit degrees across all vertices in graph \(G\).
gcm1 = fg.compute_graphlet_correlation_matrix(F1, method='spearman')
gcm2 = fg.compute_graphlet_correlation_matrix(F2, method='spearman')
gcm3 = fg.compute_graphlet_correlation_matrix(F3, method='spearman')
GCM example
We now have the correlation matrix (GCM) for each graph. The GCM for
graph G1
is shown below
[1] degree | [2] 2-path | [3] bifork | [4] 3-cycle | [5] 3-path, end | [6] 3-path, interior | [7] claw, leaf | [8] claw, root | [9] paw, handle | [10] paw, base | [11] paw, center | [12] 4-cycle | [13] diamond, off-cord | [14] diamond, on-cord | [15] 4-clique | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
[1] degree | 1 | 0.797489 | 0.99255 | 0.721824 | 0.850319 | 0.957717 | 0.59164 | 0.987567 | 0.591097 | 0.662774 | 0.866555 | 0.865433 | 0.590136 | 0.683471 | 0.537678 |
[2] 2-path | 0.797489 | 1 | 0.75209 | 0.858184 | 0.980674 | 0.917015 | 0.922261 | 0.738192 | 0.906362 | 0.886047 | 0.879049 | 0.938975 | 0.859091 | 0.817937 | 0.66163 |
[3] bifork | 0.99255 | 0.75209 | 1 | 0.64882 | 0.810571 | 0.940231 | 0.537775 | 0.9985 | 0.539182 | 0.588904 | 0.813029 | 0.842782 | 0.519062 | 0.618533 | 0.502792 |
[4] 3-cycle | 0.721824 | 0.858184 | 0.64882 | 1 | 0.850729 | 0.76346 | 0.782988 | 0.627428 | 0.773249 | 0.962727 | 0.951642 | 0.766913 | 0.893846 | 0.952637 | 0.712615 |
[5] 3-path, end | 0.850319 | 0.980674 | 0.810571 | 0.850729 | 1 | 0.944489 | 0.855183 | 0.797396 | 0.855218 | 0.851695 | 0.898804 | 0.969354 | 0.817382 | 0.804315 | 0.638269 |
[6] 3-path, interior | 0.957717 | 0.917015 | 0.940231 | 0.76346 | 0.944489 | 1 | 0.747828 | 0.933315 | 0.738758 | 0.739955 | 0.884068 | 0.952221 | 0.686035 | 0.719056 | 0.564356 |
[7] claw, leaf | 0.59164 | 0.922261 | 0.537775 | 0.782988 | 0.855183 | 0.747828 | 1 | 0.523431 | 0.983409 | 0.87542 | 0.740177 | 0.788411 | 0.888326 | 0.76362 | 0.672914 |
[8] claw, root | 0.987567 | 0.738192 | 0.9985 | 0.627428 | 0.797396 | 0.933315 | 0.523431 | 1 | 0.525663 | 0.568635 | 0.795132 | 0.833967 | 0.500084 | 0.601525 | 0.488454 |
[9] paw, handle | 0.591097 | 0.906362 | 0.539182 | 0.773249 | 0.855218 | 0.738758 | 0.983409 | 0.525663 | 1 | 0.865189 | 0.732816 | 0.788529 | 0.882963 | 0.757707 | 0.667036 |
[10] paw, base | 0.662774 | 0.886047 | 0.588904 | 0.962727 | 0.851695 | 0.739955 | 0.87542 | 0.568635 | 0.865189 | 1 | 0.894069 | 0.753967 | 0.957287 | 0.935547 | 0.729429 |
[11] paw, center | 0.866555 | 0.879049 | 0.813029 | 0.951642 | 0.898804 | 0.884068 | 0.740177 | 0.795132 | 0.732816 | 0.894069 | 1 | 0.852875 | 0.813447 | 0.878809 | 0.645709 |
[12] 4-cycle | 0.865433 | 0.938975 | 0.842782 | 0.766913 | 0.969354 | 0.952221 | 0.788411 | 0.833967 | 0.788529 | 0.753967 | 0.852875 | 1 | 0.709944 | 0.721873 | 0.570544 |
[13] diamond, off-cord | 0.590136 | 0.859091 | 0.519062 | 0.893846 | 0.817382 | 0.686035 | 0.888326 | 0.500084 | 0.882963 | 0.957287 | 0.813447 | 0.709944 | 1 | 0.877049 | 0.731786 |
[14] diamond, on-cord | 0.683471 | 0.817937 | 0.618533 | 0.952637 | 0.804315 | 0.719056 | 0.76362 | 0.601525 | 0.757707 | 0.935547 | 0.878809 | 0.721873 | 0.877049 | 1 | 0.688793 |
[15] 4-clique | 0.537678 | 0.66163 | 0.502792 | 0.712615 | 0.638269 | 0.564356 | 0.672914 | 0.488454 | 0.667036 | 0.729429 | 0.645709 | 0.570544 | 0.731786 | 0.688793 | 1 |
If we want to measure how similar or different the GCMs are between two graphs, we can compute a matrix distance (e.g., Frobenius norm of the difference, or sum of absolute differences, etc.):
dist_gcm = np.zeros((3,3))
for i,Fa in enumerate( [gcm1, gcm2, gcm3] ):
for j,Fb in enumerate( [gcm1, gcm2, gcm3] ):
dist_gcm[i,j] = fg.gcm_distance(Fa, Fb)
G1 | G2 | G3 | |
---|---|---|---|
G1 | 0 | 4.33726 | 131.925 |
G2 | 4.33726 | 0 | 130.265 |
G3 | 131.925 | 130.265 | 0 |
Summary
- RGF Distance: Captures how different the relative orbit frequencies are between two graphs.
- GDD Agreement: Measures how similarly the two graphs’ vertices distribute their orbit degrees.
- GCM: Captures how orbits co-vary (correlate) within a single graph, and can also be compared across graphs.
These measures are widely used in graphlet-based network comparison (e.g., Netdis, GraphCrunch2, etc.). Modify these examples to suit your exact needs (e.g., weighting orbits differently, using different correlation measures, etc.).