Analytics or ML?
Graph analytics and graph machine learning are often treated as synonyms, but they are not. Below we highlight the main differences and the ingredients of each.
We offer consulting services across the whole graph spectrum, but it is useful to distinguish the two domains, since they lead to different project types and levels of effort.
1. Definition and Scope
- Graph Analytics:
- Graph analytics refers to a set of techniques and algorithms used to analyze graph structures. It focuses on understanding the (topological) properties and behavior of graphs, such as the relationships between nodes, the overall structure, and specific patterns within the graph. Common graph analytics tasks include calculating centrality measures, detecting communities, finding shortest paths, and identifying cliques or connected components.
- Graph analytics often involves predefined algorithms that are applied to a graph to extract insights or solve specific problems. The techniques are usually more deterministic and involve direct computation on the graph’s structure.
- Graph Machine Learning (GML):
- Graph machine learning involves the application of machine learning models to graph-structured data. GML aims to learn patterns and make predictions based on the data represented in a graph. This might include predicting properties of nodes, edges, or entire subgraphs, generating embeddings for nodes or graphs, and identifying new connections within the graph.
- Graph ML is more dynamic and involves training machine learning models on (a lot of) graph data to generalize from patterns in the data, enabling predictions or classifications that go beyond the capabilities of traditional graph analytics.
[!Summary] In essence, analytics is descriptive and deterministic, and you can apply it to a graph or a set of graphs of any size. Graph ML is predictive and probabilistic, and it typically needs a large amount of data to train a useful model.
2. Techniques and Algorithms
- Graph Analytics:
- Centrality Measures: Algorithms like PageRank, degree centrality, betweenness centrality, and eigenvector centrality identify the most important nodes (and, for some measures, edges) in a graph. Each definition of ‘important’ leads to a different notion of centrality.
- Community Detection: Algorithms such as modularity optimization (for example the Louvain method) and spectral clustering are used to detect clusters or communities within a graph.
- Pathfinding: Algorithms like Dijkstra’s and A* are used to find the shortest paths between nodes.
- Subgraph Matching: Techniques for identifying specific patterns or motifs within a graph, such as triangles, stars, or more complex structures.
- Graph Machine Learning:
- Graph Neural Networks (GNNs): A class of neural networks designed to operate on graph-structured data, allowing for tasks like node classification, link prediction, and graph classification.
- Node/Edge/Graph Embeddings: Techniques like DeepWalk, Node2Vec, and GraphSAGE that learn low-dimensional vector representations of nodes, edges, or entire graphs, preserving the structure and properties of the graph.
- Supervised and Unsupervised Learning: GML applies both supervised (with labeled data) and unsupervised learning (without labels) techniques to graphs, learning patterns and making predictions based on the graph’s structure.
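To make the GNN idea concrete, here is a minimal sketch of a single graph-convolution step in NumPy, using the widely used GCN propagation rule (normalized adjacency times features times weights). The toy graph, the one-hot features, and the names `gcn_layer`, `A`, `H`, and `W` are assumptions made for illustration. The weights are random here, whereas a real model would learn them by gradient descent in a framework such as PyTorch or DGL.

```python
import numpy as np

def gcn_layer(adjacency, features, weights):
    """One message-passing step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    # Add self-loops so each node keeps its own features.
    a_hat = adjacency + np.eye(adjacency.shape[0])
    # Symmetric normalization by node degree (degree >= 1 thanks to self-loops).
    degree = a_hat.sum(axis=1)
    d_inv_sqrt = np.diag(degree ** -0.5)
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt
    # Aggregate neighbour features, project them, apply ReLU.
    return np.maximum(norm_adj @ features @ weights, 0.0)

# Hypothetical toy graph: a path 0 - 1 - 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.eye(3)                      # one-hot node features
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))        # untrained weights, for illustration only

H_next = gcn_layer(A, H, W)        # new 2-dimensional representation per node
print(H_next.shape)
```

Stacking several such layers and training the weights against labels is what turns this propagation step into a node classifier or link predictor.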
Note that graph machine learning goes by various, partly overlapping names:
- graph machine learning
- geometric deep learning
- graph neural networks (GNN)
- network embedding
- graph representation learning
- machine learning on graphs
- graph embeddings.
Graph analytics can be done in-memory with frameworks like NetworkX or igraph. If you have a large amount of data, you can use a graph database, and vendors ship their own implementations. Neo4j’s Graph Data Science (GDS) library offers a wide range of graph analytics algorithms, and Memgraph has strong support for NetworkX.
Graph machine learning typically relies on deep learning frameworks, usually GPU/CUDA-accelerated, such as PyTorch (for example via PyTorch Geometric) or DGL. Like all data science efforts, it is hard work that involves a lot of experimentation, and designing an ML model requires skill and experience. Graph analytics is, in general, much more straightforward.
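As a minimal sketch of in-memory analytics, assuming NetworkX is installed, the snippet below runs a centrality measure, community detection, pathfinding, and a connected-components count on a small graph. The graph itself (two triangles joined by a bridge), its node names, and its edge weights are invented for illustration. Community detection here uses NetworkX's greedy modularity optimization, not the Louvain method.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Hypothetical toy graph: two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
G = nx.Graph()
G.add_weighted_edges_from([
    ("a", "b", 1), ("b", "c", 2), ("a", "c", 4),
    ("c", "d", 1),
    ("d", "e", 5), ("e", "f", 1), ("d", "f", 2),
])

# Centrality: betweenness (unweighted by default) ranks the bridge nodes highest.
betweenness = nx.betweenness_centrality(G)
top_two = sorted(betweenness, key=betweenness.get, reverse=True)[:2]

# Community detection: greedy modularity optimization on the unweighted structure.
communities = greedy_modularity_communities(G)

# Pathfinding: Dijkstra's algorithm using the "weight" edge attribute.
path = nx.dijkstra_path(G, "a", "e")
length = nx.dijkstra_path_length(G, "a", "e")

# Connected components.
n_components = nx.number_connected_components(G)

print(top_two, [sorted(c) for c in communities], path, length, n_components)
```

Every call is a direct computation on the graph: there is no training step, and running it twice on the same graph gives the same result.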
3. Goals and Applications
- Graph Analytics:
- The primary goal is to explore and understand the structure and properties of the graph. It helps in answering questions like “Who are the key influencers in a social network?” or “What is the shortest path between two points in a transportation network?”
- Applications include network optimization, fraud detection, social network analysis, and infrastructure management, where the main focus is on interpreting the existing structure of the graph.
- Graph Machine Learning:
- The goal of GML is to learn from graph data to make predictions, classifications, or generate embeddings that can be used in downstream tasks. This could involve predicting future interactions in a network, classifying nodes based on their features and connections, or even generating new graph structures.
- Applications include recommendation systems, drug discovery (predicting molecular interactions), predictive maintenance (predicting failures based on equipment graphs), and any scenario where the goal is to predict unknown information based on the graph’s structure and existing data.
4. Complexity and Flexibility
- Graph Analytics:
- Generally involves straightforward, deterministic algorithms. The computational cost depends on the size of the graph and on the specific algorithm, since each one computes directly on the graph’s structure.
- Less flexible in adapting to new patterns or unseen data since the analysis is usually based on predefined metrics and algorithms.
- Graph Machine Learning:
- More complex, as it involves training models that learn from data. The complexity also comes from the need to process graph data in a way that machine learning models can understand (e.g., through embeddings or GNNs).
- Highly flexible and adaptable to new data, allowing models to generalize and make predictions even on unseen parts of the graph.
5. Output
- Graph Analytics:
- Outputs are usually descriptive metrics or patterns. For example, you might get a list of nodes ranked by their centrality, clusters of nodes that form communities, or a visualization of the shortest path between nodes.
- Graph Machine Learning:
- Outputs are predictive in nature, produced by a model that embodies learned patterns. Examples include predictions about the category of a node, probabilities of new edges forming between nodes, or learned embeddings that can be used in further machine learning tasks.
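To illustrate what a predictive output looks like, the sketch below trains a tiny link predictor and prints edge probabilities for a few node pairs that are not currently connected. It is deliberately simplified: a logistic regression over two hand-crafted neighbourhood features stands in for a GNN or an embedding-based model, and it is fitted with plain gradient descent in NumPy. The helper `pair_features`, the learning rate, the iteration count, and the choice of the karate club graph bundled with NetworkX are all assumptions made for illustration.

```python
import networkx as nx
import numpy as np

def pair_features(G, pairs):
    """Bias term, common-neighbour count and Jaccard coefficient for each node pair."""
    rows = []
    for u, v in pairs:
        # Exclude the pair itself so an existing edge does not leak into its features.
        nu, nv = set(G[u]) - {v}, set(G[v]) - {u}
        common, union = len(nu & nv), len(nu | nv)
        rows.append([1.0, common, common / union if union else 0.0])
    return np.array(rows)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

G = nx.karate_club_graph()
rng = np.random.default_rng(0)

# Positive examples are existing edges; negatives are sampled non-edges.
positives = list(G.edges())
non_edges = list(nx.non_edges(G))
sampled = rng.choice(len(non_edges), size=len(positives), replace=False)
negatives = [non_edges[i] for i in sampled]

X = pair_features(G, positives + negatives)
y = np.array([1.0] * len(positives) + [0.0] * len(negatives))

# Fit logistic regression with batch gradient descent.
w = np.zeros(X.shape[1])
for _ in range(2000):
    w -= 0.05 * X.T @ (sigmoid(X @ w) - y) / len(y)

# Predictive output: a probability for each candidate edge, not a fixed metric.
candidates = non_edges[:5]
probabilities = sigmoid(pair_features(G, candidates) @ w)
for pair, p in zip(candidates, probabilities):
    print(pair, round(float(p), 3))
```

Unlike a centrality ranking, these numbers depend on the training data, the sampled negatives, and the model choice, which is why such outputs are evaluated on held-out edges in a real project.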