Graph Database

CP 47/80 points

Models data as nodes (entities) and edges (relationships) with properties on both, enabling efficient traversal of deeply connected data. Optimized for relationship-heavy queries like social networks, fraud detection, and knowledge graphs.
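
As a rough illustration of the property graph model described above, the following minimal in-memory sketch stores labels and properties on nodes and a type plus properties on edges. The names `Node`, `Edge`, `PropertyGraph`, and the sample data are hypothetical and chosen for illustration; real graph databases persist and index these structures very differently.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: nodes and edges both carry key/value properties."""

    def __init__(self):
        self.nodes = {}      # node id -> Node
        self.outgoing = {}   # node id -> list of outgoing Edge objects

    def add_node(self, node_id, *labels, **properties):
        node = Node(node_id, set(labels), dict(properties))
        self.nodes[node_id] = node
        self.outgoing.setdefault(node_id, [])
        return node

    def add_edge(self, source, target, edge_type, **properties):
        # Both endpoints must already exist in the graph.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        edge = Edge(source, target, edge_type, dict(properties))
        self.outgoing[source].append(edge)
        return edge

    def neighbors(self, node_id, edge_type=None):
        # Follow outgoing edges, optionally filtered by relationship type.
        return [
            e.target
            for e in self.outgoing[node_id]
            if edge_type is None or e.type == edge_type
        ]


g = PropertyGraph()
g.add_node("alice", "Person", name="Alice", age=34)
g.add_node("bob", "Person", name="Bob")
g.add_node("acme", "Company", name="Acme")
g.add_edge("alice", "bob", "KNOWS", since=2019)
g.add_edge("alice", "acme", "WORKS_AT", role="Engineer")

knows = g.neighbors("alice", "KNOWS")
everything = g.neighbors("alice")
```

Here `knows` contains only the node reached through a `KNOWS` edge, while `everything` follows every outgoing relationship from the same node.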

↑ Scale 4

⚡ Perf 6

⚓ Rely 7

⚙ Ops 5

⯑ Query 8

⧉ Schema 7

★ Eco 6

↗ Learn 4

Σ Total 47/80

Character

The social butterfly who sees the world as a web of connections. While others store data in rows or documents, it maps relationships as first-class citizens and can trace six degrees of separation in milliseconds. It thinks in connections, not collections.

When to Use

Social networks and relationship mapping
Fraud detection and anti-money laundering
Knowledge graphs and ontology management
Network topology and dependency analysis
Recommendation engines based on connections
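
To make the recommendation use case concrete, here is a small sketch that suggests friends-of-friends ranked by the number of mutual connections. The `friends` dictionary and the `recommend` function are made-up examples, not the API of any particular graph database.

```python
from collections import Counter

# Hypothetical undirected friendship graph stored as adjacency sets.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def recommend(person):
    """Rank people two hops away by how many mutual friends they share."""
    mutual = Counter()
    for friend in friends[person]:
        for candidate in friends[friend]:
            # Skip the person themselves and people they already know.
            if candidate != person and candidate not in friends[person]:
                mutual[candidate] += 1
    # Sort by mutual-friend count (descending), then by name for stable output.
    return sorted(mutual.items(), key=lambda item: (-item[1], item[0]))


suggestions = recommend("alice")
```

In this toy data, `dave` ranks first because two of Alice's friends know him, while `erin` is reachable through only one.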

Avoid When

Data is primarily tabular with few relationships
The workload is dominated by bulk aggregations or analytics
Horizontal write scaling is a primary requirement
The team lacks graph modeling expertise

Dimension Analysis

↑ Scalability 4/10

Graph traversals are inherently difficult to partition. Splitting a connected graph across machines turns local traversal steps into expensive cross-partition network hops. Neo4j scales reads via replicas, but write throughput for a database is limited by the capacity of its leader.
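
The partitioning problem can be illustrated by counting how many edges end up crossing partition boundaries. The sketch below assumes a tiny ring-shaped graph with integer node IDs and two hypothetical placement strategies; it is a toy model, not how any specific database assigns data to machines.

```python
def count_cross_partition_edges(edges, partition_of):
    """Count edges whose endpoints are placed on different partitions."""
    return sum(1 for src, dst in edges if partition_of(src) != partition_of(dst))


# A ring of 8 nodes: 0 -> 1 -> 2 -> ... -> 7 -> 0
edges = [(i, (i + 1) % 8) for i in range(8)]

# Strategy 1: modulo placement, which ignores graph locality.
modulo_cut = count_cross_partition_edges(edges, lambda node: node % 2)

# Strategy 2: range placement, which keeps neighboring IDs together.
range_cut = count_cross_partition_edges(edges, lambda node: node // 4)
```

With modulo placement every edge of the ring crosses a partition, while range placement cuts only the two edges at the range boundaries. Real graphs rarely have such a clean locality structure, which is why a good cut is hard to find.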

⚡ Performance 6/10

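Index-free adjacency means each node holds direct references to its neighbors, so following a single relationship is O(1) and does not depend on total graph size. The cost of a traversal therefore scales with the portion of the graph it visits rather than with the whole dataset. However, graph-wide analytics and pathfinding on large datasets can be computationally expensive.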
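
A minimal sketch of the idea behind index-free adjacency: each vertex keeps direct references to its neighbors, so a bounded breadth-first traversal only touches the vertices it actually reaches. The `Vertex` class and `within_hops` function are hypothetical names for illustration, and the in-memory layout is a simplification of what a native graph store does on disk.

```python
from collections import deque


class Vertex:
    def __init__(self, name):
        self.name = name
        self.neighbors = []  # direct object references, no index lookup needed


def connect(a, b):
    """Create an undirected relationship between two vertices."""
    a.neighbors.append(b)
    b.neighbors.append(a)


def within_hops(start, max_hops):
    """Return {name: distance} for every vertex within max_hops of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in current.neighbors:  # one pointer dereference per hop
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    return {v.name: d for v, d in distance.items() if v is not start}


a, b, c, d, e = (Vertex(n) for n in "abcde")
connect(a, b)
connect(b, c)
connect(c, d)
connect(a, e)

nearby = within_hops(a, 2)
```

Vertex `d` is three hops from `a`, so it is never visited. The work done depends on the neighborhood explored, not on how many other vertices exist.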

⚓ Reliability 7/10

Neo4j offers ACID transactions, clustering, and causal consistency. Amazon Neptune provides multi-AZ replication. Graph databases are generally reliable but lack the decades of hardening that relational databases have.

⚙ Operational Simplicity 5/10

Graph modeling is powerful but requires careful design. Operations like data imports, backups, and cluster management differ significantly from relational workflows, and the tooling is less mature.

⯑ Query Flexibility 8/10

Cypher (Neo4j) and Gremlin (TinkerPop) enable expressive pattern matching, pathfinding, and relationship traversals that would require complex recursive CTEs in SQL. They are powerful for connected data but limited for tabular analytics.
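
To give a feel for the difference, the sketch below answers "who can Alice reach within three KNOWS hops?" using a recursive CTE in SQLite (via Python's standard `sqlite3` module) and shows a roughly equivalent Cypher pattern as a string for comparison. The `knows` table, the `Person` label, and the sample names are assumptions made for this example; the Cypher text is not executed here.

```python
import sqlite3

# Roughly equivalent Cypher, shown for comparison only (not executed).
CYPHER_EQUIVALENT = """
MATCH (a:Person {name: 'alice'})-[:KNOWS*1..3]->(b:Person)
RETURN DISTINCT b.name
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE knows (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO knows VALUES (?, ?)",
    [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("dave", "erin")],
)

# The relational version needs an explicit recursive CTE with a depth counter.
RECURSIVE_SQL = """
WITH RECURSIVE reachable(name, depth) AS (
    SELECT dst, 1 FROM knows WHERE src = ?
    UNION
    SELECT k.dst, r.depth + 1
    FROM knows AS k
    JOIN reachable AS r ON k.src = r.name
    WHERE r.depth < ?
)
SELECT DISTINCT name FROM reachable ORDER BY name
"""

reachable = [row[0] for row in conn.execute(RECURSIVE_SQL, ("alice", 3))]
conn.close()
```

The variable-length pattern `[:KNOWS*1..3]` expresses in one line what the SQL version spells out with a base case, a recursive step, and a manual depth limit.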

⧉ Schema Flexibility 7/10

Property graphs allow dynamic properties on nodes and edges. No rigid schema is required, and new node types and relationship types can be added without migrations. However, queries and indexes rely on consistent labels, relationship types, and property names, so performance suffers when the graph structure is inconsistent.

★ Ecosystem Maturity 6/10

Neo4j dominates with 15+ years of development and a strong community. The broader graph database ecosystem is smaller than relational or document databases, with fewer tools and integrations available.

↗ Learning Curve 4/10

Graph thinking requires a paradigm shift from tabular data. Learning Cypher or Gremlin, understanding traversal strategies, and designing effective graph models take significant investment for developers from relational backgrounds.

CAP Theorem

CP Consistency + Partition Tolerance

Neo4j uses leader-based replication with causal consistency, prioritizing consistency over availability during partitions. Amazon Neptune offers strong consistency within a region. Some graph databases like ArangoDB offer tunable consistency.
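
The CP trade-off can be sketched with a toy majority-quorum model: the leader accepts a write only while it can reach a majority of cluster members, and refuses writes otherwise. This is a deliberately simplified illustration with hypothetical names (`Cluster`, `partition`, `write`); it is not the actual replication protocol of Neo4j or any other product.

```python
class Cluster:
    """Toy leader-based cluster that favors consistency over availability."""

    def __init__(self, size):
        self.size = size
        self.reachable = size  # members the leader can reach, itself included
        self.log = []          # committed writes

    def partition(self, reachable):
        """Simulate a network partition as seen from the leader."""
        self.reachable = reachable

    def write(self, value):
        majority = self.size // 2 + 1
        if self.reachable < majority:
            return False  # refuse the write rather than risk divergence
        self.log.append(value)
        return True


cluster = Cluster(size=3)
accepted_before = cluster.write("a")   # leader reaches all 3 members

cluster.partition(reachable=1)         # leader is isolated from both followers
accepted_during = cluster.write("b")   # no majority, so the write is rejected

cluster.partition(reachable=2)         # one follower is reachable again
accepted_after = cluster.write("c")
```

During the partition the cluster stays consistent (the log never diverges) at the cost of rejecting writes, which is the behavior the CP label describes.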

Top Databases

Neo4j GPL v3 (Community) / Proprietary (Enterprise)

The most popular graph database, with native graph storage, the Cypher query language, and a rich ecosystem of visualization tools, graph algorithm libraries, and integrations.

Amazon Neptune Proprietary (AWS managed service)

Fully managed graph database supporting both property graph (Gremlin and openCypher) and RDF (SPARQL) models. Offers multi-AZ high availability with up to 15 read replicas.

ArangoDB Apache 2.0 (Community, through 3.11; BSL 1.1 from 3.12) / Proprietary (Enterprise)

Multi-model database with native graph, document, and key-value support. Uses AQL (ArangoDB Query Language) for unified querying across all data models.

TigerGraph Proprietary (Community / Enterprise)

High-performance graph analytics platform designed for deep-link analysis across massive datasets. Supports real-time graph analytics with parallel processing and the GSQL query language.