Graph Database
Graph databases are specialized databases that focus on storing the relationships between data entities. Entities in a graph database are stored as nodes, while the relationships between entities are stored as edges. Both nodes and edges can carry their own attributes, much as the rows of a relational table carry column values. Edges can also have a direction to indicate the nature of a relationship between two nodes.
Applications that use graph databases run queries that traverse the network of nodes and edges to analyze the relationships between entities. Figure 3.4 shows an example of a graph database that stores an organization’s personnel chart. The nodes represent different job titles and departments, while the edges represent the reporting structure for different employees.
FIGURE 3.4 Graph database
Graph databases are well suited to solutions that ask relationship questions such as “Find all employees that report directly to the CEO.” Applications that query large graphs with many nodes and edges, such as social media networks, can perform complex analyses very quickly. While a relational database can store the same data as a graph database, a query written for the graph database follows the stored edges directly and avoids the join operations or subqueries that the relational database would require.
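The traversal idea can be illustrated with a minimal in-memory sketch. It is not how Azure Cosmos DB or any graph database is implemented; the job titles, the REPORTS_TO edge list, and the function names are all hypothetical and chosen only to mirror the personnel-chart example. Each edge is directed from an employee to that employee's manager.

```python
from collections import defaultdict

# Hypothetical personnel data: each directed edge points from an
# employee (first element) to that employee's manager (second element).
REPORTS_TO = [
    ("CTO", "CEO"),
    ("CFO", "CEO"),
    ("Engineering Manager", "CTO"),
    ("Software Engineer", "Engineering Manager"),
    ("Accountant", "CFO"),
]


def build_incoming_index(edges):
    """Map each manager node to the nodes whose edge points at it."""
    incoming = defaultdict(list)
    for employee, manager in edges:
        incoming[manager].append(employee)
    return incoming


def direct_reports(incoming, manager):
    """One-hop traversal: follow the incoming edges of a single node."""
    return sorted(incoming.get(manager, []))


def all_reports(incoming, manager):
    """Multi-hop traversal: walk the reporting chain breadth-first."""
    found = []
    queue = list(incoming.get(manager, []))
    while queue:
        person = queue.pop(0)
        found.append(person)
        queue.extend(incoming.get(person, []))
    return found


incoming = build_incoming_index(REPORTS_TO)
print(direct_reports(incoming, "CEO"))  # ['CFO', 'CTO']
print(all_reports(incoming, "CEO"))
```

The question "Find all employees that report directly to the CEO" becomes a single lookup of the edges attached to one node, and the multi-level version simply keeps following edges. No table is joined to itself at any point.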
Graph databases can be implemented in Azure using the Azure Cosmos DB Gremlin API.
Azure Cosmos DB
Azure Cosmos DB is a multi-model PaaS NoSQL database management system. Multi-model means that organizations can use Azure Cosmos DB to build key-value, document, columnar, and graph data stores. These data models are made available as database APIs: the Table API, Core (SQL) API, API for MongoDB, Cassandra API, and Gremlin API. Users choose one of these APIs when deploying an instance of Azure Cosmos DB.
The highest level of management for Azure Cosmos DB is a database account. Currently, you are allowed to have up to 50 Azure Cosmos DB accounts in an Azure subscription, but that limit can be increased by submitting a support ticket in the Azure portal. Each database account can have one or more databases (referred to as keyspaces when using the Azure Cosmos DB Cassandra API), each of which serves as the unit of management for a set of containers.
Containers are the fundamental unit of scalability for throughput and storage. It is at this level that data is partitioned and replicated across multiple regions. Users can also register stored procedures, user-defined functions, triggers, and merge procedures within a container. Containers are identified by different names depending on which type of NoSQL database is deployed. Table 3.1 lists the naming convention used by each Azure Cosmos DB API.
TABLE 3.1 Azure Cosmos DB API-specific names for containers
API | Container Naming Convention
Table API | Table
Core (SQL) API | Container
Cassandra API | Table
API for MongoDB | Collection
Gremlin API | Graph
Data stored in containers is automatically grouped into logical partitions based on a partition key, and the logical partitions are distributed across physical partitions. A partition key is a designated field in each item whose value determines which logical partition the item belongs to, so that related data and the throughput that serves it are grouped together. Users are responsible for choosing an appropriate partition key; all other partition administration is handled internally by Azure Cosmos DB.
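The relationship between a partition key, logical partitions, and physical partitions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample items, the choice of department as the partition key, the fixed count of three physical partitions, and the use of SHA-256 as a stand-in hash are all hypothetical. The actual hashing and placement are internal to Azure Cosmos DB.

```python
import hashlib
from collections import defaultdict

# Assumed value for illustration; the service manages the real number.
PHYSICAL_PARTITION_COUNT = 3


def logical_partitions(items, partition_key):
    """Group items that share a partition key value into one logical partition."""
    groups = defaultdict(list)
    for item in items:
        groups[item[partition_key]].append(item)
    return dict(groups)


def physical_partition_for(key_value, count=PHYSICAL_PARTITION_COUNT):
    """Pick a physical partition by hashing the key value (stand-in hash only)."""
    digest = hashlib.sha256(str(key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % count


# Hypothetical items, using "department" as the partition key.
items = [
    {"id": "1", "department": "Finance", "title": "Accountant"},
    {"id": "2", "department": "Engineering", "title": "Software Engineer"},
    {"id": "3", "department": "Finance", "title": "CFO"},
]

groups = logical_partitions(items, "department")
placement = {key: physical_partition_for(key) for key in groups}

for key, members in groups.items():
    ids = [member["id"] for member in members]
    print(f"{key}: items {ids} -> physical partition {placement[key]}")
```

Both Finance items land in the same logical partition because they share a key value, and every item in a logical partition is placed on the same physical partition. A key with many distinct values spreads the data across more logical partitions, which is why the choice of key matters.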
Individual data records stored in a database container are referred to as items. When Azure Cosmos DB partitions data, it groups items with the same partition key value into the same logical partition. Items are automatically indexed as they are added to a container. Indexing behavior can also be customized by configuring the indexing policy on the container. Like containers, items are referred to by different names depending on which type of NoSQL database is deployed. Table 3.2 lists the naming convention used by each Azure Cosmos DB API.
TABLE 3.2 Azure Cosmos DB API-specific names for items
API | Item Naming Convention
Table API | Entity
Core (SQL) API | Item
Cassandra API | Row
API for MongoDB | Document
Gremlin API | Node or edge
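For quick reference, the terminology from Tables 3.1 and 3.2 can be collected into a single lookup structure. The following is only a convenience sketch: the dictionary layout and the terms_for function are hypothetical and are not part of any Azure SDK. The database-level names reflect the earlier note that a database is called a keyspace in the Cassandra API.

```python
# Terms taken from Tables 3.1 and 3.2; the structure itself is illustrative.
API_TERMS = {
    "Table API": {"database": "Database", "container": "Table", "item": "Entity"},
    "Core (SQL) API": {"database": "Database", "container": "Container", "item": "Item"},
    "Cassandra API": {"database": "Keyspace", "container": "Table", "item": "Row"},
    "API for MongoDB": {"database": "Database", "container": "Collection", "item": "Document"},
    "Gremlin API": {"database": "Database", "container": "Graph", "item": "Node or edge"},
}


def terms_for(api):
    """Return the (database, container, item) names used by the given API."""
    terms = API_TERMS[api]
    return terms["database"], terms["container"], terms["item"]


for api in API_TERMS:
    database, container, item = terms_for(api)
    print(f"{api}: {database} > {container} > {item}")
```

Reading each printed line from left to right follows the resource hierarchy described in this section: a database holds containers, and a container holds items.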