Commit 99de76d

authored
Merge pull request MichaelCade#419 from triggan/main
2 parents 7035662 + 0c62909 commit 99de76d

2 files changed

Lines changed: 44 additions & 25 deletions

2023/day63.md

@@ -6,13 +6,13 @@ The aim of this series of blog posts is to provide an introduction to databases
66

77
Here’s what we’ll be covering: -
88

9-
- An introduction to databases
10-
- Querying data in databases
11-
- Backing up databases
12-
- High availability and disaster recovery
13-
- Performance tuning
14-
- Database security
15-
- Monitoring and troubleshooting database issues
9+
- [An introduction to databases](./day63.md)
10+
- [Querying data in databases](./day64.md)
11+
- [Backing up databases](./day65.md)
12+
- [High availability and disaster recovery](./day66.md)
13+
- [Performance tuning](./day67.md)
14+
- [Database security](./day68.md)
15+
- [Monitoring and troubleshooting database issues](./day69.md)
1616

1717
We’ll also be providing examples to accompany the concepts discussed. In order to do so you will need Docker Desktop installed. Docker can be downloaded here (https://www.docker.com/products/docker-desktop/) and is completely free.
1818

@@ -26,10 +26,10 @@ https://www.pgadmin.org/
2626
# About Us
2727

2828
<b>Andrew Pruski</b><br>
29-
Andrew is a Field Solutions Architect working for Pure Storage. He is a Microsoft Data Platform MVP, Certified Kubernetes Administrator, and Raspberry Pi tinkerer. You can find him on twitter @dbafromthecold, LinkedIn, and blogging at dbafromthecold.com
29+
Andrew is a Field Solutions Architect working for Pure Storage. He is a Microsoft Data Platform MVP, Certified Kubernetes Administrator, and Raspberry Pi tinkerer. You can find him on twitter [@dbafromthecold](https://twitter.com/dbafromthecold), [LinkedIn](https://www.linkedin.com/in/andrewpruski/), and blogging at [dbafromthecold.com](https://dbafromthecold.com/)
3030

3131
<b>Taylor Riggan</b><br>
32-
Taylor is a Sr. Graph Architect on the Amazon Neptune development team at Amazon Web Services. He works with customers of all sizes to help them learn and use purpose-built NoSQL databases via the creation of reference architectures, sample solutions, and delivering hands-on workshops. You can find him on twitter @triggan and LinkedIn.
32+
Taylor is a Sr. Graph Architect on the Amazon Neptune development team at Amazon Web Services. He works with customers of all sizes to help them learn and use purpose-built NoSQL databases via the creation of reference architectures, sample solutions, and delivering hands-on workshops. You can find him on twitter [@triggan](https://twitter.com/triggan) and [LinkedIn](https://www.linkedin.com/in/triggan/).
3333

3434
<br>
3535

@@ -52,64 +52,83 @@ This is where database technologies come into play. Databases give us the abilit
5252

5353
# Relational databases
5454

55+
## History & Background
56+
5557
When it comes to databases, there are two main types...relational and non-relational (or NoSQL) databases.
5658

5759
SQL Server, Oracle, MySQL, and PostgreSQL are all types of relational databases.
5860

59-
Relational databases were first described by Edgar Codd in 1970 whilst he was working at IBM in a research paper , “A Relation Model of Data for Large Shared Data Banks.
61+
Relational databases were first described by Edgar Codd in 1970, whilst he was working at IBM, in the research paper [“A Relational Model of Data for Large Shared Data Banks”](https://dl.acm.org/doi/pdf/10.1145/362384.362685).
6062

6163
This paper led the way for the rise of the various different relational databases that we have today.
6264

63-
In a relational database, data is organised into tables (containing rows and columns) and these tables have “relationships” with each other.
65+
## Data Model
66+
67+
In a relational database, data is organised into tables (containing rows and columns). Relationships are generated by creating cross-references between rows in different tables.
6468

6569
For example, a Person table may have an addressID column which points to a row within an Address table. This allows an end user or application to easily retrieve a record from the Person table together with the related record from the Address table.
6670

67-
The addressID column is a unique “key” in the Address table but is present in the Person table as a “foreign key”.
71+
![](images/day63-2.png)
72+
73+
The AddressID column is a unique “key” in the Address table but is present in the Person table as a “foreign key”. We display this relationship in the diagram above. This is a simple example of an [Entity-Relationship](https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model) diagram that shows the structure of our database.
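The Person and Address relationship described above can be sketched in a few lines of Python using the standard-library `sqlite3` module. This is only an illustration: the column names other than AddressID (`PersonID`, `Name`, `City`), the sample rows, and the helper function names are assumptions made for the example and are not part of the diagram.

```python
import sqlite3


def build_person_address_db():
    """Create an in-memory database with Address and Person tables linked by AddressID."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE Address ("
        " AddressID INTEGER PRIMARY KEY,"  # the unique key of the Address table
        " City TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE Person ("
        " PersonID INTEGER PRIMARY KEY,"
        " Name TEXT NOT NULL,"
        " AddressID INTEGER REFERENCES Address(AddressID))"  # the foreign key
    )
    # Hypothetical sample rows, purely for illustration.
    conn.execute("INSERT INTO Address (AddressID, City) VALUES (1, 'Dublin')")
    conn.execute("INSERT INTO Person (PersonID, Name, AddressID) VALUES (1, 'Ada', 1)")
    conn.commit()
    return conn


def person_with_address(conn, person_id):
    """Follow the foreign key from Person to Address with a JOIN."""
    return conn.execute(
        "SELECT p.Name, a.City FROM Person AS p"
        " JOIN Address AS a ON a.AddressID = p.AddressID"
        " WHERE p.PersonID = ?",
        (person_id,),
    ).fetchone()


def insert_person_with_missing_address(conn):
    """Return True if the database rejects a Person that points at a non-existent Address."""
    try:
        conn.execute(
            "INSERT INTO Person (PersonID, Name, AddressID) VALUES (2, 'Bob', 999)"
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

Calling `person_with_address(build_person_address_db(), 1)` retrieves the person and the related address in a single query, and the last helper shows the database refusing a row whose foreign key points nowhere.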
6874

69-
The design of the tables and the relations between them in a relational database is said to be the database schema. The process of building this schema is called database normalisation.
75+
The design of the tables and the relations between them in a relational database is said to be the database schema. The process of structuring this schema to reduce duplicated data and protect data integrity is called [database normalisation](https://en.wikipedia.org/wiki/Database_normalization).
7076

71-
Data is selected, updated, or deleted from a relational database via a programming language called SQL (Structured Query Language).
77+
Data in a relational database is selected, inserted, updated, or deleted using a query language called SQL (Structured Query Language). We'll cover some of the basic concepts of SQL in [Day 64](./day64.md).
78+
79+
## Indexes
7280

7381
To support retrieving data from tables efficiently, relational databases have the concept of “indexes”. An index gives queries a way to quickly locate one row or a subset of rows in a table, without having to scan all the rows in the table.
7482

7583
The analogy often used when describing indexes is an index of a book. The user (or query) uses the index to go directly to the page (or row) they are looking for, without having to “scan” all the way through the book from the start.
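To make the book analogy concrete, here is a small sketch using Python's built-in `sqlite3` module. It asks SQLite to describe how it would run the same lookup before and after an index is created. The table, column, and index names are hypothetical, and the exact wording of the plan text varies between SQLite versions, so treat the output as indicative only.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's human-readable plan steps for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the plan description


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, LastName TEXT)")
conn.executemany(
    "INSERT INTO Person (LastName) VALUES (?)",
    [("Name%d" % i,) for i in range(1000)],
)

lookup = "SELECT PersonID FROM Person WHERE LastName = ?"

# Without an index, SQLite has to read every row (a full table scan).
plan_before = query_plan(conn, lookup, ("Name500",))

# With an index on LastName, SQLite can jump straight to the matching rows.
conn.execute("CREATE INDEX idx_person_lastname ON Person (LastName)")
plan_after = query_plan(conn, lookup, ("Name500",))

print(plan_before)
print(plan_after)
```

The first plan should mention a scan of the Person table, while the second should mention a search that uses `idx_person_lastname`.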
7684

77-
Queries accessing databases can also be referred to as transactions…a logical unit of work that accesses and/or modifies the data. In order to maintain consistency in the database, transactions must have certain properties. These properties are referred to as ACID properties: -
85+
Queries that access a database run as part of transactions, where a transaction is a logical unit of work that accesses and/or modifies the data. In order to maintain consistency in the database, transactions must have certain properties. These properties are referred to as [ACID](https://en.wikipedia.org/wiki/ACID) properties: -
7886

7987
A - Atomicity - all of the transaction completes or none of it does<br>
8088
C - Consistency - the data modified must not violate the integrity of the database<br>
8189
I - Isolation - multiple transactions take place independently of one another<br>
8290
D - Durability - Once a transaction has completed, it will remain in the system, even in the event of a system failure.
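As a rough illustration of atomicity and consistency, the sketch below uses Python's `sqlite3` module to move money between two rows in a hypothetical `Account` table. The table, the `CHECK` constraint, and the `transfer` helper are assumptions made for this example. The transfer is two updates, and if the second one would violate the constraint, the first one is rolled back as well.

```python
import sqlite3


def transfer(conn, from_id, to_id, amount):
    """Move money between accounts; either both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE Account SET Balance = Balance + ? WHERE AccountID = ?",
                (amount, to_id),
            )
            conn.execute(
                "UPDATE Account SET Balance = Balance - ? WHERE AccountID = ?",
                (amount, from_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance, so the whole
        # transaction (including the first UPDATE) has been undone.
        return False


def balances(conn):
    """Return a {AccountID: Balance} mapping of the current state."""
    return dict(conn.execute("SELECT AccountID, Balance FROM Account"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account ("
    " AccountID INTEGER PRIMARY KEY,"
    " Balance INTEGER NOT NULL CHECK (Balance >= 0))"  # integrity rule
)
with conn:
    conn.executemany("INSERT INTO Account VALUES (?, ?)", [(1, 100), (2, 50)])
```

Trying to transfer more than account 1 holds leaves both balances untouched, whereas a valid transfer updates both rows together.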
8391

84-
We will go through querying relational databases in the next blog post.
92+
We will go through querying relational databases in the [next blog post](./day64.md).
8593

8694
<br>
8795

8896
# Non-Relational databases
8997

9098
The downside of relational databases is that the data ingested has to "fit" to the structure of the database schema. But what if we're dealing with large amounts of data that doesn't match that structure?
9199

92-
This is where non-relational databases come into play. These types of databases are referred to as NoSQL (non-SQL or Not Only SQL) databases and are either schema-free or have a schema that allows for changes in the structure.
100+
This is where non-relational databases come into play. These types of databases are referred to as NoSQL (non-SQL or Not Only SQL) databases and are either schema-free or have a schema that allows for changes in the structure.
93101

94102
Apache Cassandra, MongoDB, and Redis are all types of NoSQL databases.
95103

96104
Non-relational databases have existed since the 1960s, but the term “NoSQL” was first used in 1998 by Carlo Strozzi when naming his Strozzi NoSQL database, although that was still a relational database. The term was reintroduced in 2009 by Johan Oskarsson when he organised an event to discuss “open-source distributed, non-relational databases”.
97105
There are various different types of NoSQL databases, all of which store and retrieve data differently.
98106

99-
For example: -
107+
## Types of Non-Relational Databases
100108

101-
Apache Cassandra is a wide-column store database. It uses tables, rows, and columns like a relational database but the names and formats of the columns can vary from row to row in the same table. It uses Cassandra Query Language (CSQL) to access the data stored.
109+
### Wide-Column or Key-Value Stores
102110

103-
MongoDB is a document store database. Data is stored as objects (documents) within the database that do not adhere to a defined schema. MongoDB supports a variety of methods to access data, such as range queries and regular expression searches.
111+
[Apache Cassandra](https://cassandra.apache.org/_/index.html) is a wide-column store database. It uses tables, rows, and columns like a relational database but the names and formats of the columns can vary from row to row in the same table. It uses Cassandra Query Language (CQL) to access the data stored. Wide-column databases, or key-value stores, are optimized for storing very large datasets and servicing millions of queries that perform simple look-ups by a "key", or a single value. They are used in place of a relational database when there is a requirement to scale out the underlying infrastructure to multiple machines in order to process that many requests.
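The idea of scaling out simple key look-ups across several machines can be sketched in plain Python. The class below is a toy, not how Cassandra is implemented: each "node" is just a dictionary, and keys are assigned to nodes by hashing the key and taking the remainder, which is a simplification compared with the partitioning schemes real systems use. The class and method names are hypothetical.

```python
import hashlib


class PartitionedKeyValueStore:
    """Toy key-value store that spreads keys across several in-memory 'nodes'."""

    def __init__(self, node_count):
        if node_count < 1:
            raise ValueError("node_count must be at least 1")
        # Each dict stands in for a separate machine.
        self.nodes = [{} for _ in range(node_count)]

    def node_for(self, key):
        """Pick a node for a string key using a stable hash of the key."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key, value):
        self.nodes[self.node_for(key)][key] = value

    def get(self, key, default=None):
        # A look-up only ever touches the one node that owns the key.
        return self.nodes[self.node_for(key)].get(key, default)


store = PartitionedKeyValueStore(node_count=3)
for i in range(100):
    store.put("user:%d" % i, {"visits": i})
```

Because every look-up goes to exactly one node, adding nodes spreads both the data and the request load, which is the property the paragraph above describes.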
104112

105-
Redis is a distributed in-memory key-value database. Redis supports many different data structures - sets, hashes, lists, etc. - https://redis.com/redis-enterprise/data-structures/
106-
The records can be identified using a unique key. Redis supports various different programming languages in order to access the data stored.
113+
### Document Stores
107114

108-
NoSQL databases generally do not comply with ACID properties but there are exceptions.
115+
[MongoDB](https://www.mongodb.com/) is a document store database. Data is stored as objects (documents) within the database that do not adhere to a defined schema. MongoDB supports a variety of methods to access data, such as range queries and regular expression searches. Document stores extend the thinking of key-value stores: they service lots of requests (millions, at scale) but also allow a user to write queries on nested attributes. Documents are typically formatted as JSON (not Microsoft Office documents ;) ), and the nested structure of JSON provides an easy means to define query filters on JSON document attributes.
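To show what a filter on a nested attribute looks like, here is a minimal sketch in plain Python that treats dictionaries as JSON documents. It does not use MongoDB or any of its drivers; the `get_path` and `find` helpers and the sample documents are hypothetical, and the dotted-path style is only loosely modelled on how document stores let you address nested fields.

```python
def get_path(document, dotted_path):
    """Walk a nested dict following a dotted path such as 'address.city'."""
    current = document
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None  # the document does not have this attribute
        current = current[part]
    return current


def find(documents, filters):
    """Return the documents whose nested attributes equal every filter value."""
    return [
        doc
        for doc in documents
        if all(get_path(doc, path) == value for path, value in filters.items())
    ]


# Documents in the same collection do not need to share a schema.
people = [
    {"name": "Ada", "address": {"city": "Dublin", "country": "IE"}},
    {"name": "Bob", "address": {"city": "Leeds", "country": "UK"}},
    {"name": "Cy", "nickname": "C"},  # no address at all
]

in_dublin = find(people, {"address.city": "Dublin"})
```

Note that the third document has no `address` attribute, yet it can live alongside the others and is simply skipped by the filter.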
109116

110-
Each has pros and cons when it comes to storing data, which one to use would be decided on the type of data that is being ingested.
117+
### In-Memory Databases
111118

112-
<br>
119+
[Redis](https://redis.io/) is a distributed in-memory key-value database. Redis supports many [different data structures](https://redis.com/redis-enterprise/data-structures/) - sets, hashes, lists, etc. Records are identified using a unique key. Redis has client libraries for many different programming languages. In-memory stores are used when fast (sub-millisecond) access times are required. They trade off higher cost and a lack of immediate persistence for fast access. Developers will often use in-memory stores as a cache in front of other databases, or to optimize aggregations. Redis supports a data structure called a sorted set, which keeps its members ordered by score and can perform better than running an `ORDER BY` query in other databases. You might want to use this sort of pattern if you're building a product website and showing your users the best-selling products on your site (sorted by number of purchases).
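The best-sellers pattern can be sketched in plain Python without a Redis server. The `BestSellers` class below is a hypothetical stand-in that mimics the idea behind a sorted set: scores are kept in order as they change, so reading the top entries does not require sorting everything at query time. It is not the Redis implementation, and the product names are made up.

```python
import bisect


class BestSellers:
    """Toy sorted set: members stay ordered by score as scores change."""

    def __init__(self):
        self._score = {}    # product -> number of purchases
        self._ordered = []  # (purchases, product) tuples, kept sorted

    def record_purchase(self, product, quantity=1):
        old = self._score.get(product)
        if old is not None:
            # Remove the stale entry before re-inserting with the new score.
            index = bisect.bisect_left(self._ordered, (old, product))
            del self._ordered[index]
        new = (old or 0) + quantity
        self._score[product] = new
        bisect.insort(self._ordered, (new, product))

    def top(self, n):
        """Return the n best-selling products, highest score first."""
        if n <= 0:
            return []
        return [(product, score) for score, product in reversed(self._ordered[-n:])]


sales = BestSellers()
sales.record_purchase("keyboard", 3)
sales.record_purchase("mouse", 5)
sales.record_purchase("monitor", 1)
sales.record_purchase("keyboard", 4)
```

In Redis itself the equivalent operations would be incrementing a member's score in a sorted set and then reading a range from the high end, with the ordering maintained by the server.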
120+
121+
### Graph Databases
122+
123+
[Neo4j](https://neo4j.com) is an open-source graph database. Graph databases are designed to store and query highly connected datasets with many relationships. Data is stored as vertices (or nodes) and edges (or relationships) and a number of attributes (called properties) that can describe a vertex or an edge. Graph databases are optimized for querying across the many connections in the dataset, patterns that may require complex `JOIN` patterns in a relational database. The query languages supported by graph databases allow a user to easily express these types of "traversal" patterns, including concepts like recursion that are not easily expressed in languages such as SQL.
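A traversal of the kind described above can be sketched in plain Python with a breadth-first search. This is not Neo4j or any graph query language: the `reachable_within` function and the sample "knows" edges are hypothetical, and edges are treated as undirected for simplicity. The point is that "who is within two hops of this vertex" is a short loop over connections, where a relational database would typically need one `JOIN` per hop.

```python
from collections import deque


def reachable_within(edges, start, max_hops):
    """Return {vertex: hops} for every vertex within max_hops of start."""
    # Build an adjacency map, treating each edge as undirected.
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if hops[vertex] == max_hops:
            continue  # do not explore beyond the hop limit
        for neighbour in adjacency.get(vertex, ()):
            if neighbour not in hops:
                hops[neighbour] = hops[vertex] + 1
                queue.append(neighbour)

    del hops[start]  # the starting vertex is not part of the result
    return hops


# Hypothetical "knows" relationships between people.
knows = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]
friends_of_friends = reachable_within(knows, "alice", 2)
```

Raising `max_hops` widens the search without changing the shape of the code, which is the sort of variable-depth query that graph query languages are designed to express directly.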
124+
125+
<br />
126+
127+
NoSQL databases typically trade off ACID-compliant transactions for the ability to scale or to service specific types of data access patterns. If you want to understand these concepts better, we suggest you look into the [CAP Theorem](https://en.wikipedia.org/wiki/CAP_theorem).
128+
129+
CAP Theorem is a concept from Computer Science that discusses the tradeoffs between data Consistency, Availability, and Partition tolerance. Each type of NoSQL database has pros and cons when it comes to storing data, and which one to use depends on the type of data that is being ingested.
130+
131+
<br />
113132

114133
# When to use relational vs non-relational databases
115134

2023/images/day63-2.png

18.1 KB