A curated collection of best learning resources on IT topics

All topics -> Databases

A database is a collection of data organized for the ease of retrieval, update, and management. Software that manages the database and provides access to the data is called a database management system (DBMS). The term database is often used to refer to a DBMS as well. Databases vary a lot by data model and query language – the ways the data is stored and accessed. The most common types are relational databases (Postrgres, MySQL, SQLite), document-oriented databases (MongoDB, CouchDB, Couchbase), key-value stores (Redis, Aerospike), wide-column databases (DynamoDB, Cassandra), and graph databases (Neo4j, Amazon Neptune).

Data is at the core of most applications be it an online shop, a social network, a business management system, or some machine learning solution. Key challenges in the application design lie in data management and processing. Thus, knowledge of databases is an essential skill for software engineers.

First databases appeared in 1960s. Before that time, computers were used primarily as expensive calculators in science and research. Companies were trying to use computers for storing and processing business data, but it was very slow and inconvenient. They had punch cards and magnetic tapes for secondary storage so the data had to be accessed sequentially. In 1960s hard disks became available, which allowed for the development of convenient ways to organize and access data.

Charles Bachman designed the first DBMS called Integrated Data Store (IDS) in 1964. The innovation was that each database record could refer to a list of other records and programmers could navigate between related entities. This data model became known as a network model. Another development of that time was a hierarchical model that limited the network model to a tree-like structure. Its implementation was the IBM Information Management System (IMS).

Ted Codd proposed his relational model in 1970 where all the data is organized into tuples and relations (or rows and tables). The relational model was simpler than its predecessors. It allowed for a high-level, easy-to-use query language. Structured Query Language (SQL) was developed by Donald Chamberlin and Raymond Boyce for IBM's early relational DBMS called System R soon after Codd's work. By 1984, System R turned into commercially successful Db2. More SQL databases followed including Oracle and PostgreSQL. The relational model took over the world and remains dominant to this day.

People started to question the dominance of SQL databases only in 2000s with the advent of large-scale, real-time, and big data applications. The NoSQL movement gave rise to databases that are designed for distributed environment and offer alternative data models and better scaling properties. Document-oriented databases such as MongoDB became a default choice for many software engineers.

SQL databases are fighting back with NewSQL. NewSQL is a class of modern relational databases that aim to provide the same scalable performance of NoSQL while still using SQL and maintaining the ACID guarantees of a traditional database system. Notable examples are CockroachDB and VoltDB. This is an exciting time in database development.

Links: Database (Wikipedia), Data model (Wikipedia).

Subtopics: Postgres.

Here's 4 amazing resources to learn Databases:

  1. Designing Data-Intensive Applications (www.oreilly.com)
    paid • book • by Martin Kleppmann • 2017

    This book is universally considered a must read for software engineers (and a very enjoyable one). It explores diverse space of technologies for processing and storing data: relational and NoSQL databases, search indexes, message queues, batch and stream processing frameworks, and more. You'll be better at deciding which technologies to use for a particular application and you'll learn how to meet key challenges in the design of distributed systems that arise with replication and partitioning.

  2. Readings in Database Systems, 5th Edition (www.redbook.io)
    free • book • by Peter Bailis, Joseph M. Hellerstein, Michael Stonebraker • 2015

    Also known as the "Red Book", this is a survey of classic and cutting-edge research in the field of data management. It covers historical data models, traditional RDBMS, large-scale dataflow engines, and emerging NoSQL solutions. Each chapter contains a list of "must read" primary sources and a short commentary that introduces the readers to the topic and the selected papers.

  3. The Architecture of Open Source Applications: The NoSQL Ecosystem (aosabook.org)
    free • article • by Adam Marcus • 2011

    This is a chapter of the AOSA book that explores the vast space of NoSQL databases. You'll learn about different NoSQL approaches for storing and querying data, their pros and cons compared to SQL, and how they scale.

  4. Database Systems: The Complete Book, 2nd edition (infolab.stanford.edu)
    paid • book • by Hector Garcia-Molina, Jeff Ullman, Jennifer Widom • 2008

    This is a comprehensive textbook on the theory and implementation of relational databases. Intended for academic audience, it may not be the most practical text. Still, it's a good resource to learn about the relational model and relational algebra in detail.