Relational Model and Why It Doesn’t Matter

Relational Model importance or Not?

In the last article, we pointed out why using the relational model is very important. But there is no perfect model for the world. Abstractions are built to simplify things, but inevitably those abstractions don’t fit, or they cover up an important detail that must be dealt with. Fanatically adhering to the relational model will cause problems.

Don’t be fooled by books and products that offer a false choice; we don’t have to choose between a pure relational solution or completely abandon the relational model. Current relational databases, and the SQL language, don’t have to be perfect to be useful. We must avoid the temptation to search for pure and true solutions and embrace practical compromises.

The following sections discuss parts of the relational model that don’t always work in practice. The good news is that there’s always a workaround.

The NULL Problem Isn’t a Problem

One of the biggest theoretical complaints about relational databases is the way they use NULL. Three-valued logic with nulls is weird and takes time to get used to. Table 1 shows one of many possible truth tables it takes to explain how NULL works.

Table 1 NULL Three-Valued Logic

A

B

=

!=

1

1

True

False

1

0

False

True

1

NULL

Unknown

Unknown

NULL

NULL

Unknown

Unknown

NULLs are weird, but so is our data. There are many times when we don’t know something. It might be tempting to replace NULL with a list of reasons why we don’t know something. But there are many times when we don’t even know why we don’t know something. And those lists of reasons can grow quickly and become ridiculous.

NULLs could be eliminated by creating a separate table for every nullable column. But creating such a large collection of tables quickly becomes a huge mess.

For example, some people don’t have a first name, or a middle name, or a last name. Instead of using nullable columns to store an employee’s name, we could create separate tables EMPLOYEE_FIRST_NAME, EMPLOYEE_MIDDLE_NAME, and EMPLOYEE_LAST_NAME. Each table only has two values, an EMPLOYEE_ID and a NAME. If an employee is missing a middle name, there will be no row in EMPLOYEE_MIDDLE_NAME for their EMPLOYEE_ID. Congratulations, we just created a schema without any NULLs. But now we need to join four tables just to find a person’s name. And if other columns become nullable in the future, we have to make significant changes to the schema, instead of just altering a single column. Person names can be complicated, and it may sometimes make sense to store them in multiple tables. But don’t do it out of a fear of NULL.

It feels unnatural the way NULL = NULL returns unknown, which in most contexts translates to FALSE. And it’s annoying when we write a NOT IN that compares against a NULL and doesn’t return any results. But those problems don’t mean our databases contain a ticking time bomb, like some horrible Y2K bug waiting to ruin everything. Unknown data creates challenges, but we need to be comfortable with uncertainty.

Column Order Matters

The physical order of columns and rows does not matter in the relational model. Physical data independence is a big improvement over systems where the order was important. Data independence is not merely a historical problem, it still happens if we mistakenly add complex, formatted data in a single value.

E.F. Codd’s paper does not imply that the logical column order is unimportant. The examples in his paper have a meaningful order that helps the reader understand the relationships between the columns. Grouping similar columns, and listing them in the same order across different tables, makes his examples easier to read. A meaningful column order can also make our schemas easier to read.

Our systems might have a user guide, a fancy entity–relationship (ER) diagram, PDF file, and other forms of documentation. But the primary interface to our schema is a simple SELECT statement. Other SQL developers view and understand our tables based on the output from SELECT * FROM SOME_TABLE, not from an out-of-date PDF file.

We shouldn’t just throw new columns on the end of tables. We should take the time to shift the columns around, if it makes sense. Changing column order only takes a few minutes. In early versions of Oracle, columns can be adjusted using a combination of adding temporary columns, updating the table, and then dropping temporary columns. Since 12.1, columns can be moved even more easily by setting them to INVISIBLE and then back to VISIBLE. Those few minutes are only spent once; a bad table design can annoy us for years.

Denormalization

Ideally our relational databases are fully normalized, contain no non-atomic values, and contain no redundant values. It is reasonable to have a firm “no non-atomic values” rule and forbid anyone from ever adding comma-separated lists as values. However, it’s not always realistic to insist on preventing redundant values. Not every system should be in third normal form.

There are times when performance requires writing the data multiple times, to improve read time. This may mean adding a second version of a column somewhere and synchronizing them with triggers. Or possibly creating a materialized view, like taking a point-in-time snapshot of pre-joined tables. This is a complex trade-off between storage, speed, and consistency.

There are many systems where this trade-off must be made. It doesn’t break the relational model to do this. E.F. Codd’s paper discussed the problems with denormalization. But his paper also acknowledged that denormalization is going to happen anyway.

All Rows Are Distinct

The relational model is built on sets, and a set cannot have duplicate elements. This implies that all rows, and all query results, should be unique.

It could be helpful to have a system that enforced the uniqueness of all SQL query results. It almost never makes sense to have truly duplicate values – there should always be a unique way to identify a value. We’ve all been guilty of throwing an extra DISTINCT operator at a query, just in case.

But it is not practical to always enforce that rule. There are only so many ways to detect duplicate values, and each way requires either a large amount of storage, extra processing time, or both. Sorting and hashing large data sets is so painful that it’s better to live with the possibility of duplicates than to always check the results.

 

Вас заинтересует / Intresting for you:

Relational Model and Why It Ma...
Relational Model and Why It Ma... 375 views Александров Попков Sat, 18 Jul 2020, 19:28:38
Understanding SQL and Database...
Understanding SQL and Database... 1079 views Ирина Светлова Thu, 17 May 2018, 18:18:17
SQL - Structured Query Languag...
SQL - Structured Query Languag... 1429 views Aida Wed, 25 Apr 2018, 11:12:33
Types of NoSQL Databases
Types of NoSQL Databases 1416 views Валерий Павлюков Wed, 13 Jun 2018, 19:12:01