Why Venn diagram is a bad choice to illustrate JOIN clause
By Leonid Toshchev, March 25, 2015, 5:46 p.m.

Historically, in order to explain the concept of JOIN clause authors of textbooks and tutorials use a Venn diagram. I do not know who was the first to use it, but this addiction firmly stuck in the minds of everyone who has ever explained an operator of the connection in SQL.

I will show flaws of this approach by giving you an example.

We have two tables: Books (which contains a list of the books) and Authors (obviously it contains a list of you-know-who). This example is just as popular as a Venn diagram. But from the point of view of the teacher have some good sides: everyone knows what book and author is, it is plain and simple and doesn't require any specific knowledge.

Suppose that we want to use INNER JOIN to retrieve data from these tables. The corresponding diagram is as follows:

Left circle represents a set of Books, and the right set of Authors. In the center we can see the colored intersection of these sets. At this point, the attentive reader may notice: what is the intersection of a set of books and set of authors? Internal mathematician furiously shouting that their intersection is the empty set. But the diagram tells us the opposite. And then there's the result table, in which we see the data of both entities.

The result is more like a union of parts of the sets than their intersection. We can replace the above diagram with the next, but its main point has not changed:

The situation becomes even more interesting when we consider other types of joins. For example, CROSS JOIN. No way we can show that in the resulting table elements may be repeated several times. If we mentioned that we make the same diagrams for the different queries with different results the visibility of this process put in serious doubt. Because some diagrams become ambiguous.

The key concept in relational databases is the relationships between the tables. JOIN clause work exactly with this relationships. And this is exactly the basis of explanation of JOIN clause.

People used to think in patterns. When he faces a new activity, he learns some patterns of behavior and then combine them in different ways. Sometimes they even creates new patterns during this process. Therefore, learning by doing gives such a good result. Do not be afraid that the specific examples do not cover all possible options. You just need to give more examples. If the user knows how JOIN works on two tables for 10 rows, then he will be able to extrapolate the knowledge on the table with millions of rows. Remember, a picture paints a thousand words.

A perfect illustration of the JOIN clause work is mapping set into the set. If we remove arrows from the lines (we don't want to make same mistakes that we did with a Venn diagram. There is no real mapping at JOIN clause) we will see a picture that more similar to the real situation.

As part of our course of SQL we started to develop an online tool aimed at training to work with JOIN clause and relationships between tables. The basis of these lessons is the approach described above. Alpha-version is presented below:

Soon it will be expanded to the level of full lessons and included in the training course of the SQL.

If you have any comments or suggestions - feel free to email me: mail@datamonkey.pro

Achievement unlocked
A lot of words about why we are so awesome. Or something else. Dont know. Really.