Skip to main content

Why table join doesn't work

Everything in the traditional relational database world is based on tables. If I have a table of Classes and a table of Students, I can have an association table that tells me who's in which class.

classIdclassNameinstructor
1Psych 101Dr. Jung

studentIdfirstNamelastName
1MollyRingwald
2AnthonyHall
3JoanCusack

Clearly, these two tables tell me nothing about which classes Molly is taking, nor which students have signed up for Psych 101.

An "association" table provides that information:
classIdstudentIdAssociation-specific notes
11
12late add
13

In the relational world, a LEFT JOIN gives us the student list for a given class:

classNameinstructorfirstNamelastNameAssn notes
Psych 101Dr. JungMollyRingwald
Psych 101Dr. JungAnthonyHalllate add
Psych 101Dr. JungJoanCusack

This approach is "pure" from a Chris Date perspective, but when we're talking about JSON data in response to a RESTful HTTP request, it seems quite burdensome. Rather than invent a tabular compression scheme that allows table cells to indicate "same as previous," we want to take advantage of the fact that JSON (and PHP) allows table cells to include arrays.

Instead of the final table as:
[
 { "className": "Psych 101", "instructor": "Dr. Jung", "firstName": "Molly", "lastName": "Ringwald", "Assn notes": "" },
 { "className": "Psych 101", "instructor": "Dr. Jung", "firstName": "Anthony", "lastName": "Hall", "Assn notes": "late add" },
{ "className": "Psych 101", "instructor": "Dr. Jung", "firstName": "Joan", "lastName": "Cusack", "Assn notes": "" }
]

We do it this way:
{ "className": "Psych 101", "instructor": "Dr. Jung",
   "students": [
    { "firstName": "Molly", "lastName": "Ringwald" },
    { "firstName": "Anthony", "lastName": "Hall" },
    { "firstName": "Joan", "lastName": "Cusack" }
    ]
}

CRUD (Create, Read, Update, Delete) operations on such a many-to-many database are interesting. Only delete needs to mutate the association table. In general, Classes and Students have non-overlapping CRUD. We can Create or Update a class or a student dealing only with that table. When we Delete from either table, we have to Delete from the association table, as well.

Read could be limited to a single table, but we have just demonstrated how we plan to use arrays to report relevant records from the secondary table. We can look at classes to see a list of students taking each class, or we can look at students to see lists of the classes they are taking.

We want to extend between primary and secondary tables in the same way, perhaps, in order to establish and drop these associations. There is no point updating an association, and reading is covered using arrays as described. So, we have Create and Delete.

Create is tantamount to "Record that this Student is now taking that Class." Delete is the same: "Record that this Student is no longer taking that Class." There is never an instance in which cancelling a class expels students from school entirely. We're only talking about creating or deleting this relationship.

We could indicate this using classId and studentId:

CREATE (a relationship between) classId, studentId.

This is akin to CRUD on the association table itself. As such, it involves the UI programmer too much in the underlying mechanism. Also, as a practical matter, the UI is usually looking at one master table or the other--a class and its students, or a student and their classes. To add a student or a class, we would have to magically know its id.

The solution is that, when looking at a Class record, the interface includes a StudentPicker that converts a student name entered into a concrete studentId. We already know the classId.

It seems reasonable to expose an asymmetric facility: You can add students to classes, but you can't add classes to a student. There is no technical reason why it can't work both ways, but the simplification is easier to explain and seems like common sense.  Looking at a class roster, a picker helps you find one of all the enrolled students. One student is the same as another--all people with names--so you begin entering the name and you find it. Classes, on the other hand, might be named oddly and are clustered around categories and fields of study. Molly can't just start typing "Brain Science" to find "Psych 101."

But, we conclude that our Association C--D takes two ids and perhaps an association note. There is no reason to specify names. However, the response to Create should be more verbose so that the UI can display information about, in this case, the new student. If we POST a new student, the response is the student record. Likewise, if we POST a new student in a class, we only provide the studentId, but the response should be the student record. A rich UI might display the photo of the student added to the class.





Comments

Popular posts from this blog

JSON/MySQL Schemas

As noted previously, there is a lot of overlap between the RDBMS world and the JSON world. Identifiers JSON is defined to allow identifiers of any kind of Unicode string, encoded in UTF-8 (or UTF-16, etc.). They begin and end with double quotation marks (U+0022), so included quotation marks and '\' must be escaped as \" or \\. Control characters (ASCII 0-x1F) must be escaped as well. In practice, JSON identifiers conform to ECMAScript standards . There are some 68  reserved keywords (function, import, for, if, else, and so on) that should not be used as identifiers. Unexpected reserved words include abstract, await, debugger, delete, finally, instanceof, super, synchronized, transient, volatile, and yield. The spec makes a distinction between identifiers and IdentifierNames (specifically, array keys), but why risk it? ECMAScript allows '$' and '_' anywhere in an identifier. Length or camelCasing are not part of the spec. As for length, there seems t...

A JSON Db Product?

The last post "solved" the problem of many-to-many table joins by papering over the association table with a RESTful JSON interface. As long as we're using JSON, we might as well take advantage of multi-valued table cells. I'm naturally wondering where this leads. JSON identifiers and types and SQL identifiers and types overlap so much that their intersection is a useful subset. Camel-case fields in string, number, bool flavors. Many-to-many occurs often in the world: Students in Classes Actors in Films (musicians on recorded songs) Parts in Assemblies Customers and Products (joined by Orders) The generalized description is that a Table requires a unique identifier for each row. Tables list students, actors, films, customers, and so on.  An Association Table is has two or more foreign keys that match unique identifiers in other tables. The knowledge of how a FK maps to a specific Table is baked in--we wouldn't want a "table name" column....

GraphQL is the many-to-many solution

Exactly! Regular readers of this blog (me) will appreciate my stumbling attempts to pre-define a REST interface that supports many-to-many interfaces. GET a class, for example, and the return includes an array of the students in that class. In this context, we don't want a full Student record, just the Student's name and Id, for example. With a REST interface, the server writer has to guess how to abbreviate the Student record. GraphQL fixes that. The front end requests just the data it wants. If we want a list of the students in a class and the assigned roommates for that student...we can do that! A lot of my prototype REST service is hardwired--not single tables, so much, but the many-to-many stuff certainly. There was a certain amount of work implementing the simple router ("/table/recordno"). GraphQL means throwing a lot of that away, but I can see immediately that GraphQL's approach is what I want. My schema tables (implementing INSERT and UPDATE) look ...