Skip to main content

Why table join doesn't work

Everything in the traditional relational database world is based on tables. If I have a table of Classes and a table of Students, I can have an association table that tells me who's in which class.

classIdclassNameinstructor
1Psych 101Dr. Jung

studentIdfirstNamelastName
1MollyRingwald
2AnthonyHall
3JoanCusack

Clearly, these two tables tell me nothing about which classes Molly is taking, nor which students have signed up for Psych 101.

An "association" table provides that information:
classIdstudentIdAssociation-specific notes
11
12late add
13

In the relational world, a LEFT JOIN gives us the student list for a given class:

classNameinstructorfirstNamelastNameAssn notes
Psych 101Dr. JungMollyRingwald
Psych 101Dr. JungAnthonyHalllate add
Psych 101Dr. JungJoanCusack

This approach is "pure" from a Chris Date perspective, but when we're talking about JSON data in response to a RESTful HTTP request, it seems quite burdensome. Rather than invent a tabular compression scheme that allows table cells to indicate "same as previous," we want to take advantage of the fact that JSON (and PHP) allows table cells to include arrays.

Instead of the final table as:
[
 { "className": "Psych 101", "instructor": "Dr. Jung", "firstName": "Molly", "lastName": "Ringwald", "Assn notes": "" },
 { "className": "Psych 101", "instructor": "Dr. Jung", "firstName": "Anthony", "lastName": "Hall", "Assn notes": "late add" },
{ "className": "Psych 101", "instructor": "Dr. Jung", "firstName": "Joan", "lastName": "Cusack", "Assn notes": "" }
]

We do it this way:
{ "className": "Psych 101", "instructor": "Dr. Jung",
   "students": [
    { "firstName": "Molly", "lastName": "Ringwald" },
    { "firstName": "Anthony", "lastName": "Hall" },
    { "firstName": "Joan", "lastName": "Cusack" }
    ]
}

CRUD (Create, Read, Update, Delete) operations on such a many-to-many database are interesting. Only delete needs to mutate the association table. In general, Classes and Students have non-overlapping CRUD. We can Create or Update a class or a student dealing only with that table. When we Delete from either table, we have to Delete from the association table, as well.

Read could be limited to a single table, but we have just demonstrated how we plan to use arrays to report relevant records from the secondary table. We can look at classes to see a list of students taking each class, or we can look at students to see lists of the classes they are taking.

We want to extend between primary and secondary tables in the same way, perhaps, in order to establish and drop these associations. There is no point updating an association, and reading is covered using arrays as described. So, we have Create and Delete.

Create is tantamount to "Record that this Student is now taking that Class." Delete is the same: "Record that this Student is no longer taking that Class." There is never an instance in which cancelling a class expels students from school entirely. We're only talking about creating or deleting this relationship.

We could indicate this using classId and studentId:

CREATE (a relationship between) classId, studentId.

This is akin to CRUD on the association table itself. As such, it involves the UI programmer too much in the underlying mechanism. Also, as a practical matter, the UI is usually looking at one master table or the other--a class and its students, or a student and their classes. To add a student or a class, we would have to magically know its id.

The solution is that, when looking at a Class record, the interface includes a StudentPicker that converts a student name entered into a concrete studentId. We already know the classId.

It seems reasonable to expose an asymmetric facility: You can add students to classes, but you can't add classes to a student. There is no technical reason why it can't work both ways, but the simplification is easier to explain and seems like common sense.  Looking at a class roster, a picker helps you find one of all the enrolled students. One student is the same as another--all people with names--so you begin entering the name and you find it. Classes, on the other hand, might be named oddly and are clustered around categories and fields of study. Molly can't just start typing "Brain Science" to find "Psych 101."

But, we conclude that our Association C--D takes two ids and perhaps an association note. There is no reason to specify names. However, the response to Create should be more verbose so that the UI can display information about, in this case, the new student. If we POST a new student, the response is the student record. Likewise, if we POST a new student in a class, we only provide the studentId, but the response should be the student record. A rich UI might display the photo of the student added to the class.





Comments

Popular posts from this blog

CSS for Tables

Tables still have a place--for tabular data. Duh. Such as the Companies table in FDB. But I had to remember the CSS around tables. First, the basic structure: table   thead     tr       th   tbody     tr       td Table border-collapse: { separate (default) | collapse } border-spacing: { #both | #horiz #vert } - default is 1px empty-cells: { show (default) | hide } table-layout: { auto (default) | fixed } - fixed is like !important for widths For a responsive table, put it inside a container (e.g., div) with overflow-x: auto; Width, height, border can be applied to table, th and td--not tr, thead or tbody. Cells th and td tags. CSS doesn't seem to like naked th and td. Prefer table td or table th selectors. text-align: { left | center | top } vertical-align: { top | bottom | middle } padding (margin doesn't do anything; use border-spacing) border-bottom: - for just horizontal lines between rows Rows For a mouse-over to select whole rows at a time: tr:hov

A JSON Db Product?

The last post "solved" the problem of many-to-many table joins by papering over the association table with a RESTful JSON interface. As long as we're using JSON, we might as well take advantage of multi-valued table cells. I'm naturally wondering where this leads. JSON identifiers and types and SQL identifiers and types overlap so much that their intersection is a useful subset. Camel-case fields in string, number, bool flavors. Many-to-many occurs often in the world: Students in Classes Actors in Films (musicians on recorded songs) Parts in Assemblies Customers and Products (joined by Orders) The generalized description is that a Table requires a unique identifier for each row. Tables list students, actors, films, customers, and so on.  An Association Table is has two or more foreign keys that match unique identifiers in other tables. The knowledge of how a FK maps to a specific Table is baked in--we wouldn't want a "table name" column.

GraphQL is the many-to-many solution

Exactly! Regular readers of this blog (me) will appreciate my stumbling attempts to pre-define a REST interface that supports many-to-many interfaces. GET a class, for example, and the return includes an array of the students in that class. In this context, we don't want a full Student record, just the Student's name and Id, for example. With a REST interface, the server writer has to guess how to abbreviate the Student record. GraphQL fixes that. The front end requests just the data it wants. If we want a list of the students in a class and the assigned roommates for that student...we can do that! A lot of my prototype REST service is hardwired--not single tables, so much, but the many-to-many stuff certainly. There was a certain amount of work implementing the simple router ("/table/recordno"). GraphQL means throwing a lot of that away, but I can see immediately that GraphQL's approach is what I want. My schema tables (implementing INSERT and UPDATE) look