Allow documents to be divided into chapters and sections #26
Comments
If by "will probably require messing with the existing models" you meant On Fri, May 16, 2014 at 3:45 PM, Ari Entlich [email protected]:
|
Here's some analysis of how the old site is structured. There are three tables involved:
Each row in Each row in Each row in |
The tree structure of chapters and subchapters seems to be derived from the names of the sections. For example, in the following sections from Hamlet (work id 5):
|
https://github.com/stefankroes/ancestry - a gem I've used a bit and consider a possibility here |
I don't feel I fully communicated yesterday what I was thinking in terms of a proposed change to the documents structure, so I want to explain it better and get things ironed out. What I propose is that section represents all the sections for a document. The structure would look something like this: section (id: 1, ancestry: nil) Then, instead of a documents model, introduce a Metadata model (name subject to change), which has many of the fields document has, plus a section_id field, as it would belong to a section. In my example above of a single document, there would be only one metadata entry, with section_id equal to 1, because it belongs only to that top item. The benefit of my approach, IMO, is twofold: Another benefit of this approach is that, on the single document view page, we could call children on that top-level section to get its immediate children (i.e. section > section). Thoughts? |
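To make the proposed structure concrete, here is a minimal plain-Ruby sketch of the materialized-path idea behind an ancestry column: each section stores the ids of its ancestors as a slash-separated string, and the root section (the whole document) has a nil ancestry. It deliberately does not use the ancestry gem or ActiveRecord. `SectionRow`, `child_ancestry`, `children_of`, `descendants_of`, and the sample titles are hypothetical names chosen for illustration, and the slash-separated format is an assumption about how the column is stored.

```ruby
# Plain-Ruby illustration of a materialized-path tree (no gems required).
# SectionRow and the helpers below are hypothetical, not the ancestry gem's API.
SectionRow = Struct.new(:id, :title, :ancestry)

# The ancestry string that a direct child of `row` would carry.
def child_ancestry(row)
  row.ancestry.nil? ? row.id.to_s : "#{row.ancestry}/#{row.id}"
end

# Direct children are the rows whose ancestry equals the parent's child path.
def children_of(rows, parent)
  path = child_ancestry(parent)
  rows.select { |row| row.ancestry == path }
end

# All descendants have the parent's child path as their ancestry or as a prefix of it.
def descendants_of(rows, parent)
  path = child_ancestry(parent)
  rows.select do |row|
    row.ancestry == path || row.ancestry.to_s.start_with?("#{path}/")
  end
end

ROWS = [
  SectionRow.new(1, "Whole document", nil), # root: represents the document itself
  SectionRow.new(2, "Chapter 1", "1"),
  SectionRow.new(3, "Chapter 2", "1"),
  SectionRow.new(4, "Chapter 1, Section 1", "1/2"),
].freeze

root = ROWS.first
puts children_of(ROWS, root).map(&:title).inspect
puts descendants_of(ROWS, root).map(&:id).inspect
```

With this layout, the single document view only needs the root section's direct children, and a Metadata row would point at the root through its section_id.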
I suppose we could also keep the existing documents data model as is and simply add a section_id to it so that it would be the acting "Metadata" model, and then we would create a root section instance for each document instance. |
I definitely agree that it makes the most sense for the root node of the ancestry tree to represent the entire work. I just did a quick google about having multiple models in an ancestry tree, and came up with this: stefankroes/ancestry#155 question on github. The suggestion given was to use single-table inheritance. Since all nodes would be in the same table, their ids would all come from the same "id-space", and ancestry would work as expected. Here's a proposed class hierarchy. Let me know if I'm overthinking this. Names are subject to change, of course.
So the root of the tree would be a Work, sections which include other sections, such as a chapter, would be a Section, and a section which contains content would be a ContentSection. I'm not sure if there's a way to enforce certain invariants, such as that Work has no parents, ContentSection has no children, etc. Perhaps this could be done with validations? |
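As a rough illustration of the invariants mentioned above, here is a plain-Ruby sketch that checks a list of nodes for violations. It is not Rails validation code. `TypedNode`, `invariant_violations`, and the use of a `parent_id` field instead of an ancestry string are assumptions made to keep the example short. In a Rails model, the same checks would presumably live in custom validations.

```ruby
# Hypothetical row shape: kind is one of "Work", "Section", "ContentSection".
TypedNode = Struct.new(:id, :kind, :parent_id)

# Returns human-readable invariant violations (empty when the tree is valid).
def invariant_violations(nodes)
  by_id = nodes.map { |node| [node.id, node] }.to_h
  errors = []
  nodes.each do |node|
    parent = by_id[node.parent_id]
    # A Work is always the root of its tree.
    if node.kind == "Work" && !node.parent_id.nil?
      errors << "Work #{node.id} must not have a parent"
    end
    # Every other kind of node must hang off an existing parent.
    if node.kind != "Work" && parent.nil?
      errors << "#{node.kind} #{node.id} must have a parent"
    end
    # A ContentSection is a leaf, so nothing may name it as a parent.
    if parent && parent.kind == "ContentSection"
      errors << "ContentSection #{parent.id} must not have children"
    end
  end
  errors
end

valid_tree = [
  TypedNode.new(1, "Work", nil),
  TypedNode.new(2, "Section", 1),
  TypedNode.new(3, "ContentSection", 2),
]
puts invariant_violations(valid_tree).inspect
```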
I don't see how that github question / answer / reference to STI helps solve what we are trying to do. The reference to STI is implying we are saving a type to specify what type the ancestry is, but I don't see how has_ancestry accommodates that type based on what I know of has_ancestry. Your proposed data model has 4 different models, which seems overly complex to me. I understand the separation between Section & ContentSection in that you are proposing that the single responsibility of the Section table be to store the ancestry, while all the other data for the content is stored elsewhere, but IMO, DocumentNode & Work don't need to be separate models. I would go back to the following data model, to keep it most consistent with what we have now:
I think there are really multiple routes to take here and there is no "best" answer, ie pros & cons to various approaches. I prefer the more simple approach as I've described because of how I know this will integrate with an admin interface and be manageable. |
My understanding is that the type column of a row would specify the model class of that row. I made DocumentNode and Work separate so that only the root node would contain the metadata for the overall work. |
Ok, sorry I misspoke about STI. Here's a quick blog post I referenced: http://maulanaruby.wordpress.com/2007/02/17/sti-vs-polymorphic-association/. Even if STI is used for has_ancestry, I'm still unclear as to how to use has_ancestry with it. In your model, how is STI applied? I'm still not seeing it. |
The approach / question I think I'm trying to get at here, is how important is it to utilize an existing Admin framework (activeadmin). The more complex, STI / polymorphism included, the more custom admin interface will need to be built out b/c I do not expect it to support a complex data model. In my mind, I see a huge value in leveraging an existing admin interface and erring on the side of simplicity. |
Ok, perhaps I'm making some assumptions here. Here's my understanding of how this would work. Let me know if I'm off-base. Essentially, the methods that ancestry adds to a model just do various parsing and querying based on the ancestry paths. From these, it figures out an id or a list of ids that the user is interested in. Then I'm assuming that it would ask activerecord to fetch these ids from the database, and I'm assuming that activerecord would then instantiate the correct model based on the type field. ancestry would not have to have any idea what the type field means or which actual models are used for any of the objects it is fetching. I don't think the admin interface would look much different with single-table inheritance. It doesn't look like the admin interface handles ancestry in any special way at the moment - it just treats the ancestry field as a text string. What would the admin interface not be able to handle that it handles now? |
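The assumption described above can be sketched in plain Ruby without ActiveRecord or the ancestry gem: the tree logic only parses the path into ids, and a single lookup in one shared table decides which class to build from the type value. This is an illustration of the idea, not how either library is implemented. `TABLE`, `find_node`, and `ancestors_of` are hypothetical, and making Work, Section, and ContentSection subclasses of DocumentNode is an assumption about the proposed hierarchy.

```ruby
# Assumed hierarchy: one base class, three subclasses sharing one table.
class DocumentNode
  attr_reader :id, :ancestry

  def initialize(id, ancestry)
    @id = id
    @ancestry = ancestry
  end
end

class Work < DocumentNode; end
class Section < DocumentNode; end
class ContentSection < DocumentNode; end

# Stand-in for the single shared table; :type plays the role of the STI column.
TABLE = [
  { id: 1, type: "Work", ancestry: nil },
  { id: 2, type: "Section", ancestry: "1" },
  { id: 3, type: "ContentSection", ancestry: "1/2" },
].freeze

# One lookup in one id-space; the type value alone decides which class is built.
def find_node(id)
  row = TABLE.find { |candidate| candidate[:id] == id }
  return nil if row.nil?

  Object.const_get(row[:type]).new(row[:id], row[:ancestry])
end

# The tree logic only turns the path into ids and never inspects the type.
def ancestors_of(node)
  node.ancestry.to_s.split("/").map { |part| find_node(part.to_i) }
end

leaf = find_node(3)
puts leaf.class
puts ancestors_of(leaf).map { |node| node.class.name }.inspect
```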
Here's a nestable plugin for activeadmin: https://github.com/nebirhos/activeadmin-sortable-tree. Those are all correct assumptions, except that I don't believe has_ancestry provides the ability for you to specify which type of model it's pulling the instance from, meaning has_ancestry always looks in the same model for the id. This tells me that we cannot have a Section instance reference any other type of model as the root node. |
I was thinking that you would put the has_ancestry in the DocumentNode and then it would always do DocumentNode.find(num), for example, and activerecord would then instantiate the correct model. Where I'm coming from with this more complex proposal is that I tend to like for the storage of the data to reflect its structure. This allows me to be more sure of the integrity of the data. The downside is, of course, the complexity, and also that changing the structure can be difficult if that becomes necessary. |
@atrigent Unfortunately we're working with Rails which doesn't believe in database logic. It tries to pull as much logic into the application as possible. In theory it allows them to swap out database backends more easily instead of relying on a universal ORM that also understands database schema. Also unfortunately, we're working with Rails and we want to follow convention for future folks. If the Gem manages the database structure, then we should let it, even if it does so poorly. If you've ever worked with Drupal, you'll be quite happy with what Rails does manage. |
So it sounds like we do what @stephskardal was suggesting, and use https://github.com/mbleigh/acts-as-taggable-on to add any additional metadata (in an integrity-checked but flexible way) if necessary to support the concerns of @atrigent |
Some updates / proposed data model:
ActiveRecord Callback: Upon creation of Document, a root Node is created. Validations needed:
As @btbonval suggests, we can use acts-as-taggable-on for additional unstructured metadata. |
I used the following lovely query:
to verify that the relationship between sections and contents is 1:1. There's probably a nicer looking query for doing this, but I think this works, so oh well. The result was:
This shows that a section only ever has zero or one contents attached to it. If it has zero it represents a section with subsections, otherwise it is a section with content. As expected, there are more sections with content than sections with subsections. |
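For readers who want to see the shape of this check, here is a small plain-Ruby sketch of the same idea on made-up data: count how many contents are attached to each section, then count how many sections fall into each bucket. It is not the query used above. The sample ids, `SECTION_IDS`, `CONTENT_SECTION_IDS`, and `contents_per_section_histogram` are all hypothetical.

```ruby
# Hypothetical sample data; the real table and column names are not shown here.
SECTION_IDS = [1, 2, 3, 4].freeze
# One entry per content row: the id of the section it is attached to.
CONTENT_SECTION_IDS = [2, 3, 4].freeze

# Maps "number of contents attached" to "how many sections have that many".
def contents_per_section_histogram(section_ids, content_section_ids)
  counts = Hash.new(0)
  content_section_ids.each { |section_id| counts[section_id] += 1 }

  histogram = Hash.new(0)
  section_ids.each { |section_id| histogram[counts[section_id]] += 1 }
  histogram
end

histogram = contents_per_section_histogram(SECTION_IDS, CONTENT_SECTION_IDS)
puts histogram.inspect
```

If the only keys in the result are 0 and 1, every section has at most one content attached, which is the property being verified.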
That's a good result, @atrigent, but remember we were thinking that chapters, sections, and subsections might have all been handled in the title. Do we want to perform a more accurate reflection of the document structure or simply mirror what the old database did? I'd prefer we do our best to reflect the true document structure, but then again, that means we'd have to figure out if there is consistency in titling and it means we'd have to hand check a whole lot more. |
I'm looking into that now. |
Oh wow, ok. Apparently this table exists:
I'm not sure how I managed to not notice this before. Looking into its properties now. |
Easy to miss. There are a lot of tables and a lot of PHP files :( That appears to be the relation I was looking for when we were discussing On Tue, Jun 10, 2014 at 10:42 AM, Ari Entlich [email protected]
|
As you might expect, every section has 0 or 1 parents:
|
That makes sense. What about layers?
SELECT count(*)
FROM section_parents AS grandparent, section_parents AS parent
WHERE grandparent.child_id = parent.parent_id;
Let's see if there is any case of a hierarchical nesting, or if it's all just parent/child. |
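The self-join above can be mirrored in plain Ruby to see what it counts: every pair of rows where the child of one row is the parent of another, i.e. a grandparent, parent, child chain. The sketch below runs on made-up (parent_id, child_id) pairs. `SECTION_PARENTS` and `grandparent_chains` are hypothetical names, and the sample rows are not taken from the real database.

```ruby
# Hypothetical (parent_id, child_id) pairs standing in for rows of section_parents.
SECTION_PARENTS = [
  [1, 2], # 1 is the parent of 2
  [2, 3], # 2 is the parent of 3, so 1 -> 2 -> 3 is a two-level chain
  [1, 4],
  [5, 6],
].freeze

# Mirrors joining the table to itself on grandparent.child_id = parent.parent_id.
def grandparent_chains(pairs)
  children_by_parent = pairs.group_by { |parent_id, _child_id| parent_id }
  pairs.flat_map do |grandparent_id, middle_id|
    (children_by_parent[middle_id] || []).map do |_parent_id, child_id|
      [grandparent_id, middle_id, child_id]
    end
  end
end

chains = grandparent_chains(SECTION_PARENTS)
puts chains.size
puts chains.inspect
```

A count of zero would mean the data is only ever one level deep. Any positive count means real nesting exists.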
2341 rows, which seem to be spread out over 38 works. |
There are some interesting cases here - for example, it looks like some subsection names are not in fact prefixed by their parent's names. Here's an example: http://www.thefinalclub.org/work-overview.php?work_id=29 . You can't tell from this page, but if you click on some of the sections it will show you their full names. |
Yeah, it seems like the name doesn't imply anything. The structure is entirely contained in the |
In the interest of verifying basic invariants: every pair of sections linked by the
|
I would not have thought to check that, but that's good science. It looks On Thu, Jun 12, 2014 at 10:13 AM, Ari Entlich [email protected]
|
The old thefinalclub website had a concept of "works" that were divided into "chapters" that were subdivided into "sections". This allowed large works (books, plays, etc) to be used on the website, because only a certain section would need to be shown to the user at any given time. Showing an entire work on a single page would require a lot of data transfer and would be a strain on the user's browser.
Annotation studio only has a concept of "documents". When creating a document, annotation studio allows the user to specify the code for chapter navigation, but does not store chapters in a structured way and requires the user to write HTML code for the navigation. An entire document will be displayed on a single page.
We need a structured way to break works down into chapters and sections. This will probably require messing with the existing models and creating some new ones.