Separating content from formatting

May 27, 2020

In exploring the essential Qt widget for the MVP, the text “area” (known as QTextEdit or QPlainTextEdit), I was immediately confronted with the issue of formatting. This is because QTextEdit gives the option of editing and extracting the HTML used to display the text, which led me to figure out that Qt uses a subset of HTML 4 to display rich text in all widgets that support it.

I hadn’t thought about the execution of formatting, because I’ve been primarily focused on issues surrounding the content, which is primarily textual: quotations, links, revisions, etc. And because earlier this year I had already decided to use Markdown as my preferred way to add formatting, I had only been thinking about formatting that is actually incorporated within the text (vs. formatting that is applied to the text, but is otherwise hidden). This led me down some rabbit holes, and even to go fishing again for GUI frameworks (as if I hadn’t already spent literally days, if not weeks, on that research).

What I took away from all this is the following:

Text is the primary source of our knowledge building blocks (just as language is the primary building block for pretty much every advanced “thing” we have over our closest primate relatives). As such, the primary code/functionality for the PKB (and future projects) should be focused on the text.
Formatting is useful for highlighting, delineating boundaries (sections), adding emphasis, etc. However, it is a veneer applied to text. I think few would argue that, for instance when tracking changes between revisions of a document, the formatting changes are as important to capture as the changes to the content (I know for myself that I often just bulk accept these changes, although I know some companies, such as one I worked for, obsess to an insane degree over formatting).

These two take-aways have helped me to think about where to place the formatting, and how to use it.

The “body” of the knowledge unit should be the text.
- To allow for some of the basic features (such as bidirectional links, see-through quotations, etc.), and more interesting features (such as revisions on a paragraph, or smaller, level), the model object will need some data structure to encapsulate the text before it is handed off to the view.
Formatting is wholly apart from the content, and can be considered more like a property of the knowledge unit.
- To allow for tags and other meta information, I had already planned to have a dictionary of properties that any unit can take on. Formatting should probably live there.
- If the formatting is no longer interleaved with the text, it does present some complications that I can foresee right now, primarily that it becomes much harder to move formatting along with the text and keep it aligned to the correct text.
  - One possible remedy for this is directly related to how I’m thinking the underlying text model will have to be stored. If the text for future revisions is based on a list consisting of pointers to previous revision sections and fresh text, then it might be possible to also incorporate markers of where formatting begins and ends, completely below (above?) the text itself. The unresolved issue with this idea is whether it is possible to do this without giving away the data structure to the view. Basically, it would be nice to have an intermediary that gives the text and formatting information to the view to display. However, since the view is also where the text is being edited, that means either every key press, cut/paste, and mouse rearrangement will have to be transmitted back to the intermediary to update the underlying model and view, or the view itself will have to know the underlying structure.
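
To make the idea above a bit more concrete, here is a minimal Python sketch of a revision stored as a list of pieces: fresh text, pointers into an earlier revision, and formatting markers that sit between the pieces rather than inside the text. All of the names (Fresh, Ref, Mark, render) are hypothetical and made up for illustration; nothing here is Qt API or a settled design, and it assumes every opening marker has a matching closing marker. The render function plays the role of the intermediary: it hands the view plain text plus a list of (start, end, style) spans, without exposing the piece list itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fresh:
    """Text typed for this revision (hypothetical name)."""
    text: str


@dataclass(frozen=True)
class Ref:
    """Pointer to characters [start, end) of an earlier revision's plain text."""
    revision: int
    start: int
    end: int


@dataclass(frozen=True)
class Mark:
    """Formatting marker placed between pieces; opens=True starts a style."""
    style: str
    opens: bool


def render(revisions, index):
    """Return (plain_text, spans) for one revision.

    spans is a list of (start, end, style) offsets into plain_text.
    Assumes every opening Mark is later closed by a Mark of the same style.
    """
    parts, spans, open_at = [], [], {}
    offset = 0
    for piece in revisions[index]:
        if isinstance(piece, Mark):
            if piece.opens:
                open_at[piece.style] = offset
            else:
                spans.append((open_at.pop(piece.style), offset, piece.style))
            continue
        if isinstance(piece, Fresh):
            chunk = piece.text
        else:
            # Resolve the pointer against the earlier revision's plain text.
            earlier_text, _ = render(revisions, piece.revision)
            chunk = earlier_text[piece.start:piece.end]
        parts.append(chunk)
        offset += len(chunk)
    return "".join(parts), spans


# Revision 1 reuses revision 0 and inserts one bold word in the middle.
revisions = [
    [Fresh("Text is the primary source.")],
    [Ref(0, 0, 8), Mark("bold", True), Fresh("always "),
     Mark("bold", False), Ref(0, 8, 27)],
]

text, spans = render(revisions, 1)
print(text)
print(spans)
```

Because the markers travel with the pieces, rearranging the list keeps the formatting attached to the right text, and the offsets are only computed at render time. What the sketch does not address is the open question above: how edits made in the view get translated back into changes to the piece list.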

TL;DR: you can learn a lot about details you didn’t think of, when you actually put the rubber to the road and get building.

Adam Jones