How should an index be visually displayed? #35

Closed
finanalyst opened this issue Jun 6, 2024 · 14 comments

Comments

@finanalyst
Contributor

@thoughtstream there are several X<DISPLAY | index options> markup options, including hierarchical ones. Pod::To::HTML never created an HTML representation of the index. I wondered what your original intent was.
It is clear that in the text, only DISPLAY is rendered, so the question is about the index itself.

In particular, I was wondering about multiple hierarchies, eg X<D|Main,2nd-elem,3rd-elem>

In addition, not all outputs (e.g. text and MarkDown) support links to arbitrary places in the text (MarkDown only supports links to headers, AFAIK). Consequently, I also pass the index generator a value, place, which is the most recent heading. When a target is possible, the index term would link to a target at the position of the indexed item rather than to the heading.

Here are some examples (DISPLAY is shortened to D as it's not at issue here); ===> means a line in the Index

  1. X<D|Simple> ===> Simple : place
  2. X<D|Simple;Complicated> ===> Complicated: place
    ===> Simple : place
  3. X<D|Simple;Complicated,hierarchy> ===> Complicated
    - hierarchy : place
    ===> Simple: place
  4. X<D|Simple;Complicated, leading, trailing> ===> ???

The above are assumed to be unique lines in a text. Repeated instances would be concatenated:

=head First

Here is one paragraph about X<interesting | Gloss> writing.

=head Second

another para about X<less interesting | Gloss> cuneiform

Would yield (I think)

Index
Gloss: In section First
       In section Second

But what about repeated 1st elem with multiple 2nd elem? eg

some text X<item 1|Gloss, interesting>
more text X<item2 |Gloss, less so>
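
The separator convention used in the examples above (`;` between independent entries, `,` between the levels of one entry) can be sketched in a few lines of Python. This is only an illustration under that assumption: `parse_x_code` is a hypothetical helper, not part of any RakuDoc renderer, and it ignores whatever quoting or escaping rules the specification may define.

```python
def parse_x_code(body):
    """Split the inside of an X<...> code into (display, entries).

    `body` is the text between "X<" and ">". Each entry is a list of
    levels, e.g. ["Complicated", "hierarchy"]. Assumption: ";" separates
    entries and "," separates levels, as in the examples in this thread.
    """
    display, sep, meta = body.partition("|")
    display = display.strip()
    if not sep:
        # X<index> has no explicit entry, so the display text is the entry
        return display, [[display]]
    entries = [
        [level.strip() for level in entry.split(",")]
        for entry in meta.split(";")
        if entry.strip()
    ]
    return display, entries


print(parse_x_code("D|Simple;Complicated,hierarchy"))
print(parse_x_code("item 1|Gloss, interesting"))
```

Under this reading, the two `Gloss` lines above parse to `["Gloss", "interesting"]` and `["Gloss", "less so"]`, which share a first level and differ only in the second.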
@thoughtstream

My concept was that index entries form a kind of anchor that is visible
only to the index generator (if any) of a renderer.

Renderers do not have to provide an index, but if they choose to do so
then each entry must provide some kind of reference to the original X<> marker,
as precisely as the rendered format allows.

It is acceptable for a renderer to represent that precise reference
as any of the following:
- a page number of a book,
- a line number in a plaintext,
- a link to an anchor on the rendered content of the X<>,
- a link to an anchor on the heading preceding the index entry,
- the name of the heading preceding the index entry.

BTW, MarkDown does permit the specification of anchors at any arbitrary point in the text,
because it specifically allows span-level HTML tags at any point, so:

Here is one paragraph about X<interesting | Gloss> writing.

...could be rendered to Markdown like so:

Here is one paragraph about <A id="X123">interesting</A> writing.

...and then the index entry like so:

### Index

Gloss: [123](#X123)

(where the X123 is obviously a unique identifier for that particular X<>)
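
A rough Python sketch of that Markdown rendering step, for illustration only. The function name `render_markdown`, the `first_id` parameter, and the sequential numbering scheme are assumptions made for this example; a real renderer would work on the parsed RakuDoc tree rather than on raw text with a regex.

```python
import re
from itertools import count

# Matches a non-nested X<...> code; nested formatting codes are not handled.
X_CODE = re.compile(r"X<([^<>]*)>")


def render_markdown(text, first_id=1):
    """Replace each X<> with an HTML anchor and collect index lines."""
    ids = count(first_id)
    index_lines = []

    def replace(match):
        display, sep, meta = match.group(1).partition("|")
        display = display.strip()
        number = next(ids)
        # Without a "|", the display text doubles as the index term.
        for term in (meta if sep else display).split(";"):
            index_lines.append(f"{term.strip()}: [{number}](#X{number})")
        return f'<A id="X{number}">{display}</A>'

    return X_CODE.sub(replace, text), index_lines


body, index_lines = render_markdown(
    "Here is one paragraph about X<interesting | Gloss> writing.", first_id=123
)
print(body)
print("\n".join(index_lines))
```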

As for how I envisioned index entries being rendered, here is a suggested format
(which is, of course, not mandated). Given the following RakuDoc:

=head3 Adding index entries to your text

An X<index entry|index, entry> is an inline X<formatting code|formatting code;inline formatting> that
is rendered normally (i.e. with no special identifying styling) within the text, but which is also added 
to the X<index>. X<Index entries|index, entry> may be specified with X<subentries|index, subentry>, including 
X<multilevel subentries|index, subentry, multilevel>, though a renderer is not required to represent anything 
more than the X<first level|index, subentry, rendering>. A single index entry can specify 
X<two or more separate entries in the index|index; index, multiple entries; index, entry, nested>,
all of which will refer back to the same point in the text.

I would expect something like one of the following...

For a plaintext renderer (rendering links as line numbers, where the =head3 is at line 536):

INDEX

formatting code: line 538
index: lines 540, 543
    - entry: lines 538, 540
        - nested: line 543
    - multiple entries: line 543
    - subentry: line 540
        - multilevel: line 541
        - rendering: line 542
inline formatting: line 538

For a book or PDF renderer (rendering links as page numbers, where the rendered paragraph starts on page 214 and flows over onto page 215):

Index

formatting code: 214
index: 214, 215
    - entry: 214
        - nested: 214
    - multiple entries: 215
    - subentry: 214
        - multilevel: 214
        - rendering: 215
inline formatting: 214

For a renderer to MarkDown:

### Index

formatting code: [137](#X137)
index: [284](#X284), [285](#X285)
&nbsp;&nbsp;&nbsp;&nbsp;- _entry: [672](#X672), [691](#X691)_
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- _nested: [989](#X989)_
&nbsp;&nbsp;&nbsp;&nbsp;- _multiple entries: [666](#X666)_
&nbsp;&nbsp;&nbsp;&nbsp;- _subentry: [421](#X421)_
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- _multilevel: [666](#X666)_
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;- _rendering: [887](#X887)_
inline formatting: [137](#X137)

...which would then produce something like:

Index

formatting code: 137
index: 284, 285
    - entry: 672, 691
        - nested: 989
    - multiple entries: 666
    - subentry: 421
        - multilevel: 666
        - rendering: 887
inline formatting: 137
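
The plaintext variant above can be reproduced with a small tree of index terms. The following Python sketch is one possible way to do it, not a prescribed algorithm: `IndexNode`, `add_entry` and `render` are hypothetical names, and the `(line, entries)` pairs are transcribed by hand from the example paragraph (with the =head3 at line 536) rather than parsed from RakuDoc.

```python
from dataclasses import dataclass, field


@dataclass
class IndexNode:
    refs: list = field(default_factory=list)      # line numbers for this term
    children: dict = field(default_factory=dict)  # subentry name -> IndexNode


def add_entry(root, path, line):
    """Record that the term at `path` (e.g. ["index", "entry"]) occurs on `line`."""
    node = root
    for term in path:
        node = node.children.setdefault(term, IndexNode())
    if line not in node.refs:  # only the last level of the path gets the reference
        node.refs.append(line)


def render(node, depth=0):
    """Return the index as plaintext lines, each level sorted alphabetically."""
    out = []
    for term in sorted(node.children, key=str.lower):
        child = node.children[term]
        indent = "    " * depth + ("- " if depth else "")
        if child.refs:
            label = "line" if len(child.refs) == 1 else "lines"
            numbers = ", ".join(str(n) for n in sorted(child.refs))
            out.append(f"{indent}{term}: {label} {numbers}")
        else:
            out.append(f"{indent}{term}")  # grouping level with no reference of its own
        out.extend(render(child, depth + 1))
    return out


occurrences = [
    (538, [["index", "entry"]]),
    (538, [["formatting code"], ["inline formatting"]]),
    (540, [["index"]]),
    (540, [["index", "entry"]]),
    (540, [["index", "subentry"]]),
    (541, [["index", "subentry", "multilevel"]]),
    (542, [["index", "subentry", "rendering"]]),
    (543, [["index"], ["index", "multiple entries"], ["index", "entry", "nested"]]),
]

root = IndexNode()
for line, entries in occurrences:
    for path in entries:
        add_entry(root, path, line)

index_lines = render(root)
print("\n".join(index_lines))
```

Repeated first-level terms with different subentries simply merge under one parent, and a parent that is never indexed on its own is printed without a reference.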

As usual, I'm happy to discuss this further, if I haven't been sufficiently clear.

@finanalyst
Contributor Author

finanalyst commented Jun 7, 2024

@thoughtstream Much clearer now
Thanks for the MarkDown hint. I knew that MarkDown allows HTML, but I hadn't thought of using HTML for anchors. That is going to be very useful for footnotes, which need backward links, and for using saner id references for headers.

Line numbers ??

  1. Are we talking about the line number of the input source, which may be very different from the rendered output?
  • If we were, there's a problem because neither old Rakudo, nor RakuAST provide line numbers for RakuDoc
  2. Any suggestion about how to calculate line numbers for a text that wraps to conform to the output format?
  • this was the reason I developed the place mechanism as described above.

EPub has no native page numbers
PDF files retain the page numbers of printed books

Apparently Kindle has some form of 'location number', but information about Kindle formats seems proprietary.

Options that I can see:

  1. The place mechanism (but it is not granular)
  2. Calculating a virtual line number based on a notional Width for a page, eg. 80 char.
  3. Similarly, a notional page based on Width and line height for a page, e.g. 80 x 27.
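
Options 2 and 3 could be sketched roughly as follows in Python. This assumes simple greedy word wrapping at the notional width; the function name `virtual_positions` and its parameters are made up for the illustration, and a word longer than the width is simply left overflowing on its own line.

```python
def virtual_positions(words, width=80, lines_per_page=27):
    """Return a 1-based (line, page) pair for every word under greedy wrapping."""
    positions = []
    line, used = 1, 0  # current virtual line and characters used on it
    for word in words:
        needed = len(word) if used == 0 else used + 1 + len(word)
        if used and needed > width:
            line += 1            # word does not fit: start a new virtual line
            used = len(word)
        else:
            used = needed
        page = (line - 1) // lines_per_page + 1
        positions.append((line, page))
    return positions


sample = ["aaaa"] * 5
print(virtual_positions(sample, width=9, lines_per_page=2))
# [(1, 1), (1, 1), (2, 1), (2, 1), (3, 2)]
```

The numbers are only meaningful if every reader wraps the text the same way, which is the weakness of any virtual line or page scheme.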

I found "Theory says:
72 pts per inch
at 12 pt, that equals 6 lines per inch, double space = 3 lines per inch
11 inch page - 2 x 1inch margin = 9 inch
9 inch x 3 lines per inch = 27 lines per page, in theory, on my manual typewriter.
" citation

@zag
Member

zag commented Jun 7, 2024

Sorry if I'm interrupting the conversation.

Page numbers are used only in printed books (or PDF) and don't make sense for web content. For web content, URLs are used.

I would suggest using document titles (or their abbreviated forms), file names, or URLs (or their shortened versions), or something that characterizes the resource along with the term.

PS: In EPUB format, page numbering is handled by the Reader and depends on user preferences (such as page size and font
thank you

@finanalyst
Contributor Author

@zag Not an interruption! Just a confirmation :)
Problem is how to refer to content inside the same document when there are no anchors, such as in text.

PDF has the same problem really. Except that PDF tends to be used for content that originally has a hard-copy. So PDF includes markers that correspond to the hard-copy. PDFs of the same book by different publishers will have different page numbers, even if the content is otherwise the same.

It is a subtle problem for ebooks as well. How, for example, does a scholar reference a paragraph in an ebook when there are no line or page numbers?

I looked at various sites that measure words/lines/pages of content. These are needed because many assignments are stated as 'write 3000 words about ...' or 'write a 2 page article on ...'. Authors now work exclusively on computers. So the people giving assignments use the algorithms provided by (eg) MS Word.

Essentially, these break down to a calculation based on notional values of words per line and words per page.

@zag
Member

zag commented Jun 7, 2024

You can use the paragraph number (or block number) on that page, e.g. 123, 124 (optionally with some kind of prefix).
If there are several terms within a block, you can simply add a sequential number: 123-1, 123-2.
thank you
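
A minimal Python sketch of this numbering scheme, purely as an illustration. The helper name `block_references`, the `prefix` parameter, and the regex-based detection of X<> codes are assumptions for the example; blocks are taken to be plain strings numbered from 1.

```python
import re

# Matches a non-nested X<...> code.
X_CODE = re.compile(r"X<([^<>]*)>")


def block_references(blocks, prefix=""):
    """Return (term, reference) pairs such as ("Gloss", "123") or ("Gloss", "123-2")."""
    refs = []
    for number, block in enumerate(blocks, start=1):
        terms = []
        for match in X_CODE.finditer(block):
            display, sep, meta = match.group(1).partition("|")
            terms.append((meta if sep else display).strip())
        for position, term in enumerate(terms, start=1):
            # A sequence number is appended only when the block holds several terms.
            suffix = f"-{position}" if len(terms) > 1 else ""
            refs.append((term, f"{prefix}{number}{suffix}"))
    return refs


blocks = ["no terms here", "X<a|Alpha> and X<b|Beta>", "only X<Gamma>"]
print(block_references(blocks))
# [('Alpha', '2-1'), ('Beta', '2-2'), ('Gamma', '3')]
```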

@zag
Member

zag commented Jun 7, 2024

Oh, I missed an important detail: X<> is converted into an inline element like a span with an HTML ID and then used for addressing. This was already mentioned earlier, though.
thank you

@finanalyst
Contributor Author

@zag block level tagging is essentially what I meant by the place mechanism mentioned above.

@finanalyst
Contributor Author

The problem is quite old, and for pre-printing press books, such as the Bible, different copiers would order text differently on pages. So citations were made by Chapter & verse.

So, perhaps Section Name, para No, or § Name, ※ number would be the most generic?
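
As a sketch of what such a "section name, paragraph number" place could look like, here is a short Python illustration. It assumes the document has already been reduced to a flat list of `("head", text)` and `("para", text)` pairs, which is a simplification invented for this example, and `index_places` is a hypothetical name.

```python
import re

# Matches a non-nested X<...> code.
X_CODE = re.compile(r"X<([^<>]*)>")


def index_places(blocks):
    """Return (term, place) pairs, counting paragraphs from 1 within each heading."""
    section, para_no = "", 0
    result = []
    for kind, text in blocks:
        if kind == "head":
            section, para_no = text, 0  # a new heading restarts the paragraph count
            continue
        para_no += 1
        for match in X_CODE.finditer(text):
            display, sep, meta = match.group(1).partition("|")
            term = (meta if sep else display).strip()
            result.append((term, f"§ {section}, ※ {para_no}"))
    return result


blocks = [
    ("head", "First"),
    ("para", "Here is one paragraph about X<interesting | Gloss> writing."),
    ("head", "Second"),
    ("para", "another para about X<less interesting | Gloss> cuneiform"),
]
for term, place in index_places(blocks):
    print(f"{term}: {place}")
```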

@finanalyst
Contributor Author

Additionally, I'd like to keep count of paragraphs in a block scope so that I could easily implement a numPara block (I'm planning on numFormula and numTable as additional custom blocks to match up with numhead, numitem, numdefn, which are specified).

@thoughtstream

thoughtstream commented Jun 8, 2024

> Line numbers ??
>
> 1. Are we talking about the line number of the input source, which may be very different from the rendered output?

No. I meant line number in the output plaintext.

> 2. Any suggestion about how to calculate line numbers for a text that wraps to conform to the output format?

I would not use line numbers in that case. I simply meant that, for a plaintext output format,
in which no active links are possible, and there are no page numbers for the index to use,
the only obvious option was to use the line number of the rendered context of the X<>.
I agree that you could also use paragraph number. Though, when paragraphs are not explicitly
numbered, line number seems much easier to jump to when the output is plaintext (I'm thinking
in terms of the typical practice of running the rendered plaintext through less or more,
in which case you can jump straight to a specified line with G).

> EPub has no native page numbers
> PDF files retain the page numbers of printed books
> Apparently Kindle has some form of 'location number', but information about Kindle formats seems proprietary.
>
> Options that I can see:
> 1. The place mechanism (but it is not granular)
> 2. Calculating a virtual line number based on a notional Width for a page, eg. 80 char.
> 3. Similarly a notional page based on Width and line height for a page, eg 80 x 27.

Both EPub and PDF support internal links within a document, so I don't think page numbers
(or virtual page numbers) are necessary or appropriate for those output formats.

I'd prefer the specification not to mandate any particular representation of the index,
or any particular behaviour, or even that indexes must be supported at all. I'd prefer to leave
the authors of renderers free to choose the best mechanism (if any) for their output format.
I'm fine if the specification were to suggest some possible ways an index may work,
but even there I don't feel it's necessary.

Personally, I think if the output format supports active internal links
(e.g. HTML, MarkDown, EPub, PDF, RTF, DOCX, etc.),
then the index should be specified in terms of such links.
Otherwise, if page numbers, paragraph numbers,
or line numbers are feasible, they can be used.

But we need to avoid locking the authors of renderers into
any requirement for an index, because we cannot predict
what output formats may eventually be supported.
For example, active links or passive numbers
would be useless for a renderer to an audiobook format.
You'd need something like a timestamp instead.
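
The preference described above could be expressed as a simple capability lookup. The Python sketch below is only one reading of it: the capability names and the `choose_reference` helper are invented for the illustration, and the relative order of page, paragraph and line numbers is an assumption, since the comment does not rank them.

```python
# Ordered from most to least preferred; the order among the passive
# numbering styles is an assumption made for this sketch.
PREFERENCE = [
    ("links", "internal link"),
    ("pages", "page number"),
    ("paragraphs", "paragraph number"),
    ("lines", "line number"),
    ("timestamps", "timestamp"),
]


def choose_reference(capabilities):
    """Pick an index reference style for a format, or None to skip the index."""
    for capability, style in PREFERENCE:
        if capability in capabilities:
            return style
    return None  # a renderer is free not to provide an index at all


print(choose_reference({"links", "pages"}))  # e.g. HTML or PDF
print(choose_reference({"lines"}))           # e.g. plaintext
print(choose_reference({"timestamps"}))      # e.g. an audiobook format
```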

@thoughtstream

[Oops, something went very wrong with that last comment, which was cut off and published prematurely. Sorry.
Will repost in a few minutes.]

@thoughtstream

> Additionally, I'd like to keep count of paragraphs in a block scope so that I could easily implement a numPara block
> (I'm planning on numFormula and numTable as additional custom blocks to match up with
> numhead, numitem, numdefn, which are specified).

Hmmmmm. Do we need to open a separate issue about this?

Looking at the available standard block types, it seems that
they fall into two categories: those that produce visible renderings
and those that are effectively invisible in the rendered output:

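| Visibly Rendered | Invisibly Rendered |
|------------------|--------------------|
| =code            | =comment           |
| =input           | =nested            |
| =output          | =rakudoc           |
| =head            | =section           |
| =defn            | =pod               |
| =item            | =data              |
| =para            |                    |
| =table           |                    |
| =cell            |                    |
| =formula         |                    |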

It seems reasonable that every block that produces a visible rendering
should have a corresponding num... version.

On the other hand, I don't think that we can specify that absolutely any block
(visible or not) can be given a num... prefix. If I squint, I can see that
=numsection or =numrakudoc (and even maybe =numnested)
might make some kind of sense...but =numcomment is just silly,
and =numdata positively misleading.

@finanalyst
Contributor Author

@thoughtstream Yes we probably need another issue for num. I'm opening one.

@finanalyst
Contributor Author

closing this as resolved. Conversation about numbered blocks continued in #36


3 participants