-
Notifications
You must be signed in to change notification settings - Fork 10
/
Copy pathdocumentation_considerations.tex
245 lines (201 loc) · 8.31 KB
/
documentation_considerations.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
\documentclass{article}
\usepackage{hyperref}
\begin{document}
\title{Documentation considerations}
I have studied the use of TeX on documentation. TeX has some
qualities that make me want to diverge from it. It also has
many nice properties I want to retain.
I start with the assumption that the author of TeX made what
made sense back 38 years ago. I have read
\href{http://eijkhout.net/texbytopic/texbytopic.html}{TeX by
Topic} during my analysis.
With the changes I am about to propose, the main motivation
is to lower the barrier of entry to writing documentation. I
want to make the macros within the language so clear they
cannot be missed in large bodies of text. I also want to
make it unlikely that the person uses any of the macro
characters by accident.
Syntax of TeX is deeply ingrained into its function. And the
function can change when TeX processes text. Not by random
entirely, but after a macro. Therefore only TeX program is
sufficient for parsing TeX files.
Since the early days, there are now other software than the
document processor that also wants to parse portions of the
files for various purposes.
TeX has up to 10 control characters, that can make it
surprising to use. The behavior of these characters is not
easy to remember.
\begin{verbatim}
# % & / ^ _ { } ~ $ \
\end{verbatim}
The behavior of / and ~ does not necessarily count as a
control character.
My current design reduces the amount of control characters
to 4. To let this be fair comparison, I included all
characters that have some 'control' meaning here.
\begin{verbatim}
# { } ;
\end{verbatim}
This is intended to avoid plaintext that could 'hide'
macros into itself. I assume it is desirable as we want
the text author to type his text first and format it later.
Since TeX came long before URLs, it doesn't have any
convention to recognize them from the text. An important
property for a text processor today. I want the properly
formatted URLs to be recognized among the text. This directs
my design.
The control character in this new language is hash character
instead of backslash. This is for easy differentiation from
TeX, but also because:
\begin{itemize}
\item
It is extremely easy to spot even in large bodies of
text.
\item
It is easy to type with most layouts. On european
layout the backslash is bit hard to type.
\item
It is used to denote macros or comments in many
modern languages.
\item
There are few places where the hash character could
legitimately appear.
\end{itemize}
One important place where hash can appear places few
restrictions on its use. Mainly, it can be used in the URLs.
To make it really simple, we restrict when hash character
can appear as a macro.
Here is an illustration of how this appears:
\begin{verbatim}
#macro on beginning of line.
after text with space character #macro
after text;#macro
#macro;before text.
in middle;#macro;of text.
\end{verbatim}
The rules here is that if hash appears in the beginning or
after a whitespace, it is a macro. All nonspace characters
except left brace and right brace form the macro.
Since in URLs, the hash character can only appear after a
non-semicolon because semicolon is also a reserved character
in URLs, doing it like this will prevent URLs from ever
triggering a macro.
The spaces around macro are never ignored or left out from
the output stream. If there is a desire to form macro
without spaces before or after it, it is done with placing a
semicolon before or after the macro. In that case the
semicolon gets ignored. Otherwise semicolon behaves like an
ordinary character.
To allow this, the semicolon can be treated with a
lookahead, such that if there's hash character after it, its
meaning changes. Otherwise it is treated like an ordinary
character.
To avoid other macro characters, the hash character without
annything following it produces a hash character. With the
help of the semicolon rule it allows easy escape of the
control character anywhere. Additionally any character can
be escaped with macro in form of unicode notation U+0000 and
two-character hexadecimals. For example: \verb= #20 = would
resolve to space character.
The doubled hash character starts a math-mode and ends it.
For example the \verb=## 1 + 2 * 3 ##= would render
as $ 1 + 2 * 3 $, assuming I got that right in TeX.
There is no desire or interest to diverge from the style of
TeX where it makes sense. Therefore we would provide many
macros like they are in TeX, except with the new syntax:
\begin{verbatim}
#href{example.org}{description of the link}
#label{location:here}
Reference location #ref{location:here}
\end{verbatim}
The braces would behave as grouping element after a macro.
Elsewhere they would be interpreted as ordinary character.
\begin{verbatim}
#begin{itemize}
#item
Item #;1
#begin{itemize}
#item
Subitem #;1
Subitem #;2
#end{itemize}
#end{itemize}
\end{verbatim}
We also introduce the concept of pre -macro. To let
everyone recognize when the given block is not desired to be
processed:
\begin{verbatim}
#begin{verbatim} #pre
Therefore
#end{verbatim}
\end{verbatim}
There was idea that you could make braces around environment
names such as 'verbatim' or 'itemize' optional because
environment names do not necessarily have to contain spaces
in them. There are some implementation details on parsing
that have to be seen before the rules on this can be written.
To avoid confusion with this rule, it requires the
subsequent text to be prefixed right from the line where the
pre -macro appears.
This syntax would then be feasible to read independent of
the system used to process it, while being
simple for computer user to reason about its behavior.
There are several targets and different processors that
would use this syntax:
\begin{itemize}
\item HTML output on the website
\item TeX and PDF documentation
\item Runtime reference documentation that displays
directly on GUI, or is relayed through a debugger
\end{itemize}
Therefore our language would be rather abstract compared to
TeX where we use it. There would be a defined language that
works when translating to all of these targets.
One important part of TeX is its macro language. I have to
still wonder whether this is important piece to implement in
this new language. In any case that language will probably
follow the above rules even then.
Something that might change is the treatment of curly braces.
As implementation note:
I think this syntax could be implemented for TeX with a
package. But if it gets implemented in Lever I will rather
implement a parser that implements the language and then
feeds TeX commands to TeX.
Also the language and syntax could be tested by translating
old TeX documents into it.
Why I started this from TeX, rather than Markdown, HTML or
RST has following reasoning:
\begin{itemize}
\item
I hate them all, I hate TeX less because I've used
it less. I believe lot of written about its good
qualities are true.
\item
HTML, when you truly think of it, is a mediocre
presentation format. It mostly fails as a
document production and processing format.
\item
Long use of markdown in blog posts has shown that it
remains clunky. It also makes arbitrary decision of
what layout it allows the content to describe and
what is described by the website where it goes to.
\item
reStructuredText seem to share many problems of
Markdown, and introduce many of its own problems
while it does so. It is something that pulled me out
of using sphinx documentation system.
\item
Only the TeX in the above list properly supports the
conception that design of layout and visuals should
follow the creation of content and rarely precede it.
\end{itemize}
There were also Texinfo, Patoline, Docbook and Racket scribble
that I studied.
Texinfo, Patoline and Docbook look like birds excrete. They
clearly have had different goals than I have here. Scribble
is bit more into the same direction as this, but bit too
fixated to the lispiness. Therefore they are taking slightly
different approaches.
If you create a subset of TeX that has structures that can
be understood by all your document processing targets, it
could be very versatile platform for writing documentation.