Nested link and orphan characters #431
@gzagatti, on the previous PR you wrote this:
Can you give me a better example of this? I don't seem to understand what is actually nestable and what is not.
Also, the second line has a bolded verbatim, which means verbatim is nesting inside the bold. I don't think we need to do all this tracking of nested entries; we just need to figure out what to clear or not in this loop (https://github.com/nvim-orgmode/orgmode/blob/master/lua/orgmode/colors/markup_highlighter.lua#L320-L328). When two entries are matched, this loop clears all "opened" markers that are found between them. If we figure out what we can leave here, other changes might not be necessary. Of course, for links we need to do some additional work; currently I'm talking about markup.
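To make the clearing behavior concrete, here is a minimal sketch in plain Lua. It is not the code at the linked lines: the function name `match_markers` and the marker table shape (`char`, `col`) are assumptions made for illustration. It pairs a closing marker with the most recent opened marker of the same character and drops every marker opened between them.

```lua
-- Hypothetical sketch, not the plugin's implementation.
-- Each marker is { char = <markup character>, col = <1-based column> }.
local function match_markers(markers)
  local open = {}    -- markers still waiting for a closing partner
  local matched = {} -- completed pairs, in the order they were closed

  for _, marker in ipairs(markers) do
    -- Look for the most recent opened marker with the same character.
    local found
    for i = #open, 1, -1 do
      if open[i].char == marker.char then
        found = i
        break
      end
    end

    if found then
      table.insert(matched, { char = marker.char, from = open[found].col, to = marker.col })
      -- Clear the opener and every marker that was opened after it.
      for i = #open, found, -1 do
        table.remove(open, i)
      end
    else
      table.insert(open, marker)
    end
  end

  return matched
end

-- Markers for the assumed input `*a /b* c/`: the `/` opened at column 4 sits
-- between the two `*` markers, so it is cleared when the bold pair matches.
local matched = match_markers({
  { char = '*', col = 1 },
  { char = '/', col = 4 },
  { char = '*', col = 6 },
  { char = '/', col = 9 },
})
```

In this sketch the trailing `/` at column 9 is left unmatched because its would-be opener was cleared. Deciding which openers may survive that clearing step is exactly the question raised above.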
I did a more detailed comparison with Emacs, and I noticed a few things:
Also, on line 28, if you give it some spacing around
I think we can solve some of the nested highlights (like verbatim inside LaTeX) by setting a proper priority on an extmark. LaTeX should always have greater priority than verbatim and code, but not greater priority than other markup.
Thanks for the detailed comments. First let me go back to the syntax specification. It says that a text markup is composed of the following objects:
None of the elements above are separated by whitespace characters. The pattern
Now let me try to tackle some examples to illustrate some points --- all of the images are obtained after exporting the
Now, let's pick a nestable markup:
Moving on to LaTeX markup. Here the specification says that a LaTeX fragment should follow one of the patterns below:
None of the elements above are separated by whitespace characters. But note that fragments like
Numeric characters are excluded because LaTeX macros can only be composed of alphabetic characters. The LaTeX wiki states:
Next, concerning
Now, we can consider some additional examples --- all of the images are obtained after exporting the
Finally, there should be no priority of verbatim inside of LaTeX because LaTeX is non-nestable. So what appears as verbatim in a LaTeX fragment should be parsed as plain text. The same applies to LaTeX in verbatim, and to any markup inside non-nestable markups.
In summary, Emacs parses some in-line markup incorrectly; specifically, it fails when there are overlapping markups like:
The specification says that the content of a nestable markup should contain elements of the standard set of objects, i.e. it can contain nested org content. It is thus ambiguous with regard to the above construction. I would argue that all nested markup should be closed within its parent. So, to obtain the desired result above we should have:
Fortunately, all
I hope that this clarifies most of the points you raised above. Let me know if anything else is unclear. I will then try to tackle:
Thank you for the detailed explanation and examples. One sentence caught my attention:
Does this mean that HTML export is more correct than Emacs itself? I'm trying to figure out which source we should trust the most.
My thought was that first there was an Emacs implementation, from which the syntax specification and exports were created. I did try to export some of these into PDF via org-latex-to-pdf, and the results are similar, so it seems that Emacs really does have less correct highlighting than the others.
I am also not very familiar with Emacs/Orgmode history. I have been using it for about a year now and have converted all my notes to
According to the authors of the syntax specification:
My interpretation is that the syntax tries to follow the export rules as closely as possible and should be the source of truth for developers. Therefore, I would set my trust in the following order: (1) syntax specification, (2) exports and (3) Emacs.
Also, according to
Therefore, when the specification says that the content of a markup can be a series of objects from the standard set, it means that all elements must be fully contained in the markup.
When writing this, I have thought about these additional corner cases:
I think I should be able to tackle them by tweaking this PR.
OK, let's follow the syntax specification then. It does make more sense.
Would you be willing to do some of the refactoring that I also did as part of #432?
I think if we do things per line we could potentially handle nesting and conflicts more easily.
Basically, in your edge case example:
If you think it's not possible to do it that way, let's just try to avoid
I will try to follow your suggestions and see if I can implement them by next week.
I used to solve a similar problem with brackets
I did some refactoring of the markup highlighter file to make it more readable. It was hard to follow because all types were handled in a single loop.
Please adapt your changes to it, and remove the highlight group changes to headlines.
@@ -1196,14 +1196,6 @@ For adding/changing TODO keyword colors see [org-todo-keyword-faces](#org_todo_k
* `OrgTSCheckboxChecked`: A checkbox checked with either `[x]` or `[X]`
* `OrgTSCheckboxHalfChecked`: A checkbox checked with `[-]`
* `OrgTSCheckboxUnchecked`: A empty checkbox
* `OrgTSHeadlineLevel1`: Headline at level 1
Let's leave all these highlight changes for a separate PR.
setlocal foldmethod=expr
setlocal foldexpr=nvim_treesitter#foldexpr()
setlocal foldtext=OrgmodeFoldText()
" setlocal fillchars+=fold:\ 
This needs to be uncommented
@@ -213,6 +213,15 @@ local function load_deps()
vim.treesitter.query.add_predicate('org-is-valid-latex-range?', is_valid_latex_range)
end

-- a function that splits a string on '.'
local function split(string)
There is a `vim.split` helper function that you can use instead.
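As a rough illustration of the suggestion, the sketch below splits a dotted string with `vim.split`, passing `plain = true` so that `.` is matched literally rather than as a Lua pattern. The names `split_on_dot` and `fallback_split`, and the sample input, are assumptions made for this example; the fallback exists only so the snippet also runs under a plain Lua interpreter, where the `vim` global is absent.

```lua
-- Hypothetical plain-Lua stand-in, used only when running outside Neovim.
local function fallback_split(s, sep)
  local parts = {}
  local start = 1
  while true do
    -- The fourth argument `true` disables pattern matching for `sep`.
    local i, j = string.find(s, sep, start, true)
    if not i then
      table.insert(parts, string.sub(s, start))
      break
    end
    table.insert(parts, string.sub(s, start, i - 1))
    start = j + 1
  end
  return parts
end

local function split_on_dot(s)
  if vim and vim.split then
    -- Inside Neovim: `plain = true` treats '.' as a literal separator.
    return vim.split(s, '.', { plain = true })
  end
  return fallback_split(s, '.')
end

-- Sample input is an assumption, not a value taken from the PR.
local parts = split_on_dot('org.markup.bold')
```

Inside Neovim the custom `split` helper from the diff could then be replaced by the single `vim.split` call.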
9094379 to abff0dc
The latest master changes should address all of these issues. If you find anything else, please open a separate issue. Thanks!
This PR fixes issue #430.
First, to fix this issue, I do not allow nested markups in regular links. However, in the future we need to deal with markup in the description of the link, as per the syntax specification.
Second, I created a list of orphan characters that might appear inside a markup that does not allow nested markups. Once out of such a markup, I skip any orphan character with an end capture.
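The effect of skipping orphans can be sketched in plain Lua as a filter over captured ranges. This is a simplified illustration rather than the PR's implementation: it works on whole capture ranges instead of a list of orphan characters, and the input line, the capture table shape and the name `skip_orphans` are all assumptions.

```lua
-- Hypothetical captures for the assumed line:
--   =and *boo and= bar* and *goo zar*
-- Columns are 1-based and inclusive; captures are sorted by start column.
local captures = {
  { type = 'verbatim', from = 1, to = 14 },
  { type = 'bold', from = 6, to = 19 },
  { type = 'bold', from = 25, to = 33 },
}

-- Markup types that may not contain nested markup (assumed set).
local non_nestable = { verbatim = true, code = true }

local function skip_orphans(list)
  local kept = {}
  local blocked_until = 0 -- last column covered by a kept non-nestable capture

  for _, capture in ipairs(list) do
    if capture.from > blocked_until then
      table.insert(kept, capture)
      if non_nestable[capture.type] then
        blocked_until = capture.to
      end
    end
    -- Otherwise the capture opens inside a non-nestable markup: its opening
    -- character is an orphan, so the whole capture is skipped.
  end

  return kept
end

local kept = skip_orphans(captures)
```

With these assumed captures, the bold range that opens inside the verbatim text is dropped, while the verbatim range and the later bold range are kept.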
Let's say we have the following:
Then Treesitter would collect the following: `=and *boo and=`, `*boo and= bar*` and `*goo zar*`. In this case, we should skip `*boo and= bar*`.
Similarly, if we have:
Then Treesitter would collect the following: `=a *1 b=`, `*1 b= and *` and `*ab cd*`. In this case, we should skip `*1 b= and *`.
Feel free to test against additional corner cases. All highlights are correct in the file below: