Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide range or bias direction for go-to-definition etc #1029

Open
sam-mccall opened this issue Jun 22, 2020 · 22 comments · May be fixed by #2027
Open

Provide range or bias direction for go-to-definition etc #1029

sam-mccall opened this issue Jun 22, 2020 · 22 comments · May be fixed by #2027
Labels
feature-request Request for new features or functionality goto
Milestone

Comments

@sam-mccall
Copy link
Contributor

Go-to-definition (and others) is triggered on a cursor position. In VSCode, the cursor lies between characters, so ambiguity: the thing on the left or the thing on the right?

A couple of factors make this worse:

  • having a selection resolves the ambiguity, but the language server doesn't get to see it. In VSCode a selection usually requires biasing left.
  • other editors place the cursor on characters and so have no ambiguity. They always need the language server to bias right.

To illustrate, these screenshots produce the same textDocument/definition request, but want different results.

In vim. This should go to Foo::operator->.
image

In VSCode. This should go to object.
image

It'd be nice to have some more information so language servers can make better decisions. Some ideas:

  • requests provide optional selection range in addition to cursor position. (block-cursor editors might always provide at least a 1-char range)
  • a client capability describing the cursor type. (Language servers could implement heuristics like bias-towards-identifier for cursor-between-char editors, and have the correct behavior for block-cursor editors)

I guess this is mostly an issue with languages that have overloaded operators. constructor and functor calls are common examples in C++.

@sam-mccall
Copy link
Contributor Author

(Just noticed that the highlighting in the second screenshot shows the problem nicely.
VSCode's built-in identifier highlighting marks object on lines 2 and 3, while clangd's textDocument/highlight marks operator on line 2 and -> on line 3. I think not including the -> on line 2 is a clangd bug...)

@rcjsuen
Copy link
Contributor

rcjsuen commented Jun 22, 2020

@dbaeumer
Copy link
Member

As mentioned in microsoft/vscode-languageserver-node#593 we need to change the protocol if we need to let servers know about this. Adding a selection to the params might not help since VS Code doesn't have the information right now in the API and other editors might behave the same. So having a capability looks like a good approach to have more control over this.

@dbaeumer dbaeumer added the feature-request Request for new features or functionality label Jun 23, 2020
@dbaeumer dbaeumer added this to the On Deck milestone Jun 23, 2020
@puremourning
Copy link

In VSCode, the cursor lies between characters, so ambiguity: the thing on the left or the thing on the right?

That's a VSCode issue, not a protocol issue, though right ? The protocol is explicit about which "character" is being requested (it's a row/column position, not a position bettween two columns). Why would we need to change the protocol for this? Surely this change can be made in VSCode only.

Or am i missing something important?

@sam-mccall
Copy link
Contributor Author

Adding a selection to the params might not help since VS Code doesn't have the information right now in the API and other editors might behave the same

Makes sense to me. A capability seems less complex overall, with most of the complexity lying in the server rather than client or protocol.

The protocol is explicit about which "character" is being requested (it's a row/column position, not a position bettween two columns).

I think it's the opposite:

A position is between two characters like an ‘insert’ cursor in a editor

Whatever the spec text currently says, both "line-cursor" and "block-cursor" editors exist, and it would be nice to specify something here to improve interop.

@puremourning
Copy link

I think it's the opposite:

Quite right, ugh.

value represents the gap between the character and character + 1.

Oh boy, i had not noticed that. So, how do you specify the start of the line ? 0 would mean the gap between 0 (the first character) and 1 (the next character), where it's really the gap between the "theoretical" start-of-line and the 0th (first) character.

On reflection, this model seems kinda wonky. For completion requests the character position is usually requested as the position immediately following the last char on the line, but this would indicate that the server should interpret is therefore as to between the "one-past-the-last-char-in-line" and the "two-past-the-last-char-in-line". Fortunately, either we (and most other clients) are not actually not sending the 'one-past-the-last-char-in-line" position or servers are already interpreting the completion position as if it were an actual position (not a virtual thing between 2 positions). I suspect the later.

Whatever the spec text currently says, both "line-cursor" and "block-cursor" editors exist, and it would be nice to specify something here to improve interop.

I see your point. The "between characters" model works for an insert-mode-like situation, but doesn't exist in an equivalent of "normal-mode", or where the cursor has an explicit location.

@Aerijo
Copy link

Aerijo commented Jun 23, 2020

So, how do you specify the start of the line ?

@puremourning Index 0 is the left of the first character

@puremourning
Copy link

But offset 0 means the first character in any 0-based offset system i've ever seen.

@rcjsuen
Copy link
Contributor

rcjsuen commented Jun 23, 2020

But offset 0 means the first character in any 0-based offset system i've ever seen.

@puremourning In this case the character does not matter because Position is referring to the space between two characters (or in the extreme case, the start of the line or the end of the line excluding newline characters).

Or perhaps I'm misinterpreting something here?

@puremourning
Copy link

puremourning commented Jun 23, 2020

The Position in the spec is defined as a 0-based offset of characters in the line:

0123456789
This is a line

The 0th character is T. If servers interpret offset of 0 as the space between character 0 and character 1 then offset 0 is marked here:

0 123456789
T his is a line
 ^

Whereas you're saying it should be interpreted as here:

 0123456789
 This is a line
^

Which i would largely agree is the only reasonable way this could work. But it makes the specification itself ambiguous.

Agree with @sam-mccall that editors where the cursor position is an integral actual-character offset don't have any ambiguity here. in the above cases, position 0 is clearly:

0123456789
This is a line
^

@Aerijo
Copy link

Aerijo commented Jun 23, 2020

Ah, that might be other experience leaking. I've only used line cursors, so think of it in terms of cursor position. The character and the cursor position to the left of it are interchangeable with this interpretation, giving a right bias on the cursor -> character transform. I agree, there doesn't seem to be any issue here then; Position's are specified as being character offsets, so it's up to the client to decide which character (as a Position) it wants to request for. It also makes sense for the client to decide this, to make the experience consistent across servers.

@sam-mccall
Copy link
Contributor Author

We're drifting a little bit from the original issue :-)
I believe the intention is that character is the number of characters to the left of the cursor. So the start of the line is 0 and the end of the line is len(line). AFAIK all editors (of both types) follow this, so while the spec text could be tighter, I don't think there's an actual interop problem here. I'll send a PR for the text.

@Aerijo
Copy link

Aerijo commented Jun 23, 2020

@sam-mccall I'm not sure what's off topic here: the spec says that a Position is the character offset, i.e., the character itself, and that the location for Goto Definition, etc., is given in terms of such a Position. Your original question about ambiguity is resolved with this understanding.

I think the question of how to resolve a given kind of cursor to a character is a concern for the LSP client, not the protocol or the server. IMO the protocol should be view / edit model agnostic as much as possible.

sam-mccall added a commit to sam-mccall/language-server-protocol that referenced this issue Jun 23, 2020
In microsoft#1029 it was pointed out that "between `character` and `character` + 1" might be off-by-one.

Given line `abc`, and `Position { character: 0 }`
- The offset of `a` is 0, and the offset of `b` is 1 (see previous section).
- "between `character` and `character + 1`" may be read as "between a and b"
- but this `Position` actually (in VSCode and other implementations) indicates start-of-line, i.e. before "a".

This patch *doesn't* change the model (that a position is always between two characters), an microsoft#1029 suggests.
However editors where the cursor lies on a character have to ignore that part, and this definition happens to specify the de-facto standard behavior for them too.
@puremourning
Copy link

@sam-mccall you're almost certainly right.

i do think this interpretation is material to this issue though.

The reason i say that is that if the specification states that a position lies between characters, even for ad-hoc requests like go-to, etc. then it's material to the discussion how that position should be interpreted. I think this issue is about helping servers 'guess' what the user's intention is in the model between-characters model, whereas actually if the model is applied consistently, then it should be possible for all clients and servers to interop without new capabilities or protocol changes. I guess i'm exploring if that's possible.

I realise that in the model where you specify the actual character position, there is no ambiguity. i'm thinking that it's the clients responsibility to remove the ambiguity from the user's perspective and the specification's responsibility to remove it from the client/server interpretation perspective.

@sam-mccall
Copy link
Contributor Author

@Aerijo

the spec says that a Position is the character offset

No, as previously quoted, the spec says: "A position is between two characters like an ‘insert’ cursor in a editor". This is in the section defining the Position structure specifically.

Your original question about ambiguity is resolved with this understanding

That's a perfectly self-consistent theory, but it doesn't match VSCode's selection behavior (see the OP). In practice this basically means the interpretation is wrong - the spec ~always follows VSCode's model, which is also perfectly self-consistent.

@puremourning

if the model is applied consistently, then it should be possible for all clients and servers to interop without new capabilities or protocol changes. I guess i'm exploring if that's possible.

Concretely this might look like "Position now represents a character" and VSCode has to sometimes subtract 1 compared to what it currently sends (assuming no dramatic changes to its internal model, which I think are unlikely). Apart from the question of whether VSCode is willing to make such a potentially-breaking behavior change, this necessarily involves the editor guessing about the significance of the characters on the left vs right of the cursor. This is heuristic, and I've had nothing but trouble where editors have needed to do this (can dig up examples if needed). So my preference is to avoid trying to patch this up.

@HighCommander4
Copy link

Perhaps I'm missing something, but can't this be solved in a general way by having the protocol take a range rather than a position? A between-characters cursor can be modelled as a zero-length range, and an on-a-character cursor as a length-one range.

Using ranges instead of positions also makes the protocol richer, e.g. see the existing discussion in #377 about hovers showing the type of a selected expression that's larger than just an identifier.

@Aerijo
Copy link

Aerijo commented Jun 24, 2020

OK, to be clear, I was basing my reasoning on the spec. I searched "position" in the specification, and the first result was as follows:

A position inside a document (see Position definition below) is expressed as a zero-based line and character offset.

followed by an example explicitly using "character offsets" to refer to characters

So a string of the form a𐐀b the character offset of the character a is 0, the character offset of 𐐀 is 1 and the character offset of b is 3 since 𐐀 is represented using two code units in UTF-16.

The leading sentence of Position confirms this

Position in a text document expressed as zero-based line and zero-based character offset

... and then the second sentence throws everything established so far out completely by saying they measure the space between characters

A position is between two characters like an ‘insert’ cursor in a editor.

...but wait, there's more: in the Range specification:

A range in a text document expressed as (zero-based) start and end positions. A range is comparable to a selection in an editor. Therefore the end position is exclusive.

it says that an end position (represented with a Position) is exclusive. Consider the range [[0, 0], [0, 1]]; do we agree that this is the first character of the first line? If so, then it is exclusive under the character model (doesn't touch the index 1 character), but not exclusive under the between-character model (does touch the index 1 space between characters).

So I hope you understand why I was quite confident about Positions being specified as character offsets, and not spaces between characters 🙂. Though I do feel better about my original comment now.

It seems then that it is the spec itself that is inconsistent then, probably because it is natural to think about Positions in terms of between characters (to represent an I cursor location / inclusive selection range), and also because they are used to represent characters for things like Goto Definition. I feel the spec is still overwhelmingly in favour of them being character offsets, so I feel my conclusion above is still relevant, but there is definitely inconsistency there.

However your point about needing a heuristic in the editor to decide which position to ask for is interesting; a simpler example might be

  f o o ) b a r
       ^ ^

If using an I beam cursor, the editor would need to pick which way to bias it to a character offset. The user would expect a particular one to be resolved, but there is no way to communicate which to the server without an editor heuristic (e.g., favour word characters over punctuation). Otherwise, if the editor always biases the I beam cursor in a particular direction, it may ask for the ) character (which is probably not useful), despite the user thinking it was part of the end of foo / start of bar.

@DavidGoldman
Copy link

Any update here? Seems like we could update the protocol to optionally include a range for selections?

@rcjsuen
Copy link
Contributor

rcjsuen commented Oct 8, 2020

b56a40f seems to imply it's up to the client to decide what to send back if there's a selection.

@HighCommander4
Copy link

HighCommander4 commented Oct 8, 2020

b56a40f seems to imply it's up to the client to decide what to send back if there's a selection.

Right, but if the protocol only specifies a point, then the information channel is lossy: the client can decide to send the starting point of the selection or the ending point, but not both.

For some use cases, it would be useful if the protocol allowed the client to send both endpoints of the selection.

@dbaeumer
Copy link
Member

dbaeumer commented Oct 9, 2020

b56a40f is to clarify the current situation. That doesn't mean that we can't extend the protocol with the selection if we think that is necessary.

adonovan added a commit to adonovan/language-server-protocol that referenced this issue Sep 22, 2024
This change adds an optional range field to textDocumentPositionParams
so that clients can, for example, query a subexpression such as
x.f within a larger expression g(x.f[i]).

Also, clarify the text about cursor bias.

Fixes microsoft#377
Fixes microsoft#1029
@adonovan adonovan linked a pull request Sep 22, 2024 that will close this issue
adonovan added a commit to adonovan/language-server-protocol that referenced this issue Sep 22, 2024
This change adds an optional range field to textDocumentPositionParams
so that clients can, for example, query a subexpression such as
x.f within a larger expression g(x.f[i]).

Also, clarify the text about cursor bias.

Fixes microsoft#377
Fixes microsoft#1029
@adonovan
Copy link

Could one of the maintainers please review this PR: #2027? Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request Request for new features or functionality goto
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants