Provide range or bias direction for go-to-definition etc #1029

sam-mccall · 2020-06-22T19:55:48Z

Go-to-definition (and others) is triggered on a cursor position. In VSCode, the cursor lies between characters, so ambiguity: the thing on the left or the thing on the right?

A couple of factors make this worse:

having a selection resolves the ambiguity, but the language server doesn't get to see it. In VSCode a selection usually requires biasing left.
other editors place the cursor on characters and so have no ambiguity. They always need the language server to bias right.

To illustrate, these screenshots produce the same textDocument/definition request, but want different results.

In vim. This should go to Foo::operator->.

In VSCode. This should go to object.

It'd be nice to have some more information so language servers can make better decisions. Some ideas:

requests provide optional selection range in addition to cursor position. (block-cursor editors might always provide at least a 1-char range)
a client capability describing the cursor type. (Language servers could implement heuristics like bias-towards-identifier for cursor-between-char editors, and have the correct behavior for block-cursor editors)

I guess this is mostly an issue with languages that have overloaded operators. constructor and functor calls are common examples in C++.

The text was updated successfully, but these errors were encountered:

sam-mccall · 2020-06-22T19:59:11Z

(Just noticed that the highlighting in the second screenshot shows the problem nicely.
VSCode's built-in identifier highlighting marks object on lines 2 and 3, while clangd's textDocument/highlight marks operator on line 2 and -> on line 3. I think not including the -> on line 2 is a clangd bug...)

rcjsuen · 2020-06-22T20:00:54Z

Also see microsoft/vscode-languageserver-node#593.

dbaeumer · 2020-06-23T08:58:51Z

As mentioned in microsoft/vscode-languageserver-node#593 we need to change the protocol if we need to let servers know about this. Adding a selection to the params might not help since VS Code doesn't have the information right now in the API and other editors might behave the same. So having a capability looks like a good approach to have more control over this.

puremourning · 2020-06-23T09:05:31Z

In VSCode, the cursor lies between characters, so ambiguity: the thing on the left or the thing on the right?

That's a VSCode issue, not a protocol issue, though right ? The protocol is explicit about which "character" is being requested (it's a row/column position, not a position bettween two columns). Why would we need to change the protocol for this? Surely this change can be made in VSCode only.

Or am i missing something important?

sam-mccall · 2020-06-23T14:22:44Z

Adding a selection to the params might not help since VS Code doesn't have the information right now in the API and other editors might behave the same

Makes sense to me. A capability seems less complex overall, with most of the complexity lying in the server rather than client or protocol.

The protocol is explicit about which "character" is being requested (it's a row/column position, not a position bettween two columns).

I think it's the opposite:

A position is between two characters like an ‘insert’ cursor in a editor

Whatever the spec text currently says, both "line-cursor" and "block-cursor" editors exist, and it would be nice to specify something here to improve interop.

puremourning · 2020-06-23T14:44:00Z

I think it's the opposite:

Quite right, ugh.

value represents the gap between the character and character + 1.

Oh boy, i had not noticed that. So, how do you specify the start of the line ? 0 would mean the gap between 0 (the first character) and 1 (the next character), where it's really the gap between the "theoretical" start-of-line and the 0th (first) character.

On reflection, this model seems kinda wonky. For completion requests the character position is usually requested as the position immediately following the last char on the line, but this would indicate that the server should interpret is therefore as to between the "one-past-the-last-char-in-line" and the "two-past-the-last-char-in-line". Fortunately, either we (and most other clients) are not actually not sending the 'one-past-the-last-char-in-line" position or servers are already interpreting the completion position as if it were an actual position (not a virtual thing between 2 positions). I suspect the later.

Whatever the spec text currently says, both "line-cursor" and "block-cursor" editors exist, and it would be nice to specify something here to improve interop.

I see your point. The "between characters" model works for an insert-mode-like situation, but doesn't exist in an equivalent of "normal-mode", or where the cursor has an explicit location.

Aerijo · 2020-06-23T14:46:06Z

So, how do you specify the start of the line ?

@puremourning Index 0 is the left of the first character

puremourning · 2020-06-23T14:52:15Z

But offset 0 means the first character in any 0-based offset system i've ever seen.

rcjsuen · 2020-06-23T14:56:17Z

But offset 0 means the first character in any 0-based offset system i've ever seen.

@puremourning In this case the character does not matter because Position is referring to the space between two characters (or in the extreme case, the start of the line or the end of the line excluding newline characters).

Or perhaps I'm misinterpreting something here?

puremourning · 2020-06-23T15:01:50Z

The Position in the spec is defined as a 0-based offset of characters in the line:

0123456789
This is a line

The 0th character is T. If servers interpret offset of 0 as the space between character 0 and character 1 then offset 0 is marked here:

0 123456789
T his is a line
 ^

Whereas you're saying it should be interpreted as here:

 0123456789
 This is a line
^

Which i would largely agree is the only reasonable way this could work. But it makes the specification itself ambiguous.

Agree with @sam-mccall that editors where the cursor position is an integral actual-character offset don't have any ambiguity here. in the above cases, position 0 is clearly:

0123456789
This is a line
^

Aerijo · 2020-06-23T15:13:54Z

Ah, that might be other experience leaking. I've only used line cursors, so think of it in terms of cursor position. The character and the cursor position to the left of it are interchangeable with this interpretation, giving a right bias on the cursor -> character transform. I agree, there doesn't seem to be any issue here then; Position's are specified as being character offsets, so it's up to the client to decide which character (as a Position) it wants to request for. It also makes sense for the client to decide this, to make the experience consistent across servers.

sam-mccall · 2020-06-23T15:31:07Z

We're drifting a little bit from the original issue :-)
I believe the intention is that character is the number of characters to the left of the cursor. So the start of the line is 0 and the end of the line is len(line). AFAIK all editors (of both types) follow this, so while the spec text could be tighter, I don't think there's an actual interop problem here. I'll send a PR for the text.

Aerijo · 2020-06-23T15:37:42Z

@sam-mccall I'm not sure what's off topic here: the spec says that a Position is the character offset, i.e., the character itself, and that the location for Goto Definition, etc., is given in terms of such a Position. Your original question about ambiguity is resolved with this understanding.

I think the question of how to resolve a given kind of cursor to a character is a concern for the LSP client, not the protocol or the server. IMO the protocol should be view / edit model agnostic as much as possible.

In microsoft#1029 it was pointed out that "between `character` and `character` + 1" might be off-by-one. Given line `abc`, and `Position { character: 0 }` - The offset of `a` is 0, and the offset of `b` is 1 (see previous section). - "between `character` and `character + 1`" may be read as "between a and b" - but this `Position` actually (in VSCode and other implementations) indicates start-of-line, i.e. before "a". This patch *doesn't* change the model (that a position is always between two characters), an microsoft#1029 suggests. However editors where the cursor lies on a character have to ignore that part, and this definition happens to specify the de-facto standard behavior for them too.

puremourning · 2020-06-23T15:46:57Z

@sam-mccall you're almost certainly right.

i do think this interpretation is material to this issue though.

The reason i say that is that if the specification states that a position lies between characters, even for ad-hoc requests like go-to, etc. then it's material to the discussion how that position should be interpreted. I think this issue is about helping servers 'guess' what the user's intention is in the model between-characters model, whereas actually if the model is applied consistently, then it should be possible for all clients and servers to interop without new capabilities or protocol changes. I guess i'm exploring if that's possible.

I realise that in the model where you specify the actual character position, there is no ambiguity. i'm thinking that it's the clients responsibility to remove the ambiguity from the user's perspective and the specification's responsibility to remove it from the client/server interpretation perspective.

sam-mccall · 2020-06-23T16:26:34Z

@Aerijo

the spec says that a Position is the character offset

No, as previously quoted, the spec says: "A position is between two characters like an ‘insert’ cursor in a editor". This is in the section defining the Position structure specifically.

Your original question about ambiguity is resolved with this understanding

That's a perfectly self-consistent theory, but it doesn't match VSCode's selection behavior (see the OP). In practice this basically means the interpretation is wrong - the spec ~always follows VSCode's model, which is also perfectly self-consistent.

@puremourning

if the model is applied consistently, then it should be possible for all clients and servers to interop without new capabilities or protocol changes. I guess i'm exploring if that's possible.

Concretely this might look like "Position now represents a character" and VSCode has to sometimes subtract 1 compared to what it currently sends (assuming no dramatic changes to its internal model, which I think are unlikely). Apart from the question of whether VSCode is willing to make such a potentially-breaking behavior change, this necessarily involves the editor guessing about the significance of the characters on the left vs right of the cursor. This is heuristic, and I've had nothing but trouble where editors have needed to do this (can dig up examples if needed). So my preference is to avoid trying to patch this up.

HighCommander4 · 2020-06-24T00:26:10Z

Perhaps I'm missing something, but can't this be solved in a general way by having the protocol take a range rather than a position? A between-characters cursor can be modelled as a zero-length range, and an on-a-character cursor as a length-one range.

Using ranges instead of positions also makes the protocol richer, e.g. see the existing discussion in #377 about hovers showing the type of a selected expression that's larger than just an identifier.

Aerijo · 2020-06-24T02:09:26Z

OK, to be clear, I was basing my reasoning on the spec. I searched "position" in the specification, and the first result was as follows:

A position inside a document (see Position definition below) is expressed as a zero-based line and character offset.

followed by an example explicitly using "character offsets" to refer to characters

So a string of the form a𐐀b the character offset of the character a is 0, the character offset of 𐐀 is 1 and the character offset of b is 3 since 𐐀 is represented using two code units in UTF-16.

The leading sentence of Position confirms this

Position in a text document expressed as zero-based line and zero-based character offset

... and then the second sentence throws everything established so far out completely by saying they measure the space between characters

A position is between two characters like an ‘insert’ cursor in a editor.

...but wait, there's more: in the Range specification:

A range in a text document expressed as (zero-based) start and end positions. A range is comparable to a selection in an editor. Therefore the end position is exclusive.

it says that an end position (represented with a Position) is exclusive. Consider the range [[0, 0], [0, 1]]; do we agree that this is the first character of the first line? If so, then it is exclusive under the character model (doesn't touch the index 1 character), but not exclusive under the between-character model (does touch the index 1 space between characters).

So I hope you understand why I was quite confident about Positions being specified as character offsets, and not spaces between characters 🙂. Though I do feel better about my original comment now.

It seems then that it is the spec itself that is inconsistent then, probably because it is natural to think about Positions in terms of between characters (to represent an I cursor location / inclusive selection range), and also because they are used to represent characters for things like Goto Definition. I feel the spec is still overwhelmingly in favour of them being character offsets, so I feel my conclusion above is still relevant, but there is definitely inconsistency there.

However your point about needing a heuristic in the editor to decide which position to ask for is interesting; a simpler example might be

  f o o ) b a r
       ^ ^

If using an I beam cursor, the editor would need to pick which way to bias it to a character offset. The user would expect a particular one to be resolved, but there is no way to communicate which to the server without an editor heuristic (e.g., favour word characters over punctuation). Otherwise, if the editor always biases the I beam cursor in a particular direction, it may ask for the ) character (which is probably not useful), despite the user thinking it was part of the end of foo / start of bar.

DavidGoldman · 2020-10-08T21:12:14Z

Any update here? Seems like we could update the protocol to optionally include a range for selections?

rcjsuen · 2020-10-08T22:33:07Z

b56a40f seems to imply it's up to the client to decide what to send back if there's a selection.

HighCommander4 · 2020-10-08T22:38:16Z

b56a40f seems to imply it's up to the client to decide what to send back if there's a selection.

Right, but if the protocol only specifies a point, then the information channel is lossy: the client can decide to send the starting point of the selection or the ending point, but not both.

For some use cases, it would be useful if the protocol allowed the client to send both endpoints of the selection.

dbaeumer · 2020-10-09T08:25:10Z

b56a40f is to clarify the current situation. That doesn't mean that we can't extend the protocol with the selection if we think that is necessary.

This change adds an optional range field to textDocumentPositionParams so that clients can, for example, query a subexpression such as x.f within a larger expression g(x.f[i]). Also, clarify the text about cursor bias. Fixes microsoft#377 Fixes microsoft#1029

adonovan · 2024-09-25T15:27:39Z

Could one of the maintainers please review this PR: #2027? Thanks.

dbaeumer added the feature-request Request for new features or functionality label Jun 23, 2020

dbaeumer added this to the On Deck milestone Jun 23, 2020

dbaeumer added the goto label Nov 11, 2020

dbaeumer modified the milestones: On Deck, Backlog Nov 11, 2020

dbaeumer mentioned this issue Nov 11, 2020

Goto Type Definition should probably use range and not just position #1007

Closed

sam-mccall mentioned this issue Nov 22, 2021

documentHighlight chooses wrong symbol clangd/clangd#936

Open

HighCommander4 mentioned this issue Mar 16, 2022

GoTo Definition for marked template type not working as expected clangd/vscode-clangd#307

Open

HighCommander4 mentioned this issue Aug 29, 2024

Pass the current selection for a Hover Request #377

Open

adonovan linked a pull request Sep 22, 2024 that will close this issue

textDocumentPositionParams: add optional range #2027

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Provide range or bias direction for go-to-definition etc #1029

Provide range or bias direction for go-to-definition etc #1029

sam-mccall commented Jun 22, 2020

sam-mccall commented Jun 22, 2020

rcjsuen commented Jun 22, 2020

dbaeumer commented Jun 23, 2020

puremourning commented Jun 23, 2020

sam-mccall commented Jun 23, 2020

puremourning commented Jun 23, 2020

Aerijo commented Jun 23, 2020

puremourning commented Jun 23, 2020

rcjsuen commented Jun 23, 2020

puremourning commented Jun 23, 2020 •

edited

Loading

Aerijo commented Jun 23, 2020

sam-mccall commented Jun 23, 2020

Aerijo commented Jun 23, 2020 •

edited

Loading

puremourning commented Jun 23, 2020

sam-mccall commented Jun 23, 2020

HighCommander4 commented Jun 24, 2020

Aerijo commented Jun 24, 2020 •

edited

Loading

DavidGoldman commented Oct 8, 2020

rcjsuen commented Oct 8, 2020

HighCommander4 commented Oct 8, 2020 •

edited

Loading

dbaeumer commented Oct 9, 2020

adonovan commented Sep 25, 2024

Provide range or bias direction for go-to-definition etc #1029

Provide range or bias direction for go-to-definition etc #1029

Comments

sam-mccall commented Jun 22, 2020

sam-mccall commented Jun 22, 2020

rcjsuen commented Jun 22, 2020

dbaeumer commented Jun 23, 2020

puremourning commented Jun 23, 2020

sam-mccall commented Jun 23, 2020

puremourning commented Jun 23, 2020

Aerijo commented Jun 23, 2020

puremourning commented Jun 23, 2020

rcjsuen commented Jun 23, 2020

puremourning commented Jun 23, 2020 • edited Loading

Aerijo commented Jun 23, 2020

sam-mccall commented Jun 23, 2020

Aerijo commented Jun 23, 2020 • edited Loading

puremourning commented Jun 23, 2020

sam-mccall commented Jun 23, 2020

HighCommander4 commented Jun 24, 2020

Aerijo commented Jun 24, 2020 • edited Loading

DavidGoldman commented Oct 8, 2020

rcjsuen commented Oct 8, 2020

HighCommander4 commented Oct 8, 2020 • edited Loading

dbaeumer commented Oct 9, 2020

adonovan commented Sep 25, 2024

puremourning commented Jun 23, 2020 •

edited

Loading

Aerijo commented Jun 23, 2020 •

edited

Loading

Aerijo commented Jun 24, 2020 •

edited

Loading

HighCommander4 commented Oct 8, 2020 •

edited

Loading