Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC2385: Disable URL Previews, alternative method #2385

Open
wants to merge 2 commits into
base: old_master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
62 changes: 62 additions & 0 deletions proposals/2385-no-url-previews.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
# MSC 2385: Disable URL Previews
This MSC is an alternative suggestion to [MSC2376](https://github.com/matrix-org/matrix-doc/pull/2376).

Currently URLs posted in chat give a URL preview. Sometimes, for example with bots, this can be
unwanted. For instance if a bot is posting a message with a lot of links, or the content of the link
is already obvious from the message itself.

## Proposal
This proposal adds a new, optional key to `m.room.message` messages, named `url_previews`. It is an
array of strings containing the URLs a preview should be given to. If this key is absent, it is expected
that the client gives URL previews for all URLs present in the message body. If a URL is present in this key that is *not*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't expect that clients give URL previews as this could be a security problem. We should say that clients MAY provide previews for all URLs in the body.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Also FYI url previews are proxied through your homeserver, so you don't actually have any security problem when generating those.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, there is a problem: WWW URLs may return different things depending on other log-ins and the like, so the sender and the homeserver of the sender may not see the same content.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://matrix.org/docs/spec/client_server/r0.6.1#get-matrix-media-r0-preview-url discussion on that is beyond this MSC. The top comment suggests to use "may" instead of "it is expected", which soru will incorporate into this MSC

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it reveals the URL to the home server. You can imagine a case where a malicious user sends an illegal url so that a user may get in trouble if the honeservers logs are checked. I agree it isn't the worst privacy flaw but I think it is worth letting clients make the choice and provide users with options rather than writing this section like it is expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The top comment suggests to use "may" instead of "it is expected", which soru will incorporate into this MSC

use "may" instead of "it is expected"

may means the clients still have a choice.

this MSC does not add any new locations where clients should display url previews, it is about limiting it in some messages. "Disable URL Previews".

Also there is a reason element disables URL previews in e2ee rooms by default.

Copy link
Contributor

@kevincox kevincox Mar 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that. I think it makes sense to focus on disallowing certain previews in this MSC and not sound like we are requiring any previews where they weren't required previously.

present in the body clients SHOULD NOT render a preview for that URL.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the body or formatted body?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(answering this may lead to further questions and concerns)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since all links in the formatted_body should be present in the body and vice-versa (as the body is a plain-text fallback) this should only matter for those messages that actively disobey that.

That being said, it seem to make most sense to use the links in whichever the client uses to render: if it renders formatted stuffs the formatted_body, if it renders plaintext the body

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, so the concern here would be that you can make a very convincing phishing attack: if it uses URLs only from body, your formatted body can be <a href="https://scam.com">https://example.org</a> with url_previews for https://example.org. Likewise, if it only looks at formatted bodies then the same attack can be applied.

Currently at least, clients will (or should) be showing a URL preview for the href rather than the text.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a client uses body (including URLs in it, naturally) to render the message then formatted_body is irrelevant, whichever URLs it contains, and vice versa. url_previews is only used as a whitelist for URLs in whichever body the client renders. I don't quite see the surface for a phishing attack here: what you see is what you get, with a caveat that an unsuppressed URL preview for a phishy URL in formatted_body may help revealing the attack (at the very least, it won't conceal the attack), whereas if an attacker suppreses the URL preview then it's up to the user to check where that URL leads to before following it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it's the responsibility of the user to check links before they click but in practice we shouldn't be designing into the protocol a requirement for people to do that.

The attack is quite literally:

{
  "body": "https://example.org",
  "format": "org.matrix.custom.html",
  "formatted_body": "<a href="https://phishing.example.org">https://example.org</a>",
  "url_previews": ["https://example.org"]
}

Which clients that don't read the spec to the letter would render as:

https://example.org

Example Domain
This is an example domain.

Instead, we could be better at protecting users from badly written clients and from having to be more responsible than us.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix for that is equally as trivial: As the client uses formatted_body to figure out URL previews, it is expected to figure out that the stuff in the href of the <a> tag should give the preview - not its content

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be more clear if it was specified like the following:

A client may chose to provide URL previews for the user. For each link the client finds in the message the client must consult the url_previews attribute. If url_previews is absent all URLs are allowed to be previewed, otherwise if the attribute is a list contacting the link URL, the URL is present for previewing.

So it is clear that the client must not read url_previews and search for the URL in the body, but instead if in the process of rendering the body it finds a link it should consult url_previews. In your example the client would never consider https://example.org a link and therefore would never be eligible for a preview whether or not url_previews was absent or ["https://example.org"]

The consequence is that putting arbitrary URLs into url_previews accomplishes nothing unless the URL is actually going to be treated as a link by the client.


### Examples
No key added, preview to all URLs:
```json
{
"msgtype": "m.text",
"body": "Wow, check out this song! https://www.youtube.com/watch?v=oHg5SJYRHA0"
}
```

Empty array for `url_previews`, thus no URL preview at all:
```json
{
"msgtype": "m.text",
"format": "or.matrix.custom.html",
"body": "Hey, look at what I did! [MSC2385: Disable URL Previews](https://github.com/matrix-org/matrix-doc/pull/2385)",
"formatted_body": "Hey, look at what I did! <a src=\"https://github.com/matrix-org/matrix-doc/pull/2385\">MSC2385: Disable URL Previews</a>",
"url_previews": []
}
```

Array of URLs to preview, only those in the array are previewed:
```json
{
"msgtype": "m.text",
"body": "https://example.com https://matrix.org",
"url_previews": ["https://example.com"]
uhoreg marked this conversation as resolved.
Show resolved Hide resolved
}
```

## Potential issues
URLs that should have a preview need to be sent twice - once in the body, once in `url_previews`. As
this feature is meant for bots, normal human clients would probably omit `url_previews` alltogether,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
this feature is meant for bots, normal human clients would probably omit `url_previews` alltogether,
this feature is meant for bots, normal human clients would probably omit `url_previews` altogether,

... because English is weird.

resulting in all URLs sent from them having a preview.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no reason a client can't display a list of links that it detects and allow the user to disable previews for some or all of them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must have missed this paragraph earlier. I agree that I don't see any reason why this should be intended mainly for bots and not for humans. There have been several occasions where I've typed a link and not wanted to have a preview because it causes extra noise (not just for rickrolling).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I, as a human, would have use for this every once in a while. Implementation examples of UI for this have been mentioned earlier (facebook, ...). I don't know if it should be added to the MSC, but since the point is to not send certain URLs to the HS, the sending UX should be constructed in such a way that, if at all, it is the client which generates the preview so that you can then decide to include it or not. The reasoning being that the sending user has (99% likely) already visited that URL and, if privacy is a concern, taken appropriate measures.


URLs in `url_previews` and in the body need to match exactly, which could be confusing with trailing
slashes (`/`).

## Alternatives
There could be an attribute added to the `<a>` tag which disables URL previews. See
[MSC2376](https://github.com/matrix-org/matrix-doc/pull/2376).

Instead of a whitelist there could also be a blacklist of URLs. However, as the likely scenario is a
bot wanting to surpress all the previews for its URLs sent it seems like a whitelist is more appropriate
here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A whitelist also nicely prevents clients from parsing things that were not intended to be URLs and trying to preview them. Should mostly be relevant for text as html has a precise notion of what a link is but still nice to have.


## Security considerations
You could get tricked more easily into being Rickroll'd. This can be alleviated by encouraging
clients to request previews explicitly (either by hovering or by clicking a special button next to
the link).