Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC3870: Async media upload extension: upload to URL #3870

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
89 changes: 89 additions & 0 deletions proposals/3870-media-async-upload-url.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,89 @@
# MSC3870: Async media upload extension: upload to URL

This MSC proposes an extension to [MSC2246](https://github.com/matrix-org/matrix-spec-proposals/pull/2246)
(async media uploads) that allows a media server to give clients a URL to use for uploading media
data after the MXID is created.

The rationale behind this is to enable clients to upload direct to the backing datastore, for example
by using [S3 presigned URLs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/PresignedUrlUploadObject.html)
and/or [S3 transfer acceleration](https://docs.aws.amazon.com/AmazonS3/latest/userguide/transfer-acceleration-getting-started.html)
(this is not limited to S3, just well documented by AWS). By enabling clients to upload to a different
URL server owners could improve upload performance significantly by using a CDN with closer Points
of Presence to their users than the homeserver itself.

When combined with [MSC3860](https://github.com/matrix-org/matrix-spec-proposals/pull/3860) (media
download redirect URLs) this makes it possible to run a "lean" media server that orchestrates where
media is uploaded to and downloaded from but does not directly handle much media itself. Note this
doesn't include thumbnails or URL previews which would still go through the media server.


## Proposal

The proposal modifies the MSC2246 create endpoint to optionally include a URL that a compatible
client may use to upload the content for the created MXID. This is considered optional and the
client may continue to upload via the new upload endpoint defined in MSC2246.
Comment on lines +22 to +24
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clients should be required to use the URL when the server generates one, I think. The server is almost certainly trying to shed bandwidth load when it generates one of these URLs, and it's already gone through the effort to allocate a storage location for the client. That's a lot of wasted effort/cash if the client ignores the field.

For MSC purposes, making it required I think means:

  • New endpoint version for /create, to encourage clients to opt-in to the field's presence
  • Deprecating the old endpoint version, further pushing clients towards the new one
  • On PUT /upload, an error code to denote that uploads aren't welcome for that URI. This does not require a new endpoint version because the client already opted-in to using the new upload URL-supported endpoint earlier in the process.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't clients still be able to avoid using the CDN by using the old synchronous upload endpoint?

If we're going through the effort to insist that clients use the upload_url, should we also start making plans tokill the ability to do synchrous uploads direct to the media server at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should leave this optional for now, and defer deprecation or blocking of direct upload to a later MSC. Allowing both in theory makes switching over to the upload URLs easier for client dev, and then we avoid all introducing additional complexity as part of this MSC.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't clients still be able to avoid using the CDN by using the old synchronous upload endpoint?

If we're going through the effort to insist that clients use the upload_url, should we also start making plans tokill the ability to do synchrous uploads direct to the media server at all?

The proposed plan deprecates the endpoint, making it eligible for removal in a later version of the spec.

I think we should leave this optional for now, and defer deprecation or blocking of direct upload to a later MSC. Allowing both in theory makes switching over to the upload URLs easier for client dev, and then we avoid all introducing additional complexity as part of this MSC.

Clients can still use deprecated endpoints, but without a reason to change there's not much motivation for them to implement this. I do feel quite strongly that if the server provides a URL, the client must use that URL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed up fixes for the other things. I think deferring the new endpoint version and requiring clients use the URL provided is the right call - it'd be a quick follow up MSC, or might tie in with replacement plans for the old upload endpoints. Either way I don't see any real benefit to bundling it into this MSC.


`POST /_matrix/media/v1/create`

**Example response**

```json
{
"content_uri": "mxc://example.com/AQwafuaFswefuhsfAFAgsw",
"unused_expires_at": 1647257217083,
"upload_method": "PUT",
"upload_url": "https://cdn.example.com/media-repo/upload/XAPw4CtrzArk?signed=h4tGOHvCu"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What HTTP method does the client use to upload the file? POST? PUT? Or do you need another parameter to specify the method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified in ba15fa9 (PUT, for now).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably encode it into the field here, making it generic against providers.

Suggested change
"upload_url": "https://cdn.example.com/media-repo/upload/XAPw4CtrzArk?signed=h4tGOHvCu"
"upload_location": "PUT https://cdn.example.com/media-repo/upload/XAPw4CtrzArk?signed=h4tGOHvCu"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use a separate field here just to be explicit, upload_method, will update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the URL expires at the same time as unused_expires_at - just need to document that in the MSC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}
```

If the client chooses to upload media via the `upload_url` field it must use the method specified
by the `upload_method` field. Clients should expect the upload URL to work until `unused_expires_at`
time, afterwhich a new media would need to be created. Upload request based on the above response:

`PUT https://cdn.example.com/media-repo/upload/XAPw4CtrzArk?signed=h4tGOHvCu`

The response to the upload is implementation specific depending on the target server, the client
should interpret standard HTTP responses, a 200 or 201 indicating successful upload. Once the upload
is complete, the client must notify the media server via a new endpoint:

`POST /_matrix/media/v1/upload/{serverName}/{mediaId}/complete`

**Example response**

```json
{
"content_uri": "mxc://example.com/AQwafuaFswefuhsfAFAgsw"
}
```

This allows the media server to perform any post-upload actions before either allowing the media
to be downloaded (and returning the URI to the client as above) or rejecting the upload.

This means that there is potential for a client to send an event referencing an MXC that is then
uploaded and rejected. Note that this is also a problem MSC2246 faces, whatever solution applies
there can be applied here.


## Alternatives

### Run the whole homeserver behind CDN

Server owners could run their own Points of Presence to receive and handle media content and pass
through any non-media requests to a central homeserver (or run a distributed homeserver if/when
that exists). There is a significant cost (management time & infrastructure) associated with this.


## Security Considerations

Security of presigned URL uploads is well documented but important to be aware of when implementing
support for this.


## Unstable Prefix

While this MSC is not in a released version of the spec, implementations should use `com.beeper.msc3880`
as a prefix and as an unstable_features flag in the /versions endpoint.

```
POST /_matrix/media/unstable/com.beeper.msc3870/upload/{serverName}/{mediaId}/complete
```