Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support I/O stream for protobuf format #2075

Open
GeorgePap-719 opened this issue Oct 24, 2022 · 7 comments
Open

Support I/O stream for protobuf format #2075

GeorgePap-719 opened this issue Oct 24, 2022 · 7 comments
Assignees
Labels

Comments

@GeorgePap-719
Copy link
Contributor

GeorgePap-719 commented Oct 24, 2022

Summary

Currently, there is no API which reads or writes directly from
stream in protobuf format. Such API makes sense for frameworks, which helps them to avoid decoding streams to an array first or encoding result to an array and then passing it to the stream.

Frameworks

Spring actually needs such an API to complete integration with kotlinx.serialization, see: Spring-KotlinProtobufEncoder.

Besides Spring, it is to my understanding that also other frameworks can benefit from this addition, for example ktor.

Proposal

Message

A prototype shape of API can be similar to ones json format has:

ProtoBuf.encodeToStream

ProtoBuf.decodeFromStream

Implementation

A draft implementation has already been implemented and reviewed. See #2082.

Delimited messages

Since we are taking about streams and protobuf messages, we cannot ignore delimited messages. It is a core technique where we can write multiple messages in the same stream.

It is important to note that, delimited messages are the main reason and goal of this proposal. This is because, frameworks will most likely use/need this technique to write multiple messages in the same steam.

Implementation

SerializedSize:

To support delimited messages there is one thing to address. How to compute the required size for encoding a message without reading all bytes to an array first.

To address this problem we need an API which will compute the required bytes to encode a message. A proof of concept implementation is in #2239.

An early shape of the API is:

ProtoBuf.getOrComputeSerializedSize(serializer: SerializationStrategy<T>, value: T): Int

Encode delimited messages:

A very early implementation is in #2082, but is not using the solution from above yet.

A draft shape of the APIs are:

ProtoBuf.encodeDelimitedMessagesToStream

ProtoBuf.decodeDelimitedMessagesFromStream

GeorgePap-719 added a commit to GeorgePap-719/kotlinx.serialization that referenced this issue Oct 29, 2022
Add implementations for streaming support in Protobuf format supporting
simple and delimited messages.

Signed-off-by: George Papadopoulos <[email protected]>
GeorgePap-719 added a commit to GeorgePap-719/kotlinx.serialization that referenced this issue Oct 29, 2022
Add implementations for streaming support in Protobuf format supporting
simple and delimited messages.

Signed-off-by: George Papadopoulos <[email protected]>
GeorgePap-719 added a commit to GeorgePap-719/kotlinx.serialization that referenced this issue Dec 1, 2022
Add implementations for streaming support in Protobuf format supporting
simple and delimited messages.

Signed-off-by: George Papadopoulos <[email protected]>
GeorgePap-719 added a commit to GeorgePap-719/kotlinx.serialization that referenced this issue Jan 9, 2023
Add implementations for streaming support in Protobuf format supporting
simple and delimited messages.

Signed-off-by: George Papadopoulos <[email protected]>
GeorgePap-719 added a commit to GeorgePap-719/kotlinx.serialization that referenced this issue Jan 19, 2023
Add implementations for streaming support in Protobuf format supporting
simple and delimited messages.

Signed-off-by: George Papadopoulos <[email protected]>
GeorgePap-719 added a commit to GeorgePap-719/kotlinx.serialization that referenced this issue Mar 17, 2023
… to encode a message

This commit introduces a new API which computes the required bytes to
encode a message without actually serializing a message. By extension,
this API is meant to be used as a cornerstone for implementing Kotlin#2075.

Signed-off-by: George Papadopoulos <[email protected]>
GeorgePap-719 added a commit to GeorgePap-719/kotlinx.serialization that referenced this issue Mar 17, 2023
… to encode a message

This commit introduces a new API which computes the required bytes to
encode a message without actually serializing a message. By extension,
this API is meant to be used as a cornerstone for implementing Kotlin#2075.

Signed-off-by: George Papadopoulos <[email protected]>
GeorgePap-719 added a commit to GeorgePap-719/kotlinx.serialization that referenced this issue Mar 17, 2023
… to encode a message

This commit introduces a new API which computes the required bytes to
encode a message without actually serializing a message. By extension,
this API is meant to be used as a cornerstone for implementing Kotlin#2075.

Signed-off-by: George Papadopoulos <[email protected]>
GeorgePap-719 added a commit to GeorgePap-719/kotlinx.serialization that referenced this issue Mar 17, 2023
… to encode a message

This commit introduces a new API which computes the required bytes to
encode a message without actually serializing a message. By extension,
this API is meant to be used as a cornerstone for implementing Kotlin#2075.

Signed-off-by: George Papadopoulos <[email protected]>
GeorgePap-719 added a commit to GeorgePap-719/kotlinx.serialization that referenced this issue Mar 17, 2023
… to encode a message

This commit introduces a new API which computes the required bytes to
encode a message without actually serializing a message. By extension,
this API is meant to be used as a cornerstone for implementing Kotlin#2075.

Signed-off-by: George Papadopoulos <[email protected]>
GeorgePap-719 added a commit to GeorgePap-719/kotlinx.serialization that referenced this issue Apr 4, 2023
This commit introduces a new API which computes the required bytes to
encode a message without actually serializing a message. By extension,
this API is meant to be used as a cornerstone for implementing Kotlin#2075.

Signed-off-by: George Papadopoulos <[email protected]>
GeorgePap-719 added a commit to GeorgePap-719/kotlinx.serialization that referenced this issue Apr 4, 2023
This commit introduces a new API which computes the required bytes to
encode a message without actually serializing a message. By extension,
this API is meant to be used as a cornerstone for implementing Kotlin#2075.

Signed-off-by: George Papadopoulos <[email protected]>
GeorgePap-719 added a commit to GeorgePap-719/kotlinx.serialization that referenced this issue Apr 4, 2023
…ge (Kotlin#2239)

This commit introduces a new API which computes the required bytes to
encode a message without actually serializing a message. By extension,
this API is meant to be used as a cornerstone for implementing Kotlin#2075.

Signed-off-by: George Papadopoulos <[email protected]>
GeorgePap-719 added a commit to GeorgePap-719/kotlinx.serialization that referenced this issue Apr 7, 2023
…ge (Kotlin#2239)

This commit introduces a new API which computes the required bytes to
encode a message without actually serializing a message. By extension,
this API is meant to be used as a cornerstone for implementing Kotlin#2075.

Signed-off-by: George Papadopoulos <[email protected]>
@bishiboosh
Copy link

I would be quite interested by this to be able to use it Android Datastore. Can we take inspiration from what's been done in Wire maybe ?

@GeorgePap-719
Copy link
Contributor Author

Ofc, feel free to take anything you like.

@ShreckYe
Copy link
Contributor

ShreckYe commented Mar 6, 2024

A vote for this.

One question: why don't you open a pull request or draft pull request for this? it can make reviewing changes easier.

And an immature suggestion: can we get it all multiplatform with Coroutines channels or kotlinx-io? With channels we get suspending/non-blocking streaming support too. kotlinx-io provides the Sink and Source APIs though I am not sure how stable it is.

@GeorgePap-719
Copy link
Contributor Author

Hey @ShreckYe, what is your use-case for this feature? Can you explain it here, please.

can we get it all multiplatform with Coroutines channels or kotlinx-io? With channels we get suspending/non-blocking streaming support too. kotlinx-io provides the Sink and Source APIs though I am not sure how stable it is.

Usually these kind of low-level APIs are blocking for performance reasons. However, I can provide APIs which return a Sequence, similar to what Json format provides.

@ShreckYe
Copy link
Contributor

ShreckYe commented Mar 6, 2024

Hi @GeorgePap-719, thanks for asking. I mainly care about multiplatform support now. Suspending support can be an extra bonus but I don't need it urgently.

Usually these kind of low-level APIs are blocking for performance reasons.

In my opinion this is half valid. If you are only encoding and decoding small messages this can be true, which does happen in most use cases, because the bytes are probably written and read into buffers and one encoding or decoding operation might encode or decode the message as a whole to or from the buffer, not reaching the end involving any IO waiting for the buffer to be reset. However for large messages, these operations will involve IO and waiting. And under asynchronous/non-blocking/reactive architecture, such as Vert.x in my case, reactive/suspending IO does improve throughput.

It's true that a major obstacle here is that if these decoding/encoding functions are made suspend, the supporting atomic operations and thus the whole Encoder/Decoder hierarchy will also be made suspend, which requires a lot of code to be refactored and I suppose you are already aware of. Kotlin Coroutines is based on CPS transformation, so there is also some overhead. So does improvement brought by reactiveness/suspending justify the sacrifice in overhead in terms of throughput? I don't know. Maybe only implementing and benchmarking it can tell. However, it's worth noting that Netty has implemented ProtobufEncoder and ProtobufDecoder.

I think another advantage of this is that if we support this we also start supporting network streaming directly using a serialization format. To think further (immaturely), if we support Flows/Sequences in kotlinx.serialization, giving them the same status as Lists, we can just send and receive Flows and get a kind of a streaming protocol out of the box with serialization.

@ShreckYe
Copy link
Contributor

ShreckYe commented Mar 7, 2024

On second thought about this, I would say that my main concern is multiplatform support and Coroutines support is not that important at all. Serializing large messages is a rare use case anyway. And it's still arguable that even for a large message in a suspending context, is it faster to serialize it to a ByteArray first and then send it, or to wrap the sending operation in a SendChannel and pass it to the "suspend serialize function" which depends on smaller suspend ones? And since throughput is mainly a server-side concern, with JVM Loom / virtual threads gradually becoming stable and Coroutines integrating it, we can just run the blocking APIs on virtual threads and easily get a rough estimate of performance.

So to sum up, I think it would just be nice to use the Source and Sink APIs from kotlinx-io.

@GeorgePap-719 GeorgePap-719 changed the title Support Output/Input stream for protobuf format Support I/O stream for protobuf format Mar 13, 2024
@GeorgePap-719
Copy link
Contributor Author

GeorgePap-719 commented Mar 13, 2024

This probably could also be used by KTOR to support grpc, see: KTOR-1501 for more details.

My initial POC targeted only the jvm. To make it multiplatform, we would have to wait for kotlinx-io. On the other hand, Json-format supports streaming either through okio or targeting directly the jvm.

@shanshin shanshin self-assigned this Jan 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants