Reconsider serialization approach #137
Comments
This is excellent @jasper-d. We've been discussing serialization with @caleblloyd as well, and I think we're pretty much on the same page with you on your option 2. Some questions, if I may add to the discussion:
It'd be great if we can have a good discussion as a community and reach a consensus.
Was thinking something along the lines of this. Get rid of:
Make a ... This way, any of the byte types will work without having to add an overload for each specific byte type. For chainable serializers, I was thinking something like this:
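A rough sketch of that chaining idea, purely as an illustration (the interface and type names below are made up, not actual API):

```csharp
using System;
using System.Buffers;

// A serializer that handles raw byte payloads itself and hands anything
// else to the next serializer in the chain.
public interface IChainableSerializer
{
    IChainableSerializer? Next { get; }
    void Serialize<T>(IBufferWriter<byte> writer, T value);
}

public sealed class RawBytesSerializer : IChainableSerializer
{
    public RawBytesSerializer(IChainableSerializer? next = null) => Next = next;

    public IChainableSerializer? Next { get; }

    public void Serialize<T>(IBufferWriter<byte> writer, T value)
    {
        switch (value)
        {
            case byte[] bytes:
                writer.Write(bytes);              // byte[] goes out as-is
                break;
            case Memory<byte> mem:
                writer.Write(mem.Span);
                break;
            case ReadOnlyMemory<byte> rom:
                writer.Write(rom.Span);
                break;
            case ReadOnlySequence<byte> seq:
                foreach (var segment in seq)
                    writer.Write(segment.Span);
                break;
            default:
                if (Next is null)
                    throw new NotSupportedException($"No serializer in the chain handles {typeof(T)}.");
                Next.Serialize(writer, value);    // fall through to the next serializer
                break;
        }
    }
}
```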
I think there are two main design goals here:
[1] can be solved with overloads (targeting more types and/or using distinct method names). I'm not sure I like the idea of having bytes pass through a serializer to produce bytes, even if it is a no-op. It'll also be non-obvious (from the method signature) that it'll actually work. I'd vote to keep the Serializer option on NatsOpts, as that has better ergonomics and discoverability than having to set or pass it separately. I'm also not convinced that there is a use case where someone would want to change their serializer for a connection after creating it (but if one comes up, they can easily create a new connection instead).

[2] As for performance, I think it is important to try to guide folks in the right direction. If the underlying code consuming the data has a preference (span, sequence, array, stream) that will be faster or less allocatey to use, then it should be easy for people to discover that this exists. That may not be easy if all the overloads have the same method name, but it can be mitigated with docs and code comments (for IntelliSense). However, if different names are acceptable, it becomes possible to group them by behavior:
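Something in this direction, sketched only to illustrate the grouping (the names and signatures are placeholders):

```csharp
using System;
using System.Buffers;
using System.Threading.Tasks;

public interface INatsPublisher
{
    // Raw payloads: bytes are sent as-is, no serializer is ever involved.
    ValueTask PublishBytesAsync(string subject, ReadOnlyMemory<byte> payload);
    ValueTask PublishBytesAsync(string subject, ReadOnlySequence<byte> payload);

    // Typed payloads: always go through the configured serializer.
    ValueTask PublishSerializedAsync<T>(string subject, T data);
}
```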
This makes it obvious that the methods have different behaviors (names can likely be improved though).

Another option entirely is to use a factory or builder to create the NatsConnection. Then, if people specify .WithSerializer(...), they get a SerializingNatsConnection with appropriate methods; otherwise they just get the raw connection without the generic overloads for PublishAsync. This is similar to how Pulsar does it (where you can specify a "schema" for your data).
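For what it's worth, a minimal sketch of that builder split (everything here is invented for illustration, not a concrete proposal):

```csharp
using System;
using System.Threading.Tasks;

public interface INatsSerializer { /* Serialize/Deserialize members elided */ }

// Raw connection: publishes bytes only, no generic overloads to trip over.
public class NatsConnection
{
    public ValueTask PublishAsync(string subject, ReadOnlyMemory<byte> payload) => default;
}

// Returned only when a serializer was configured, so PublishAsync<T> exists only here.
public sealed class SerializingNatsConnection : NatsConnection
{
    public SerializingNatsConnection(INatsSerializer serializer) => Serializer = serializer;
    public INatsSerializer Serializer { get; }
    public ValueTask PublishAsync<T>(string subject, T data) => default; // would run data through Serializer
}

public sealed class NatsConnectionBuilder
{
    public NatsConnection Build() => new NatsConnection();

    // Switching builder type makes the serializer choice visible in the static type system.
    public SerializingNatsConnectionBuilder WithSerializer(INatsSerializer serializer) =>
        new SerializingNatsConnectionBuilder(serializer);
}

public sealed class SerializingNatsConnectionBuilder
{
    private readonly INatsSerializer _serializer;
    public SerializingNatsConnectionBuilder(INatsSerializer serializer) => _serializer = serializer;
    public SerializingNatsConnection Build() => new SerializingNatsConnection(_serializer);
}
```

Callers that never call WithSerializer simply never see the generic publish surface.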
Overloading works on the ... So if we want to support the same byte types consistently on Publish and Subscribe in a similar manner, it will either need to be done with separate method names or via generics for all of these methods:
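To illustrate the underlying constraint (the signatures below are made up): on the publish side the compiler can pick an overload from the argument type, but on the subscribe side the byte type only shows up in the return type, and C# cannot overload on return type alone:

```csharp
using System;
using System.Buffers;
using System.Threading.Tasks;

public interface INatsSub<T> { }

public interface ISubscribeShapes
{
    // Publish: overloads work, because the argument type disambiguates.
    ValueTask PublishAsync(string subject, byte[] payload);
    ValueTask PublishAsync(string subject, ReadOnlySequence<byte> payload);

    // Subscribe: these two could not coexist, they differ only in return type...
    // ValueTask<INatsSub<byte[]>> SubscribeAsync(string subject);
    // ValueTask<INatsSub<ReadOnlySequence<byte>>> SubscribeAsync(string subject);

    // ...so it has to be separate method names or a generic type parameter.
    ValueTask<INatsSub<T>> SubscribeAsync<T>(string subject);
}
```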
Some good reading for potential byte return types in addition to https://learn.microsoft.com/en-us/dotnet/standard/memory-and-spans/memory-t-usage-guidelines
I don't really have a preference here, as long as the end result is performant and does the expected thing :)
See also #140 (comment).
Sorry, I'm slow to respond. And apologies, this is going to become a wall of text.
Given that I never used the encoded connection in nats.net v1, I would vote for that. I find the global serializer an extremely hard-to-discover API (it's not apparent from IntelliSense that it even exists). In addition, I don't think that the currently used default serializer can be made trim safe (c.f. #92).
I see great potential for concurrency issues and hard-to-debug bugs.
What encoding would StringSerializer use?

I made some changes (incomplete, but core and js compile), just to get a better feeling for possible API changes.

Serialization:

Serialization is done using ... E.g. when serializing Protobuf (using Google's implementation) one would just need to pass ... String serialization would be just as simple: ... S.T.J is slightly more involved because of the disposable ...

Returning void from the serializer (instead of the number of written bytes) makes it easier for clients to implement serialization delegates (neither Google Protobuf nor STJ serializers return the length). Instead, the length can be determined at the call-site of the serialization delegate. ByRef-like types such as ... If something like #140 (comment) is implemented, special handling for ...

Deserialization:

Deserialization looks slightly different, mainly because I opted to put the serializer into (now generic) ... I don't see problems with return types, because the return type is ... The deserializer itself is ... Using ... However, it puts the burden of disposing the owner on the (client provided) deserializer, which I don't like.

Other stuff: ...
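To sketch the serialization shape described above (the delegate signature is a guess at the intent; the framework calls shown are standard .NET APIs):

```csharp
using System;
using System.Buffers;
using System.Text;
using System.Text.Json;

// Guessed delegate shape: write into the supplied writer, return nothing;
// the caller can measure what was written from the writer itself.
public delegate void SerializeDelegate<in T>(IBufferWriter<byte> writer, T value);

public static class ExampleSerializers
{
    // Strings: encode UTF-8 straight into the writer.
    public static void WriteString(IBufferWriter<byte> writer, string value)
    {
        var span = writer.GetSpan(Encoding.UTF8.GetByteCount(value));
        writer.Advance(Encoding.UTF8.GetBytes(value.AsSpan(), span));
    }

    // System.Text.Json: slightly more involved because of the disposable Utf8JsonWriter.
    public static void WriteJson<T>(IBufferWriter<byte> writer, T value)
    {
        using var json = new Utf8JsonWriter(writer);
        JsonSerializer.Serialize(json, value);
    }

    // Google.Protobuf would be similar: message.WriteTo(writer) also targets an
    // IBufferWriter<byte> and returns void, so no length has to be reported back.
}
```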
I realize I'm quite late here, but I had also been bitten by the default Json serializer. Personally, I would prefer the connection and serialization to be divided into 2 parts:
This would make it more obvious what is happening and allow a cleaner API for each. I would think that most users are going to be working with either serialized objects or bytes directly, not both.
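A minimal sketch of that split, with invented names: the connection deals only in bytes, and a thin typed layer owns serialization for those who want it:

```csharp
using System;
using System.Buffers;
using System.Threading.Tasks;

// Part 1: the connection knows nothing about serialization.
public interface IByteConnection
{
    ValueTask PublishAsync(string subject, ReadOnlySequence<byte> payload);
}

// Part 2: a typed layer that composes a serializer with the byte connection.
public sealed class TypedPublisher<T>
{
    private readonly IByteConnection _connection;
    private readonly Func<T, ReadOnlySequence<byte>> _serialize;

    public TypedPublisher(IByteConnection connection, Func<T, ReadOnlySequence<byte>> serialize)
    {
        _connection = connection;
        _serialize = serialize;
    }

    public ValueTask PublishAsync(string subject, T data) =>
        _connection.PublishAsync(subject, _serialize(data));
}
```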
Just watching this great discussion from afar and wanted to jump in with a very specific nitpick:
There's an excellent blog post here that walks through some of the differences between ArrayPool and MemoryPool. The notable one for this discussion is that MemoryPool hands you an IMemoryOwner, which is disposable, while ArrayPool straight up hands you an array you are expected to return. The MSDN link for ArrayPool.Rent makes this tradeoff clear:
You're correct that it will not leak memory, but it may leak performance when the intention of the API surface is to be performant. This comment isn't setting up an expectation of an allocation-free NATS client or anything. Just wanted to make sure the decisions made here are respecting the performance intent of these APIs 🙂
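To spell out the ownership difference:

```csharp
using System;
using System.Buffers;

// ArrayPool: you get the raw array and are responsible for giving it back.
// Rent may return a larger array than requested.
byte[] rented = ArrayPool<byte>.Shared.Rent(1024);
try
{
    // ... use rented[0..1024) ...
}
finally
{
    ArrayPool<byte>.Shared.Return(rented); // forget this and the array simply never re-enters the pool
}

// MemoryPool: ownership is expressed through IDisposable via IMemoryOwner<byte>.
using (IMemoryOwner<byte> owner = MemoryPool<byte>.Shared.Rent(1024))
{
    Memory<byte> memory = owner.Memory; // may also be larger than requested
    // ... use memory ...
}
```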
I have a few questions (partly because of my lack of understanding, I need to deep dive at some point, and partly to shape the design):
(edit) I think I got it the wrong way round. When receiving messages (i.e. subscriptions), IMemoryOwner works fine. It's when publishing that you need something like IBufferWriter (or some kind of sequence builder?). (edit2) An ...
I think there are quite a lot, many of them private though. A public one I know is ...
I think for publishing it's straightforward. Much like now, serializers can write to an ... GC is only the stop-gap: it won't return the buffers, but will eventually collect them just like any other ordinary memory that has no live references pointing to it.
I agree, it works perfectly fine. The only concern I have is that it works best only as long as clients consuming the memory (e.g. deserializing the buffer) are properly disposing it afterwards. On the other hand, clients which don't dispose an ...
Any implementation of ... For publishing, we can avoid it entirely and safely pool our buffers (i.e. ...). For subscriptions, we would need to instantiate a ...
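For illustration, the kind of per-message owner a subscription could hand out might look roughly like this (a sketch, not a proposed implementation):

```csharp
using System;
using System.Buffers;

// Wraps an array rented from ArrayPool and returns it when (and only when) it is disposed.
public sealed class PooledMemoryOwner : IMemoryOwner<byte>
{
    private byte[]? _array;
    private readonly int _length;

    public PooledMemoryOwner(int length)
    {
        _array = ArrayPool<byte>.Shared.Rent(length);
        _length = length;
    }

    public Memory<byte> Memory =>
        _array is { } array
            ? array.AsMemory(0, _length)
            : throw new ObjectDisposedException(nameof(PooledMemoryOwner));

    public void Dispose()
    {
        // If a consumer forgets to dispose, the array is eventually garbage collected,
        // but it never makes it back into the pool.
        if (_array is { } array)
        {
            _array = null;
            ArrayPool<byte>.Shared.Return(array);
        }
    }
}
```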
Please have a quick look at #171 for the proposed solution.
cc: @renkman You might be interested in this too. |
What motivated this proposal?
Currently, the publish and subscribe API is rather opaque in terms of serialization.
When publishing some already serialized data (e.g. byte[], Memory<byte>, ReadOnlyMemory<byte>), clients have the following options:

- Pass byte[] as data to PublishAsync<T>()
- Create a ReadOnlySequence<byte> and pass it to PublishAsync()
- Use PublishAsync<T>() and provide a custom serializer in NatsPubOpts that can handle the type.

The problem here is that code such as the following compiles, but fails at runtime:
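The exact snippet isn't reproduced here; it is roughly code of this shape, with connection standing in for the client connection:

```csharp
// Compiles fine: T is inferred as ReadOnlyMemory<byte>...
ReadOnlyMemory<byte> payload = new byte[] { 1, 2, 3 };

// ...but at runtime this goes through NatsJsonSerializer, which throws
// because it has no handling for [ReadOnly]Memory<byte>.
await connection.PublishAsync("my.subject", payload);
```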
That is because the generic overload of PublishAsync is chosen, which then calls into NatsJsonSerializer, which doesn't handle [ReadOnly]Memory<byte>. byte[] works because the JSON serializer apparently handles it, but it is at least weird to call into a JSON serializer to copy some byte array.

See #136 for an illustration of the problem.
What is the proposed change?
I think there are two options to improve the situation here:

1. Add PublishAsync overloads covering the raw byte types.
2. Force clients to provide an ISerializer<T> explicitly, remove NatsPubOpts.Serializer, and have a single overload which handles byte[] and [ReadOnly]Memory<byte>.

Personally, I'd prefer 2., since it avoids overload resolution issues in the future and makes it explicit that a serializer is required. It is also more AOT friendly.
NB: I have never really understood the motivation for having a connection-wide serializer in NATS clients.
Subscriptions could use a similar approach.
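For illustration only, option 2 could end up looking something like this (the names and exact signatures are not part of the proposal):

```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Threading.Tasks;

public interface ISerializer<T>
{
    void Serialize(IBufferWriter<byte> writer, T value);
    T Deserialize(in ReadOnlySequence<byte> buffer);
}

public interface INatsClient
{
    // Raw bytes: one non-generic overload; byte[] and Memory<byte> convert implicitly
    // to ReadOnlyMemory<byte>, so this single signature covers all three.
    ValueTask PublishAsync(string subject, ReadOnlyMemory<byte> payload);

    // Typed data: the serializer is a required argument, so it is explicit and AOT-analyzable.
    ValueTask PublishAsync<T>(string subject, T data, ISerializer<T> serializer);

    // Subscriptions mirror the same split.
    IAsyncEnumerable<ReadOnlyMemory<byte>> SubscribeAsync(string subject);
    IAsyncEnumerable<T> SubscribeAsync<T>(string subject, ISerializer<T> serializer);
}
```

Because the raw overload takes no type parameter, publishing already serialized bytes can never be routed into a JSON serializer by overload resolution.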
Who benefits from this change?
Users wouldn't observe runtime exceptions for code that compiles. Possibly better performance when not invoking the JSON serializer for byte[].

What alternatives have you evaluated?
No response