Type Encoding
- You're using Ceras as a network protocol?
- Or just trying to make serialization more efficient? ⚡
Then you're looking for config.KnownTypes!
Add your types to config.KnownTypes like this:
using Ceras;

SerializerConfig config = new SerializerConfig();
config.KnownTypes.Add(typeof(MyLoginPacket));
config.KnownTypes.Add(typeof(MyChatMessagePacket));
// ... add the rest of your types here
var ceras = new CerasSerializer(config);
Sometimes a variable and the value it holds have different types (see the example further below). In that case Ceras has to write the Type so that it later knows how to read the data back.
When a Type that needs to be written is found in the KnownTypes collection, Ceras can emit a single byte instead of a huge string (the full name of the type)!
That's especially interesting for networking scenarios (in fact KnownTypes was implemented for exactly this reason) because that way you can completely avoid sending any strings, making the communication as efficient as a hand-crafted binary network protocol!
But even if you only serialize stuff to write it to a file, you can still save a little space and time by using KnownTypes.
- Adding your types to KnownTypes is completely optional! Everything will work perfectly fine even if you don't do it. However, doing so will save space and improve performance!
So should you just add every type you have? No! Don't just throw in everything! Only add types when you know that Ceras will eventually need to encode them.
Adding types that will never be written (because they can always be inferred from the containing variable) only reduces efficiency, even if just a little.
If you are unsure what that means, scroll down to read about how and when Ceras writes a Type.
As for security (in networking scenarios), adding more Types has no effect on vulnerability. For more information about security take a look at the wiki page.
- When you send your serialized packets over the internet, the receiving side never knows what kind of packet comes next.
- So you obviously can't call ceras.Deserialize<MyChatMessagePacket>(...), since you don't know what the other side has sent. On the receiving side you'll have ceras.Deserialize<object>(...) or ceras.Deserialize<MyNetworkPacketBase>(...) instead.
- That means the sending side can't just call ceras.Serialize(myChatMsgPacket), because C# generic type inference turns that into ceras.Serialize<MyChatMessagePacket>(myChatMsgPacket), which writes the packet in a format that expects the reader to know the type beforehand (impossible here).
- So on the sending side you'll have ceras.Serialize<MyNetworkPacketBase>(...) or similar.
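The pitfall in the list above comes straight from C# generic type inference. The following minimal sketch does not use Ceras at all: DeclaredTypeOf is a hypothetical stand-in for a generic Serialize<T>(T value) and only reports which T the compiler picked.

```csharp
using System;

// Hypothetical packet types mirroring the names used in the list above.
public class MyNetworkPacketBase { }
public class MyChatMessagePacket : MyNetworkPacketBase { }

public static class GenericInferenceDemo
{
    // Stand-in for a generic Serialize<T>(T value): it does no serialization,
    // it only reports which T the compiler chose for the call.
    public static string DeclaredTypeOf<T>(T value) => typeof(T).Name;
}
```

Calling GenericInferenceDemo.DeclaredTypeOf(packet) with a MyChatMessagePacket returns "MyChatMessagePacket", because T is inferred from the static type of the argument. Only the explicit form DeclaredTypeOf<MyNetworkPacketBase>(packet) makes T the base class, which is the situation where a serializer has to write the concrete type.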
From here it should be clear that Ceras needs to write the type of the root object. Of course, we don't want a long type name to be written every time we send a network message; instead we want a tiny ID (as short as possible!).
So adding all your packets / network messages to KnownTypes essentially turns Ceras from a plain old serializer into an optimized network protocol!
- Since the ID for a type is based on its index in KnownTypes, it is essential that server and client have the exact same types in the exact same order.
- To verify that easily, Ceras can calculate a "protocol hash" based on the whole SerializerConfig.
- The idea is that the very first message the client sends to the server after connecting is the 4-byte protocol hash (a single int).
- The server compares the received hash with its own, and if they don't match it can simply kick/disconnect the client. That way the server and client are guaranteed to be fully compatible.
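To see why the order of KnownTypes matters for such a hash, here is a minimal sketch. It is an assumption for illustration only: it hashes just the full type names with 32-bit FNV-1a, while Ceras's real checksum covers the whole SerializerConfig and uses its own algorithm.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class ProtocolHashSketch
{
    // Hypothetical protocol hash: 32-bit FNV-1a over the full names of the
    // known types, in list order. Same types in a different order give a
    // different hash, just like a mismatched KnownTypes list would.
    public static int Compute(IReadOnlyList<Type> knownTypes)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (Type t in knownTypes)
            {
                foreach (byte b in Encoding.UTF8.GetBytes(t.FullName ?? t.Name))
                {
                    hash ^= b;
                    hash *= 16777619;
                }
                // 0xFF never occurs in UTF-8 text, so it marks the boundary between names.
                hash ^= 0xFF;
                hash *= 16777619;
            }
            return (int)hash;
        }
    }
}
```

In the handshake described above, the client would send this int as its first 4 bytes and the server would disconnect on a mismatch.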
When there is a need to write a type, Ceras tries three things:
- Check whether that type was already written in the current Serialize call. If so, Ceras can just write a short backreference (1 byte).
- Maybe the type is listed in KnownTypes? If so, only an ID derived from its index is written (1 byte).
- Seems like the user didn't put the type in KnownTypes and it's the first time we've seen this type. There's no other way than writing the name of the type.
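The three steps can be sketched as a small type writer. The 1-byte marker layout below (0 = name follows, 1..127 = backreference, 128..255 = KnownTypes index) is a hypothetical format chosen for illustration; it is not Ceras's actual wire format.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical 1-byte marker scheme, for illustration only (not Ceras's wire format):
//   0        -> full type name follows (first occurrence of a type not in KnownTypes)
//   1..127   -> backreference to a name already written in this Serialize call
//   128..255 -> ID derived from the type's index in KnownTypes
public sealed class TypeWriterSketch
{
    private readonly Dictionary<Type, int> _known = new Dictionary<Type, int>();
    private readonly Dictionary<Type, int> _written = new Dictionary<Type, int>();

    public TypeWriterSketch(IList<Type> knownTypes)
    {
        if (knownTypes.Count > 128)
            throw new ArgumentException("This sketch supports at most 128 known types.");
        for (int i = 0; i < knownTypes.Count; i++)
            _known.Add(knownTypes[i], i); // the ID is simply the list index
    }

    // Backreferences are only valid within one Serialize call, so clear them between calls.
    public void BeginSerializeCall() => _written.Clear();

    public void WriteType(BinaryWriter w, Type t)
    {
        // 1. Already written in this call? A 1-byte backreference is enough.
        if (_written.TryGetValue(t, out int slot))
        {
            w.Write((byte)(1 + slot));
            return;
        }
        // 2. Listed in KnownTypes? Write a 1-byte ID derived from its index.
        if (_known.TryGetValue(t, out int index))
        {
            w.Write((byte)(128 + index));
            return;
        }
        // 3. First time we see an unknown type: marker byte plus the full name
        //    (BinaryWriter.Write(string) adds a length prefix).
        if (_written.Count >= 127)
            throw new InvalidOperationException("This sketch supports at most 127 names per call.");
        w.Write((byte)0);
        w.Write(t.FullName ?? t.Name);
        _written.Add(t, _written.Count);
    }
}
```

With this scheme, a list holding 100 instances of the same unlisted type pays for the name once and one byte for each of the other 99 entries. A matching reader would mirror the same three cases and rebuild the backreference table in the same order.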
Even though writing a type is, in the worst case, basically just writing a string, it's still overhead that Ceras tries very hard to avoid:
Most of the time a value's type already matches the type of its containing variable, so the question of how to write a type never even comes up. The fastest way to write something is not writing anything in the first place :P
As seen in the first step above, a type only gets written once per Serialize call.
So if you have a List<ISpell> containing 100 fireballs, Ceras will only write Namespace.Fireball once, and from then on it can reference the already written name. This saves a ton of space.
Generic types can be exploited as well, by deconstructing them and writing their fragments individually.
Assuming we had to write a type like List<ISpell>, a naive implementation would just write "System.Collections.Generic.List<Namespace.ISpell>".
Ceras instead deconstructs generic types and writes each component type individually: "System.Collections.Generic.List<>" + "Namespace.ISpell".
That way future types can be constructed from those individual fragments!
Should we later encounter a Queue<ISpell> as well, only "Queue<>" has to be written, since ISpell can already be encoded as a backreference.
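The deconstruction idea can be sketched with the standard reflection calls Type.GetGenericTypeDefinition and Type.GetGenericArguments. This is an illustration under simplifying assumptions (string "name:"/"ref:" tokens instead of bytes, arrays and other compound types not handled), not Ceras's implementation.

```csharp
using System;
using System.Collections.Generic;

// Sketch of generic-type deconstruction: a closed generic type is split into
// its open definition plus its arguments, and each fragment is written only
// once; later occurrences become backreferences.
public sealed class GenericTypeFragmentWriter
{
    private readonly Dictionary<Type, int> _seen = new Dictionary<Type, int>();

    // Every emitted token, e.g. "name:System.Collections.Generic.List`1" or "ref:1".
    public List<string> Output { get; } = new List<string>();

    public void Write(Type t)
    {
        if (_seen.TryGetValue(t, out int id))
        {
            Output.Add("ref:" + id);
            return;
        }
        if (t.IsGenericType && !t.IsGenericTypeDefinition)
        {
            // Write the open definition (List<>) first, then each argument separately.
            Write(t.GetGenericTypeDefinition());
            foreach (Type arg in t.GetGenericArguments())
                Write(arg);
        }
        else
        {
            Output.Add("name:" + t.FullName);
        }
        // The closed type gets a slot too, so a repeated List<ISpell> costs one reference.
        _seen[t] = _seen.Count;
    }
}
```

Writing List<string> and then Queue<string> with one writer emits the names of List`1, System.String and Queue`1, while the second string argument becomes a backreference.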
So when exactly does Ceras write a type? Only when the variable and its value have different types.
class Thing { public object Obj; }
var t = new Thing();
| Data | Type written? |
|---|---|
| t.Obj = null; | No: null is a special case |
| t.Obj = new object(); | No: the types of the field and the contained object match |
| t.Obj = 5; | Yes (System.Int32) |
| t.Obj = "abc"; | Yes (System.String) |
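The rule in the table boils down to a one-line check. This is a simplified sketch for illustration (MustWriteType is a hypothetical name, not part of Ceras), ignoring any additional cases the real serializer may handle.

```csharp
using System;

public static class TypeWriteRule
{
    // Simplified version of the rule in the table: null never needs a type,
    // and a value whose runtime type equals the declared type of its
    // field/variable can be read back without one.
    public static bool MustWriteType(Type declaredType, object value) =>
        value != null && value.GetType() != declaredType;
}
```

Applied to the four rows above with declaredType = typeof(object), this yields No, No, Yes, Yes. For a List<ISpell>, the declared element type is an interface, and a runtime type is never an interface, so every non-null entry needs its type written.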
interface ISpell { /* ... */ }
class Fireball : ISpell { public int Damage; }
class ChainLightning : ISpell { public int InitialDamage; public int JumpCount; /* ... */ }
Given the definitions above, let's assume you write something like:
var mySpells = new List<ISpell> { new ChainLightning(), new Fireball(), new Fireball(), /* ... */ };
var data = ceras.Serialize(mySpells);
When you read your list back, Ceras needs to know the type of each individual entry in the list.
Otherwise, how would it know whether to read one int or two? After all, the data is packed as tightly as possible and contains only the bare minimum.
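A hand-rolled format makes the problem concrete. In this hypothetical sketch (not Ceras's format), each entry is a 1-byte type tag followed only by that type's ints; without the tag, the reader could not tell whether the next 4 bytes are a Fireball or the first half of a ChainLightning.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public interface ISpell { }
public class Fireball : ISpell { public int Damage; }
public class ChainLightning : ISpell { public int InitialDamage; public int JumpCount; }

// Hypothetical tightly packed format: [count][tag][ints...][tag][ints...]...
public static class SpellListFormat
{
    private const byte FireballTag = 0, ChainLightningTag = 1;

    public static byte[] Write(List<ISpell> spells)
    {
        var ms = new MemoryStream();
        using (var w = new BinaryWriter(ms))
        {
            w.Write(spells.Count);
            foreach (ISpell s in spells)
            {
                switch (s)
                {
                    case Fireball f:
                        w.Write(FireballTag);
                        w.Write(f.Damage);
                        break;
                    case ChainLightning c:
                        w.Write(ChainLightningTag);
                        w.Write(c.InitialDamage);
                        w.Write(c.JumpCount);
                        break;
                    default:
                        throw new NotSupportedException(s.GetType().FullName);
                }
            }
        }
        return ms.ToArray(); // ToArray still works after the stream is closed
    }

    public static List<ISpell> Read(byte[] data)
    {
        using (var r = new BinaryReader(new MemoryStream(data)))
        {
            int count = r.ReadInt32();
            var result = new List<ISpell>(count);
            for (int i = 0; i < count; i++)
            {
                byte tag = r.ReadByte();
                // The tag tells the reader how many ints belong to this entry.
                switch (tag)
                {
                    case FireballTag:
                        result.Add(new Fireball { Damage = r.ReadInt32() });
                        break;
                    case ChainLightningTag:
                        result.Add(new ChainLightning { InitialDamage = r.ReadInt32(), JumpCount = r.ReadInt32() });
                        break;
                    default:
                        throw new InvalidDataException("Unknown type tag " + tag);
                }
            }
            return result;
        }
    }
}
```

A ChainLightning followed by a Fireball takes 4 + (1 + 8) + (1 + 4) = 18 bytes here. Ceras solves the same problem generically, writing type information with the strategy described earlier instead of hard-coded tags.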
If you want to keep the type names but just want to shorten or otherwise customize them, you can implement ITypeBinder and use it in your SerializerConfig.
Implementing the interface is super simple: all it does is convert a given Type to a string (which is written to the binary) and do the same in reverse (finding/resolving a Type from a given string).
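The two directions such a binder covers can be sketched as an alias table with a fallback to the assembly-qualified name. The class and its method names are hypothetical; Ceras's real ITypeBinder has its own member names and signatures, so check its source before implementing it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical binder shape for illustration; not Ceras's ITypeBinder interface.
public sealed class ShortNameBinderSketch
{
    private readonly Dictionary<Type, string> _toName = new Dictionary<Type, string>();
    private readonly Dictionary<string, Type> _toType = new Dictionary<string, Type>();

    // Register a short alias for a type, e.g. "FB" for a Fireball class.
    public void Alias(Type type, string shortName)
    {
        _toName.Add(type, shortName);
        _toType.Add(shortName, type);
    }

    // Type -> string that would be written to the binary.
    public string GetTypeName(Type type) =>
        _toName.TryGetValue(type, out string name)
            ? name
            : type.AssemblyQualifiedName ?? type.FullName ?? type.Name;

    // string -> Type, the exact reverse of GetTypeName.
    public Type ResolveType(string name) =>
        _toType.TryGetValue(name, out Type type)
            ? type
            : Type.GetType(name, throwOnError: true);
}
```

The only hard requirement is that both directions agree: whatever string GetTypeName produces must resolve back to the same Type on the reading side.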
If you're not sure about something you can always take a look at the source code (seeing how the default TypeBinder is implemented) or just open an issue and ask 😄