Thoughts on proposals for improvements on Gtid handling (parsing, adding to GtidSet) (performance, memory, cpu) #98
Comments
Can you give some numbers showing how expensive each issue you're calling out is? Whether to proceed on large changes like this is really just a matter of how much speed you'll win by doing so.

As far as breaking the interface goes: generally, the answer is no. Ignore the low version number; this library is stable and used heavily in production all over the place. I'm not quite sure, of course, who relies on the specific interfaces.

Regarding two different classes for maria/mysql: the MariaDB GTID support is more recent (and so could in theory break a little), but the interface as it stands can be used reasonably well without having to specify flags for connecting to maria vs mysql, which I like.
I like this idea, especially if we can keep the old interfaces the same but add a migration path to the new ones. I'm also not against marking the old interfaces as deprecated, if you come up with a great approach to organizing this information... but it'd probably be a long, long migration to the new interfaces.
I created a POC to compare the allocation and cpu profiles of points 1) and 2). Point 3) is more of a design change imo.

TL;DR

The setup

The POC app starts a MySql OneTimeServer with binary logs in gtid mode. It creates a table and populates it with 300000 rows. The total size of the binlog is around 130 MiB.

1) The allocations caused by GtidEventDataDeserializer

First the allocation profile of the current binarylogclient: [screenshot]

If instead we read the gtid in a [...]: [screenshot]

With the allocations of the GtidEventDataDeserializer out of the picture, I also noticed that the EventHeaderV4Deserializer kept creating arrays for each event: [screenshot]

When building and using an index once, these allocations can be avoided: [screenshot]

The string formatting is also very visible in the cpu profile (the byteArrayToHex method): [screenshot]

This is improved when we deserialize to a gtid object: [screenshot]

2) The cpu performance of the GtidSet

The cpu usage of the original GtidSet.add method: [screenshot]

When we use the gtid object instead (no string parsing): [screenshot]

With a new GtidSet implementation: [screenshot]

Some concluding thoughts

While the new GtidSet is significantly faster than the current GtidSet, it probably does not matter because the current GtidSet.add method is only a fraction of the full cpu profile. So although I like my implementation, I don't think it would add much value.

Sources

You can find the source code to review and reproduce the profiling at https://github.com/janickr/mysql-binlog-connector-java/tree/gtd-profiling-poc

Run the poc with [...]

It will generate the flamegraphs in the project root directory.

Edit: new screenshots of a run with better JVM warmup
Hi @osheroff
This is a proposal for a change that I would like to discuss before working on it and submitting a PR.
The current code is not that efficient in the way MySQL GTIDs are handled:
- `GtidEventDataDeserializer`: `String.format` on each byte
- `GtidSet`: when the `BinaryLogClient` commits the Gtid to the gtidSet, it calls `add(String)`
- `add(String gtid)` again parses the gtid string representation into a serverId and a sequence number to be added to the set

When profiling an application that uses the binlog client, this shows up as a significant part of the allocation profile.
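To make the cost concrete, here is a small illustration of the round trip described above: bytes are formatted into a string one byte at a time, and the string is later split and parsed again. This is a sketch that mimics the pattern, not the library's actual code; the class and method names (`StringRoundTrip`, `sourceIdToString`, `sequenceOf`) are made up for the example.

```java
// Illustration only: mimics the string round trip described above, not the library's actual code.
final class StringRoundTrip {

    // Formats a 16-byte source id as a UUID-style string with one String.format call per byte.
    static String sourceIdToString(byte[] sid) {
        StringBuilder sb = new StringBuilder(36);
        for (int i = 0; i < sid.length; i++) {
            if (i == 4 || i == 6 || i == 8 || i == 10) {
                sb.append('-');
            }
            sb.append(String.format("%02x", sid[i])); // a new Formatter and String per byte
        }
        return sb.toString();
    }

    // Splits "sourceId:sequence" back into its parts, as a string-based add would have to.
    static long sequenceOf(String gtid) {
        String[] parts = gtid.split(":");
        return Long.parseLong(parts[1]);
    }

    public static void main(String[] args) {
        byte[] sid = new byte[16];
        for (int i = 0; i < sid.length; i++) {
            sid[i] = (byte) (i * 17); // 0x00, 0x11, ..., 0xff
        }
        String gtid = sourceIdToString(sid) + ":" + 42;
        System.out.println(gtid);
        System.out.println(sequenceOf(gtid));
    }
}
```

Every event pays for the formatting on the way in and for the split and parse on the way out, even though the data started out as 16 bytes plus a long.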
1) I think as a minimum these changes are needed to avoid the string operations:
- `MySqlGtid` class similar to the `MariaGtid` class but containing a `UUID` serverId (sourceId) and a `long` transactionId (sequence)
- `GtidEventDataDeserializer`: get the serverId as 2 longs (mostSignificantBits and leastSignificantBits) and construct the `MySqlGtid`
- `Object gtid` instead of a `String gtid` in the `BinaryLogClient`
- `add(Object gtid)` method to `GtidSet` and `MariadbGtidSet` that casts the gtid to the correct subclass and adds it to the set
- `GtidSet`: change the `<String,UUIDSet>` map to a `<UUID,UUIDSet>` map, and change the type of serverId in UUIDSet to `UUID`

This can probably be done keeping all existing public methods in place, only adding methods and delegating existing ones, so afaik without making breaking changes.
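A minimal sketch of what such a value class could look like, assuming the 16 source-id bytes are interpreted in order (most significant byte first), which matches how they read when printed as hex. `MySqlGtid` and `fromBytes` are hypothetical names from this proposal, not existing library API.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical value class sketched from the proposal; not part of the library.
final class MySqlGtid {
    private final UUID serverId;
    private final long transactionId;

    MySqlGtid(UUID serverId, long transactionId) {
        this.serverId = serverId;
        this.transactionId = transactionId;
    }

    // Builds the gtid from the raw 16-byte source id as two longs, with no string formatting.
    static MySqlGtid fromBytes(byte[] sid, long transactionId) {
        if (sid.length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + sid.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(sid); // big-endian by default
        long mostSignificantBits = buf.getLong();
        long leastSignificantBits = buf.getLong();
        return new MySqlGtid(new UUID(mostSignificantBits, leastSignificantBits), transactionId);
    }

    UUID getServerId() { return serverId; }

    long getTransactionId() { return transactionId; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MySqlGtid)) return false;
        MySqlGtid other = (MySqlGtid) o;
        return transactionId == other.transactionId && serverId.equals(other.serverId);
    }

    @Override
    public int hashCode() { return 31 * serverId.hashCode() + Long.hashCode(transactionId); }

    // The string form is only produced on demand, e.g. for logging.
    @Override
    public String toString() { return serverId + ":" + transactionId; }

    public static void main(String[] args) {
        byte[] sid = new byte[16];
        for (int i = 0; i < sid.length; i++) {
            sid[i] = (byte) (i * 17); // 0x00, 0x11, ..., 0xff
        }
        System.out.println(MySqlGtid.fromBytes(sid, 7));
    }
}
```

Because `UUID` already implements `equals` and `hashCode`, it can be used directly as the key of the `<UUID,UUIDSet>` map mentioned in the list.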
2) Additionally we could also improve the performance of the current GtidSet

I did some experiments, and by making the common case fast (usually the next gtid is an increment of the sequence number of the same server), the throughput of adding a gtid to a gtidset can be increased 4-fold compared with making only the changes in (1), and more than 60-fold compared to the original add method with the string.split (if the setup and interpretation of my jmh benchmarks is correct).
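One way to make the common case fast can be sketched as follows: remember the interval list of the server seen last, and check first whether the new sequence number simply extends the last interval. This is an illustrative sketch under those assumptions, not the implementation that was benchmarked; `FastGtidSet` and `Intervals` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: a gtid set with a fast path for "same server, next sequence number".
final class FastGtidSet {

    // Sorted, non-overlapping, non-adjacent half-open ranges [start, end).
    static final class Intervals {
        private final List<long[]> ranges = new ArrayList<>();

        void add(long seq) {
            int n = ranges.size();
            if (n > 0) {
                long[] last = ranges.get(n - 1);
                if (seq == last[1]) { // fast path: extends the last range by one
                    last[1]++;
                    return;
                }
            }
            addSlow(seq);
        }

        private void addSlow(long seq) {
            int i = 0;
            while (i < ranges.size() && ranges.get(i)[1] < seq) i++; // first range with end >= seq
            if (i < ranges.size()) {
                long[] r = ranges.get(i);
                if (seq >= r[0] && seq < r[1]) return; // already present
                if (seq == r[1]) { // extends r at the end, may now touch the next range
                    r[1]++;
                    if (i + 1 < ranges.size() && ranges.get(i + 1)[0] == r[1]) {
                        r[1] = ranges.get(i + 1)[1];
                        ranges.remove(i + 1);
                    }
                    return;
                }
                if (seq + 1 == r[0]) { // extends r at the start
                    r[0] = seq;
                    return;
                }
            }
            ranges.add(i, new long[] {seq, seq + 1});
        }

        boolean contains(long seq) {
            for (long[] r : ranges) {
                if (seq >= r[0] && seq < r[1]) return true;
            }
            return false;
        }

        int size() { return ranges.size(); }
    }

    private final Map<UUID, Intervals> byServer = new HashMap<>();
    private UUID lastServerId;       // cache of the most recently used server
    private Intervals lastIntervals;

    void add(UUID serverId, long seq) {
        if (!serverId.equals(lastServerId)) { // map lookup only when the server changes
            lastServerId = serverId;
            lastIntervals = byServer.computeIfAbsent(serverId, id -> new Intervals());
        }
        lastIntervals.add(seq);
    }

    boolean contains(UUID serverId, long seq) {
        Intervals s = byServer.get(serverId);
        return s != null && s.contains(seq);
    }

    int intervalCount(UUID serverId) {
        Intervals s = byServer.get(serverId);
        return s == null ? 0 : s.size();
    }

    public static void main(String[] args) {
        FastGtidSet set = new FastGtidSet();
        UUID server = new UUID(1L, 2L);
        for (long seq = 1; seq <= 5; seq++) set.add(server, seq);
        System.out.println(set.intervalCount(server));
    }
}
```

In the common case an add costs one `equals` on the cached server id and one comparison against the end of the last range; everything else falls through to the slower general path.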
This change is a bit harder to perform without making changes to the interface of `GtidSet` or `UUIDSet` (I think it's still possible; I'll probably have to actually try to keep the interface the same to see how it works out).

3) We could also make a few changes to the design, but this comes at the cost of making some breaking changes:
- `MariadbGtidSet` inherits from `GtidSet`, but this is mainly because `GtidSet` defines the interface for adding the gtid (upon `commitGtid`); it only uses part of the interface and none of the member variables
- `BinaryLogClient` has code paths that choose different behaviour for mysql and mariadb regarding gtid handling

=> we could explore introducing some kind of (Abstract)TransactionState/GtidState/ReplicationState with 2 subclasses, one for MySQL and one for MariaDB, encapsulating the difference in state and behaviour.
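As a rough sketch of that idea, the state abstraction could look something like the following. All names here (`GtidState`, `MySqlGtidState`, `MariaDbGtidState`, `onGtidEvent`, `position`) are hypothetical, and the state is simplified to the last committed gtid only; a real version would hold the full gtid set.

```java
import java.util.UUID;

// Hypothetical sketch of the proposed state abstraction; none of these types exist in the library.
final class GtidStateDemo {
    public static void main(String[] args) {
        MySqlGtidState mysql = new MySqlGtidState();
        mysql.onGtidEvent(new UUID(0L, 1L), 5L);
        GtidState state = mysql; // the client would only see the abstract type
        state.commitGtid();
        System.out.println(state.position());
    }
}

abstract class GtidState {
    // Called when the current transaction commits.
    abstract void commitGtid();

    // Textual replication position, e.g. for reconnecting.
    abstract String position();
}

final class MySqlGtidState extends GtidState {
    private UUID pendingServerId; // null while no gtid is pending
    private long pendingTransactionId;
    private String committed = "";

    void onGtidEvent(UUID serverId, long transactionId) {
        pendingServerId = serverId;
        pendingTransactionId = transactionId;
    }

    @Override
    void commitGtid() {
        if (pendingServerId != null) {
            committed = pendingServerId + ":" + pendingTransactionId;
            pendingServerId = null;
        }
    }

    @Override
    String position() { return committed; }
}

final class MariaDbGtidState extends GtidState {
    private boolean pending;
    private long domainId;
    private long serverId;
    private long sequence;
    private String committed = "";

    void onGtidEvent(long domainId, long serverId, long sequence) {
        this.domainId = domainId;
        this.serverId = serverId;
        this.sequence = sequence;
        this.pending = true;
    }

    @Override
    void commitGtid() {
        if (pending) {
            committed = domainId + "-" + serverId + "-" + sequence; // MariaDB domain-server-sequence form
            pending = false;
        }
    }

    @Override
    String position() { return committed; }
}
```

With this shape the client would pick the subclass once at connect time and call `commitGtid()` without branching on the server flavour, and each subclass keeps its gtid in its own typed fields.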
MariadbGtidSet and (Mysql)GtidSet could be two distinct/independent classes; gtid can be a member of those classes with its respective `MariaDbGtid` or `MysqlGtid` type, so there will be no need to use `String` or `Object` as the type.

For my use case the most important change is 1), then 2), then 3). I wanted to know your thoughts on this before I spend time on one or more PRs for this. Also, I'm curious to know how important it is to keep the `GtidSet`, `UUIDSet` and `GtidEventData` interfaces stable given the version number is 0.27.6, because it takes more care to do so.

What are your thoughts on this?