Add WriteExceptionHandler #243
Conversation
@tabmatfournier this is just a draft to illustrate the idea but feel free to ask any questions here.

Thanks. I'll take a look.
```java
@Override
public void write(SinkRecord sinkRecord, String tableName, boolean ignoreMissingTable) {
  try {
    underlying.write(sinkRecord, tableName, ignoreMissingTable);
    // ... (diff excerpt truncated here)
```
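The excerpt cuts off before the catch clause. A minimal sketch of how the delegation presumably continues, with `handler` standing in for whatever field name the PR actually uses:

```java
// Sketch of the full wrapper method, assuming a `handler` field of the
// proposed WriteExceptionHandler type; names other than write(...) are guesses.
@Override
public void write(SinkRecord sinkRecord, String tableName, boolean ignoreMissingTable) {
  try {
    underlying.write(sinkRecord, tableName, ignoreMissingTable);
  } catch (Exception e) {
    // The handler decides whether to rethrow, retry, or re-route the record.
    handler.handle(sinkRecord, tableName, e);
  }
}
```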
This is a very broad net here, which makes me uncomfortable.

In my PR there is a reason why I introduced CatalogAPI, wrapping a bunch of the calls to Iceberg / changes to IcebergWriterFactory, where most of this is coming from. Is it the schema failing? Is it creating the table failing? Is it evolving the table failing? Is it the catalog failing? Is it the partition spec failing? That gives much more fine-grained control over the errors. This is catching a ton of errors from many sources, and the opportunities for getting it wrong are large.
We can keep something like CatalogAPI that throws clear exception classes, e.g. SchemaEvolveError, TableCreateError, PartitionEvolutionError. I'm not opposed to that.

Having the broad net here is necessary to enable WriteExceptionHandler implementations. It's down to the WriteExceptionHandler implementation to decide which errors it wants to recover from and which it doesn't. I most certainly don't recommend sending everything that throws an Exception to the dead-letter table, as some of those could be transient.
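To make that division of labor concrete, a handler built against such typed exceptions might look like the following sketch. The exception classes and the `handle` signature are assumptions taken from this thread, not code that exists in the connector today:

```java
import org.apache.kafka.connect.sink.SinkRecord;

// Sketch only: SchemaEvolveError, TableCreateError, PartitionEvolutionError
// are the typed errors proposed above, not existing classes.
public class DeadLetterHandler implements WriteExceptionHandler {

  @Override
  public void handle(SinkRecord record, String tableName, Exception error) {
    if (error instanceof SchemaEvolveError
        || error instanceof TableCreateError
        || error instanceof PartitionEvolutionError) {
      // Clearly non-transient, typed failures: safe to dead-letter.
      routeToDeadLetterTable(record, tableName, error);
      return;
    }
    // Everything else (catalog timeouts, network blips, ...) may be
    // transient: rethrow so the framework can retry instead.
    throw new RuntimeException(error);
  }

  private void routeToDeadLetterTable(SinkRecord record, String tableName, Exception error) {
    // implementation detail: wrap the record with failure metadata and
    // write it to this handler's dead-letter table
  }
}
```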
I raise a DeadLetterException wrapping the underlying exception in some cases. Again, it's not that easy/clean when you start putting it all together.

Not sure this is much of an improvement TBH. Very broad net. The real problem is you have a bunch of errors in the writer.

Actually writing (to an appender) is not the issue. But the above logic is split between the WriterFactory and the writer itself. Might be worth pulling that out into distinct steps (CatalogAPI does this for the first two cases; may be worth introducing a RecordConverter outside of the writer). It might make the wrapping you want clearer.
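A rough sketch of that decomposition, with each step failing with its own typed error so the wrapper knows exactly what broke. All of the names here are illustrative shapes from this thread, not current connector classes:

```java
// Illustrative only: catalogApi, recordConverter, and the typed errors are
// the decomposition being proposed, not existing code.
Table table = catalogApi.loadOrCreateTable(tableName);          // may throw TableCreateError
catalogApi.evolveSchemaIfNeeded(table, sinkRecord);             // may throw SchemaEvolveError
Record converted = recordConverter.convert(table, sinkRecord);  // conversion failures surface here
writer.write(table, converted);                                 // plain append, rarely the problem
```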
This is nice, but you also have to handle cases where you don't know the table the record is going to (because the table name itself is invalid and the catalog throws when trying to create it). It's not that simple.
Current plan:
I don't want to gatekeep this feature only for users who use dynamic-enabled. There is nothing here restricting other users from using this other than "simplifying code" (of course it's simplifying code -- we are ignoring valid configurations). I believe this to be a deal breaker.
If that is the case, why are we providing any implementations at all? IMHO we can't get away with just providing the interface for users. We have to provide some amount of default implementation.
```java
@Override
public void write(SinkRecord sinkRecord, String tableName, boolean ignoreMissingTable) {
  try {
    underlying.write(sinkRecord, tableName, ignoreMissingTable);
    // ... (diff excerpt truncated here)
```
This will also cause infinite loops if you fail to create the dead-letter table (possibly because the dead-letter route has been set to an invalid value). It's also an issue that this behavior changes depending on whether table auto-creation is configured on.

Again, this is why CatalogAPI exists in the other API:

- it knows when you are creating / doing something with the dead-letter table, so it throws those errors instead of attempting to route them into a dead-letter table, and it does not catch the exception
- when you are doing something with a regular record / not the dead-letter table, it applies the error handler.

I can still do the above w/ the error handler, but you can't have this broad write here -- too many things happen downstream, some of which involve hard-coded configs, some of which will involve the dead-letter table. E.g. we support partitioning the dead-letter table via the normal way of providing partition specs for any table; if that fails, you need to hard-fail the connector.

The above would infinite-loop in that case.
> This will also cause infinite loops

I deliberately wrote it recursively. If users (which could be us, the maintainers!) feel an exception is transient, they're welcome to retry via the API; I don't feel we should restrict that option.

> possibly because the dead letter route has been set to an invalid value

There is no dead-letter-table route with this approach as far as the connector code is concerned. The WriteExceptionHandler may certainly try to write to a dead-letter table, but that is a WriteExceptionHandler implementation detail.

> I can still do the above w/ the error handler, but you can't have this broad write here -- too many things downstream, some of which involve hard-coded configs, some of which will involve the dead letter table -- e.g. we support partitioning the dead letter table via the normal way to provide partition specs for any table; if that fails, you need to hard fail the connector.

With this approach, a dead-letter table is like any other ordinary table, so you can still "partition the dead letter table via the normal way to provide partition specs for any table."

> if that fails, you need to hard fail the connector.

No problem, you can still achieve that here with the WriteExceptionHandler approach. The WriteExceptionHandler implementation just needs to ensure that if it sees a PartitionSpecEvolutionError for what it considers to be a dead-letter table, it should just hard-fail :)
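Concretely, the guard could be a couple of lines in the handler. This is a sketch; `isDeadLetterTable`, `routeToDeadLetterTable`, and `PartitionSpecEvolutionError` are assumed names from this thread, not existing code:

```java
@Override
public void handle(SinkRecord record, String tableName, Exception error) {
  if (isDeadLetterTable(tableName)) {
    // Never re-route a failed dead-letter write back into the dead-letter
    // table -- that is the infinite loop. Hard-fail the connector instead.
    throw new org.apache.kafka.connect.errors.ConnectException(
        "Write to dead-letter table " + tableName + " failed", error);
  }
  routeToDeadLetterTable(record, tableName, error);
}
```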
> Again, this is why CatalogAPI exists in the other API:

Like I said, I'm not opposed to having a concept like the CatalogAPI in the connector code if it makes it easier to react to clear and actionable exceptions.
Is this something we want? Remember: you will still need a default dead-letter table, because there are many cases where you don't know where to send the record (error-transform failures are a good example of this), so you are left with a config value or a function of the connector name. I would argue topic name is a poor choice, but you could do that if you wanted. It is a poor choice because the user may be running several connectors for the same topic with different SMTs. I'm not convinced you can just "write an appropriate handler to do it," because you often won't have any context for where to route to.

Oh my bad, I threw this together pretty quickly. I thought most of the errors would only happen when we write SinkRecords to files, but if there are errors we think it would be worth catching higher up (e.g. when creating a Writer), then we can certainly move the catch point higher up.

No problem, there are things we can do to mitigate this.

It's not that easy. Unfortunately, where you need to catch is a shotgun (worker, writer, writer factory, SchemaUtils, etc.).

Not opposed, but I can show you how this is tricky/challenging in a screenshare.

Absolutely.

Scratch that, this should cover all errors coming from WriterFactory, IcebergWriter, and SchemaUtils as well?
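One way to read "cover all errors" is a single funnel point above those components, along these lines. The class and method names here are assumptions, not the connector's actual API:

```java
// Sketch: catch once at the level that drives the whole write path, so
// failures from the writer factory, the writer, and schema evolution all
// reach the same handler.
void routeRecord(SinkRecord record, String tableName) {
  try {
    // may throw during table lookup/creation or partition-spec handling
    RecordWriter writer = writerFactory.createWriter(tableName, record, false);
    // may throw during record conversion or schema evolution
    writer.write(record);
  } catch (Exception e) {
    handler.handle(record, tableName, e);
  }
}
```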
Adds a configuration option to specify a `WriteExceptionHandler` to use to handle any exceptions while writing SinkRecords to files. I haven't tested any of this yet :D but in theory:

- Any exceptions thrown while writing SinkRecords to files (during `SinkTask.put`) would be captured by the user-configured `WriteExceptionHandler` and handled there.
- To send failed records to a dead-letter table (rather than failing `SinkTask.put`), users should configure the connector in `iceberg.tables.dynamic-enabled` mode with a `iceberg.tables.route-field` and write an exception-handling-SMT that points to the dead-letter table.
- We can provide a `WriteExceptionHandler` and an `ExceptionHandlingSMT` that should work for 90% of use cases.

Pros

- Flexible: you can, for example, write a `WriteExceptionHandler` that sends bad records from different topics to different tables, or one that writes out the original `byte[]`.
- Works with the `SinkTaskContext.errantRecordReporter.report` API (it can be invoked from within a `WriteExceptionHandler` implementation).

Cons

- Users have to write a `WriteExceptionHandler` implementation.
  - We can provide a sample `WriteExceptionHandler` implementation, similar to how we provide sample SMT implementations already.
- Users have to make sure their `WriteExceptionHandler` implementation is available on the classpath.
  - We could host common `WriteExceptionHandler` implementations in the kafka-connect-iceberg project (but we should get some stable implementations first).

How to
It might be a little hard to see how people could use this feature for specific use cases so I've prepared a separate draft PR to show this: #244
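For readers skimming the thread, a minimal sketch of the contract being proposed. The method signature is an assumption based on this discussion; the PR wires the implementation class in via the new configuration option:

```java
import org.apache.kafka.connect.sink.SinkRecord;

/**
 * Sketch of the proposed contract: invoked for any exception thrown while
 * writing a SinkRecord to files. Implementations decide whether to rethrow
 * (failing the connector), swallow and log, or re-route the record, e.g.
 * to a dead-letter table.
 */
public interface WriteExceptionHandler {
  void handle(SinkRecord record, String tableName, Exception error);
}
```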