[Feature Request][Spark] Auto Compaction shouldn't be triggered if compaction hasn't been run yet #4043
Open
Labels: enhancement
Feature request
The OSS implementation runs compaction whenever auto compaction is enabled and compaction hasn't been run yet. For example, running a CTAS with the table property enabled performs compaction after the write even if the small file count is below minNumFiles.
delta/spark/src/main/scala/org/apache/spark/sql/delta/stats/AutoCompactPartitionStats.scala
Lines 71 to 78 in a920885
Auto compaction in Databricks does not perform this unnecessary initial compaction. Whether to compact should be decided only by whether the number of small files meets or exceeds minNumFiles.
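To make the difference concrete, here is a minimal sketch of the two decision rules as described in this issue. It is a simplified model, not the code in AutoCompactPartitionStats.scala: the names `PartitionStat`, `num_small_files`, `was_auto_compacted`, `should_compact_current`, and `should_compact_proposed` are hypothetical, and the threshold of 50 is only an illustrative value for minNumFiles.

```python
from dataclasses import dataclass


@dataclass
class PartitionStat:
    """Hypothetical per-partition bookkeeping, for illustration only."""
    num_small_files: int
    was_auto_compacted: bool = False


def should_compact_current(stat: PartitionStat, min_num_files: int) -> bool:
    # Behavior reported in this issue: a partition that has never been
    # compacted qualifies regardless of how many small files it holds.
    return (not stat.was_auto_compacted) or stat.num_small_files >= min_num_files


def should_compact_proposed(stat: PartitionStat, min_num_files: int) -> bool:
    # Requested behavior: only the small file count matters.
    return stat.num_small_files >= min_num_files


# A table freshly created by CTAS: few small files, never compacted.
fresh_ctas = PartitionStat(num_small_files=3)
MIN_NUM_FILES = 50  # illustrative threshold (assumption)

print(should_compact_current(fresh_ctas, MIN_NUM_FILES))   # True
print(should_compact_proposed(fresh_ctas, MIN_NUM_FILES))  # False
```

Under the proposed rule the CTAS case above skips compaction, while a partition that has accumulated minNumFiles or more small files is still compacted as before.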
Which Delta project/connector is this regarding?
Overview
Motivation
Improve performance of tables that get created with auto compaction enabled.
Further details
Willingness to contribute