Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PARQUET-2355: Remove parquet-thrift #1158

Closed
wants to merge 1 commit into from

Conversation

Fokko
Copy link
Contributor

@Fokko Fokko commented Sep 28, 2023

Parquet Thrift relies on Elephantbird:

https://github.com/twitter/elephant-bird

And the current version is still on Thrift 0.7.0, and this doesn't work with 0.19.0 that we're trying to upgrade to. Updating is non-trivial since a lot of the code that was used for testing has been made private: #1156

I would suggest removing this from the repository since it looks like no one is using it anymore:

apache-thrift does not seem to be used anymore (usages of 0 since 1.12.2):

image

Make sure you have checked all steps below.

Jira

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Commits

  • My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain Javadoc that explain what it does

Parquet Thrift relies on Elephantbird:

https://github.com/twitter/elephant-bird

And the current version is still on Thrift 0.7.0.
Updating is non-trivial since a lot of the code
that was used for testing has been made private:

apache#1156

I would suggest removing this from the repository
since it looks like no-on is using it anymore:

`apache-thrift` does not seem to be used anymore (usages of 0 since 1.12.2):
![image](https://github.com/apache/parquet-mr/assets/1134248/526b5358-ac4b-4b3d-82c5-5578ef110418)
@Fokko Fokko changed the title Remove parquet-thrift PARQUET-2355: Remove parquet-thrift Sep 28, 2023
@paliwalashish
Copy link

Do we want to make it deprecated now and remove in a major release?

@Fokko
Copy link
Contributor Author

Fokko commented Oct 15, 2023

@mystic-lama That's a possibility. I've sent out an email on the dev list with some more context: https://lists.apache.org/thread/bk74c7bk1wr63y5rw6tjwkxvdjjfb8kl The downside of keeping this is that we have to release with an old version of Thrift, which has challenges (you have to compile it locally for OSX). I've created a PR here to see what's changed. Also, bumping elephantbird is not a possibility, see #1156

@paliwalashish
Copy link

@Fokko I am following the discussion ☺️

I am all in favor of dropping it, just thinking from User perspective. Once they know the change is coming, they can prepare for it.

@Fokko
Copy link
Contributor Author

Fokko commented Oct 16, 2023

@mystic-lama That would be the cleanest way, but as I pointed out on the maillist, it has been almost 3 years since the latest PR has been raised against the parquet-thrift module. Those using it can still use the already-released versions.

@Fokko Fokko closed this Nov 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants