PARQUET-2422: Prevent unwrapping of Hadoop filestreams #1256

Merged on Feb 29, 2024 (1 commit)


Contributor

@rathinb-db rathinb-db commented Jan 20, 2024

Make sure you have checked all steps below.

Jira

Refactor Hadoop filestreams to prevent filestream unwrapping (and keep the original filestream type).
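
To illustrate the intent, here is a minimal, self-contained sketch. It does not use the real Hadoop or Parquet classes: `DataStream`, `BufferReadable`, `BufferCapableStream`, and `Wrapped` are hypothetical stand-ins (roughly, a wrapper stream such as `FSDataInputStream`, a capability marker such as `ByteBufferReadable`, and the resulting seekable wrapper), and the probing logic is an assumption about the general approach, not the code in this PR. The point it tries to show is that nested wrappers are inspected only to answer a yes/no question, while the caller's outermost stream is the one that gets wrapped.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;

class StreamWrapSketch {
    /** Hypothetical capability marker (stand-in for a buffer-read interface). */
    interface BufferReadable {}

    /** Hypothetical wrapper stream that exposes the stream it wraps. */
    static class DataStream extends FilterInputStream {
        DataStream(InputStream in) {
            super(in);
        }

        InputStream getWrappedStream() {
            return in;
        }
    }

    /** Hypothetical innermost stream that supports buffer reads. */
    static class BufferCapableStream extends ByteArrayInputStream implements BufferReadable {
        BufferCapableStream() {
            super(new byte[0]);
        }
    }

    /** Records which stream was kept and which read path was selected. */
    static final class Wrapped {
        final DataStream stream;
        final boolean bufferReads;

        Wrapped(DataStream stream, boolean bufferReads) {
            this.stream = stream;
            this.bufferReads = bufferReads;
        }
    }

    /** Walks nested wrappers to probe the innermost stream; returns only a boolean. */
    static boolean innermostIsBufferReadable(DataStream stream) {
        InputStream inner = stream.getWrappedStream();
        while (inner instanceof DataStream) {
            inner = ((DataStream) inner).getWrappedStream();
        }
        return inner instanceof BufferReadable;
    }

    /** Wraps the stream the caller passed in, never one of its inner streams. */
    static Wrapped wrap(DataStream stream) {
        return new Wrapped(stream, innermostIsBufferReadable(stream));
    }

    public static void main(String[] args) {
        DataStream outer = new DataStream(new DataStream(new BufferCapableStream()));
        Wrapped w = wrap(outer);
        System.out.println(w.stream == outer); // true: the original stream type is kept
        System.out.println(w.bufferReads);     // true: the capability was found by probing
    }
}
```

In this sketch the probe and the wrapping are separate steps, so the decision about which reader to use never changes which stream object ends up wrapped.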

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

Tests are not necessary since this is a refactor.

Commits

  • My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines
    from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Style

  • My contribution adheres to the code style guidelines and Spotless passes.
    • To apply the necessary changes, run mvn spotless:apply -Pvector-plugins

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and classes in the PR contain Javadoc that explains what they do

@wgtmac
Member

wgtmac commented Jan 26, 2024

Could you please create a JIRA?

@rathinb-db rathinb-db changed the title Prevent unwrapping of Hadoop filestreams [PARQUET-2422] Prevent unwrapping of Hadoop filestreams Jan 26, 2024
Member

@wgtmac wgtmac left a comment

It looks trivial to me. Thanks!

It would be good if @steveloughran @Fokko can double check as I'm not that familiar with its history.

@steveloughran
Contributor

Not sure why there's a need to return a function rather than do the work. Is there some other code which needs this?

In an ideal world Parquet would be Hadoop 3.3+ only and life would be simpler, not just here but with openFile(), ByteBufferPositionedReadable and more.

@wgtmac
Member

wgtmac commented Feb 6, 2024

@shangxinli @gszadovszky @ConeyLiu Do you have any comment?

Contributor

@gszadovszky gszadovszky left a comment

It looks good to me.

@@ -77,16 +78,17 @@ public static SeekableInputStream wrap(FSDataInputStream stream) {
* @param stream stream to probe
* @return A H2SeekableInputStream to access, or H1SeekableInputStream if the stream is not seekable
Contributor

Documentation should also be updated.

Member

@rathinb-db Could you resolve this comment? Thanks!

Contributor

@ConeyLiu ConeyLiu left a comment

+1, just left a minor comment about doc.

@wgtmac wgtmac merged commit 0eec215 into apache:master Feb 29, 2024
9 checks passed
@wgtmac wgtmac changed the title [PARQUET-2422] Prevent unwrapping of Hadoop filestreams PARQUET-2422: Prevent unwrapping of Hadoop filestreams Feb 29, 2024

5 participants