Proposal to use python libs instead of parquet tools #5

Open
pokidovea opened this issue Apr 25, 2019 · 5 comments

Comments

@pokidovea

Installing the Java-based parquet-tools is inconvenient. There is at least one Python library for working with Parquet files: pyarrow.
Using this library has some advantages:

  • no need to install Java and parquet-tools
  • possibility of editing parquet files
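
As a rough illustration of the reading side, here is a minimal sketch, not tested inside Sublime Text. `pq.read_table`, `Table.slice` and `Table.to_pylist` are pyarrow calls (`to_pylist` needs a reasonably recent pyarrow); the function names and the `limit` parameter are hypothetical, chosen only for this example.

```python
import json


def rows_to_json_lines(rows):
    """Render a list of row dicts as one JSON object per line."""
    # default=str is a fallback for values json cannot encode (dates, decimals, bytes).
    return "\n".join(json.dumps(row, default=str) for row in rows)


def parquet_to_json_lines(path, limit=100):
    """Return up to `limit` rows of the Parquet file at `path` as JSON lines."""
    # Imported lazily so this module still loads where pyarrow is missing.
    import pyarrow.parquet as pq

    table = pq.read_table(path)            # reads the whole file into memory
    rows = table.slice(0, limit).to_pylist()
    return rows_to_json_lines(rows)
```

Reading the whole table and then slicing is simple but wasteful for large files; a real plugin would probably want to read only the first row group or batch.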

What do you think?

@yuj
Owner

yuj commented Jun 14, 2022

I agree that it's a good idea to use a python lib to read in parquet files. Editing parquet files might be a bit inefficient using any text editor.

@dogversioning
Contributor

dogversioning commented Dec 3, 2022

@yuj would you be interested in taking a PR to accomplish this? I did this separately as a fork (https://github.com/dogversioning/sublime-parquet-python), which changes the rendering options (the Python tools I used as a first pass don't support JSON output), but the Sublime Text folks have a slight preference for consolidating these approaches if possible.

@yuj
Owner

yuj commented Dec 3, 2022

@dogversioning PRs are always welcome! Please send it over.

Eventually I guess we all still prefer @pokidovea's suggestion of using pyarrow to read parquet files instead of parquet-tools. Anyone interested in accomplishing that too? :)

@dogversioning
Contributor

@yuj yeah, i think it makes sense - this was more of an incremental approach to solve an acute issue, but something like that was next on my list of things to potentially tackle.

Anyway, give me a bit to reconcile the fork approach with an in-place one and i'll open a PR.

@dogversioning
Contributor

dogversioning commented Jan 1, 2023

@yuj So I spent a little time this morning looking into this - there are some tradeoffs:

  • The two big parquet libs (fastparquet & pyarrow) require numpy, which is not available in python 3.3, so you'd have to run in python 3.8, which is only available in ST4 and later.
  • Dependency management is going to be an issue. Since neither of these is bundled for Sublime Text, there are two possible pathways [1]:
    • Distributing pre-built dependencies per platform. Both of these have complex build chains touching libs requiring C++/Cython access, which would be a large engineering effort.
    • Asking a user to download python 3.8 and install a dependency, and then move/link it directly into Sublime Text's Lib folder.

If the first one doesn't bother you and you're ok with the hoops on the latter (I think for something of this scope the pre-built route isn't worth the effort), then it :could: be done. But it's an open question whether this makes the barrier to entry too high.
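
One way to soften that barrier would be to keep parquet-tools as a fallback and only use a Python library when the plugin host can actually import one. The sketch below is an assumption about how that check could look, not something the plugin does today; `pick_backend` and its parameters are hypothetical names, and it uses only the standard library.

```python
import importlib.util
import shutil
import sys


def pick_backend(min_python=(3, 8), candidates=("pyarrow", "fastparquet")):
    """Return the name of the first usable backend, or None if there is none."""
    # find_spec is only reached on a new enough interpreter, so the old
    # Python 3.3 plugin host never gets to this call.
    if sys.version_info >= min_python:
        for name in candidates:
            if importlib.util.find_spec(name) is not None:
                return name
    # Fall back to the Java-based CLI if it is on PATH.
    if shutil.which("parquet-tools"):
        return "parquet-tools"
    return None
```

A plugin could call `pick_backend()` once at load time and show a setup hint when it returns `None`, so users who never install a Python library keep the current behaviour.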

[1] https://stackoverflow.com/questions/61196270/how-to-properly-use-3rd-party-dependencies-with-sublime-text-plugins

3 participants