Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement GT.pipe() #363

Merged
merged 6 commits into from
Dec 10, 2024
Merged

Implement GT.pipe() #363

merged 6 commits into from
Dec 10, 2024

Conversation

jrycw
Copy link
Collaborator

@jrycw jrycw commented May 24, 2024

Related to #353.

After reviewing the pipe() methods from Pandas and Polars, I propose adding GT.pipe() and GT.pipes() in this PR.

GT.pipe()

GT.pipe() will function similarly to Pandas and Polars, receiving a function along with optional positional and keyword arguments. An example is provided below:

import polars as pl
from great_tables import GT, loc, style
from great_tables.data import towny


towny_mini = pl.from_pandas(towny).head(10)
columns = ["land_area_km2", "density_2021"]
colors = ["lightgray", "lightblue"]


def tbl_style(gtbl: GT, columns: list[str], colors: list[str]) -> GT:
    for column, color in zip(columns, colors):
        gtbl = gtbl.tab_style(
            style=style.fill(color=color),
            locations=loc.body(columns=column, rows=pl.col(column).eq(pl.col(column).max())),
        )
    return gtbl


(
    GT(
        towny_mini[["name", "land_area_km2", "density_2021"]],
        rowname_col="name",
    ).pipe(tbl_style, columns, colors)
)

GT.pipes()

GT.pipe() serves as a convenient helper method for chaining multiple pre-built functions (or using partial). An example is provided below:

from functools import partial
import polars as pl
from great_tables import GT, loc, style
from great_tables.data import towny


towny_mini = pl.from_pandas(towny).head(10)
columns = ["land_area_km2", "density_2021"]
colors = ["lightgray", "lightblue"]


def tbl_style(gtbl: GT, column: str, color: str) -> GT:
    return gtbl.tab_style(
        style=style.fill(color=color),
        locations=loc.body(columns=column, rows=pl.col(column).eq(pl.col(column).max())),
    )


(
    GT(
        towny_mini[["name", "land_area_km2", "density_2021"]],
        rowname_col="name",
    ).pipes(
        *[partial(tbl_style, column=column, color=color) for column, color in zip(columns, colors)]
    )
)

Table

Both methods yield the table as follows:
image

Assistance Required

I hope the API design makes sense, and could the team please review my docs to ensure they render correctly? I'm relatively new to quarto.

@codecov-commenter
Copy link

codecov-commenter commented May 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.66%. Comparing base (c2acdbe) to head (2dafcc7).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #363      +/-   ##
==========================================
+ Coverage   90.64%   90.66%   +0.02%     
==========================================
  Files          45       46       +1     
  Lines        5355     5367      +12     
==========================================
+ Hits         4854     4866      +12     
  Misses        501      501              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jrycw
Copy link
Collaborator Author

jrycw commented May 24, 2024

I'm not sure if it's a good idea to support passing a list into pipes(). If we add two lines to pipes(), like:

def pipes(self: "GT", *funcs: Callable["GT", "GT"] | list[Callable["GT", "GT"]]) -> "GT":
    if isinstance(funcs[0], list) and len(funcs) == 1:  # edited
        funcs = funcs[0]
    for func in funcs:
        self = pipe(self, func)
    return self

Then, we can do:

(
    GT(
        towny_mini[["name", "land_area_km2", "density_2021"]],
        rowname_col="name",
    ).pipes(
        [partial(tbl_style, column=column, color=color) for column, color in zip(columns, colors)]
    )
)

@machow and @rich-iannone , could you share your thoughts on this idea with me?

@jrycw
Copy link
Collaborator Author

jrycw commented May 25, 2024

I've updated .pipes() to accept a list of functions in the latest commit. We can decide which commit to use later.

Copy link
Collaborator

@machow machow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the wait! Thanks for taking the time to work through this, and look at what pandas and polars do. I'm noticing pandas and polars have only the one .pipe() method. Maybe we start with this, and then if people are asking for it, consider a .pipes() method?

Packages like toolz have a function---called compose()---that makes it easy to send multiple things to a single .pipe() method. For example...

from toolz.functoolz import compose, compose_left

from functools import partial
import polars as pl
from great_tables import GT, loc, style
from great_tables.data import towny


towny_mini = pl.from_pandas(towny).head(10)
columns = ["land_area_km2", "density_2021"]
colors = ["lightgray", "lightblue"]


def tbl_style(gtbl: GT, column: str, color: str) -> GT:
    return gtbl.tab_style(
        style=style.fill(color=color),
        locations=loc.body(columns=column, rows=pl.col(column).eq(pl.col(column).max())),
    )


(
    GT(
        towny_mini[["name", "land_area_km2", "density_2021"]],
        rowname_col="name",
    ).pipe(
        compose_left(*[partial(tbl_style, column=column, color=color) for column, color in zip(columns, colors)])
    )
)

@jrycw
Copy link
Collaborator Author

jrycw commented May 29, 2024

@machow thanks for introducing me to toolz. It's new to me, and it looks like it has a lot of great features. Sure, let's remove the .pipes() method. I've reviewed the preview docs and made some adjustments. Other than that, it looks good to me. What do you think?

Copy link
Collaborator

@machow machow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking really good, I left a quick thought on how we might be able to annotate pipe using ParamSpec. But overall, it seems basically ready to go! @rich-iannone you'll probably know the right section for .pipe in the reference

from .gt import GT


def pipe(self: "GT", func: Callable[..., "GT"], *args: Any, **kwargs: Any) -> "GT":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could use ParamSpec here? Something like...

from typing import ParamSpec, Callable

P = ParamSpec("P")
def punt(f: Callable[P, int], *args: P.args, **kwargs: P.kwargs) -> int:
    return f(*args, **kwargs)


def add(x: int, y: int) -> int:
    return x + y

# flagged in static check as z is not an arg to add
punt(add, 1, y = 2, z = 3)

https://mypy-play.net/?mypy=latest&python=3.12&gist=23dd2d435071c6e9f5639cfcadf7dd16

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course! This is somewhat new to me, so please feel free to correct me if I've misunderstood anything.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@machow and @rich-iannone, this PR has been pending for a while. Could we schedule it for merging, or are there any remaining concerns?

@@ -174,6 +174,13 @@ quartodoc:
contents:
- GT.save
- GT.as_raw_html
- title: Pipeline
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about putting pipe in the helper functions section. @rich-iannone WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it!

@machow
Copy link
Collaborator

machow commented Dec 10, 2024

Sorry for the wait 😓, Rich and I are pairing right now -- let me try to get this merged right now

@machow machow self-requested a review December 10, 2024 19:45
Copy link
Collaborator

@machow machow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, @rich-iannone do you want to merge? I simplified the test a bit, and think the lil pipeline section in the reference seems okay as is!

@rich-iannone rich-iannone merged commit ceef5c5 into posit-dev:main Dec 10, 2024
13 of 14 checks passed
@rich-iannone
Copy link
Member

Thank you @jrycw !

@jrycw jrycw deleted the fix-353 branch December 11, 2024 01:07
@jrycw
Copy link
Collaborator Author

jrycw commented Dec 11, 2024

This LGTM, @rich-iannone do you want to merge? I simplified the test a bit, and think the lil pipeline section in the reference seems okay as is!

@machow, thanks for simplifying the test—it’s much more concise now.

@jrycw jrycw changed the title Implement GT.pipe() and GT.pipes() Implement GT.pipe() Dec 11, 2024
@jrycw
Copy link
Collaborator Author

jrycw commented Dec 11, 2024

@machow and @rich-iannone, I’ve removed GT.pipes() from the PR title since we didn’t end up merging that part into the codebase. This should help prevent potential confusion in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants