Support to add new feature using python #69

Open
pipdax opened this issue Nov 27, 2020 · 6 comments
Labels
enhancement New feature or request

Comments

@pipdax

pipdax commented Nov 27, 2020

When I do EDA, I usually create some new features to help me analyze the data.
For example, when some columns have missing data, I create a new column of 1s and 0s to flag whether each value is missing.
I don't want to close pandasgui, create the new column, and then show pandasgui again. That is cumbersome.
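
For reference, the kind of feature described above is a one-liner in plain pandas. This is a minimal sketch; the column names `age` and `age_missing` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "age": [31.0, None, 25.0]})

# 1 where the value is missing, 0 otherwise
df["age_missing"] = df["age"].isna().astype(int)

print(df)
```

The request is about being able to do this kind of step without closing and re-opening the GUI window.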

@pipdax pipdax added the enhancement New feature or request label Nov 27, 2020
@adamerose
Owner

adamerose commented Nov 28, 2020

So if I understand, the problem is that you don't like needing to repeatedly call show to re-open the GUI every time you make changes to the original dataframe? There was some discussion of having modifications you make in IPython also apply to the DataFrame in the GUI, but I decided against that and explained why here: #20 (comment)

Do you have a specific solution in mind? All I can think of is adding a method to add or replace dataframes in an existing GUI window: you would do gui = show(df), modify the df in IPython, then call gui.show(df) to overwrite it in the GUI.

@JinchengWang

JinchengWang commented Dec 7, 2020

Not sure if this is a good idea but here is a rough sketch of a proposal:

Add an option in the GUI to add a new column by specifying

  1. the name of the new column, and
  2. an expression that evaluates to the values of the new column (on the original dataframe)

Basically do something like this:

def add_new_column(self, new_column_name, new_column_expression):
    # Evaluate the expression with the original dataframe bound to its GUI name
    scope = {self.name: self.dataframe_original}
    self.dataframe_original[new_column_name] = eval(new_column_expression, globals(), scope)
    self.apply_filters_and_sorting()
    self.update()

Though I'm not sure if this still makes sense when filters and sorts are applied. Personally, this behavior still makes sense to me, but maybe there are people who would disagree, and expect the new column to only contain values in the filtered rows?
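
To make the filter question concrete, here is a small self-contained sketch of that behavior. `MiniStore`, `filter_expr`, and `apply_filters` are hypothetical stand-ins, not pandasgui code; only `dataframe_original` and `add_new_column` come from the proposal. Note that `eval` runs arbitrary code, so this only makes sense for trusted input typed by the local user.

```python
import pandas as pd


class MiniStore:
    """Hypothetical stand-in for the GUI's per-dataframe state (not pandasgui code)."""

    def __init__(self, name, df):
        self.name = name
        self.dataframe_original = df
        self.filter_expr = None  # e.g. "HP > 100"
        self.dataframe = df      # the filtered view the GUI would display

    def apply_filters(self):
        df = self.dataframe_original
        self.dataframe = df.query(self.filter_expr) if self.filter_expr else df

    def add_new_column(self, new_column_name, new_column_expression):
        # The expression sees the original dataframe under its GUI name
        scope = {self.name: self.dataframe_original, "pd": pd}
        self.dataframe_original[new_column_name] = eval(new_column_expression, scope)
        self.apply_filters()  # rebuild the filtered view so it includes the new column


pokemon = pd.DataFrame({"Name": ["Bulbasaur", "Snorlax"], "HP": [45, 160]})
store = MiniStore("pokemon", pokemon)
store.filter_expr = "HP > 100"
store.apply_filters()

store.add_new_column("high_hp", "(pokemon.HP > 100).astype(int)")
print(store.dataframe_original)  # the new column is filled in for every row
print(store.dataframe)           # the filtered view shows only Snorlax, with the new column
```

With this approach the new column is computed for all rows, including the ones currently hidden by the filter, which is the behavior described as making sense above.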

@adamerose
Owner

adamerose commented Dec 8, 2020

Here's an API that might work. It lets you modify your DataFrame as you normally would and then replace the one in the GUI with your result. I can't think of any limitations with this, and it seems easier to work with than the add_new_column proposal.

gui.replace("my_df_name", my_new_df)

So an example usage would be like this

from pandasgui import show
from pandasgui.datasets import pokemon

gui = show(pokemon)
gui.replace('pokemon', pokemon[pokemon.HP > 100])

Another thing I can do is use my scope-sniffing magic (the same thing that gets the dataframe variable name into the GUI as a string) to find all dataframes in your scope and replace the GUI instances with those. You just need to keep the name the same.

from pandasgui import show
from pandasgui.datasets import pokemon

gui = show(pokemon)
pokemon = pokemon[pokemon.HP > 100]
gui.update_all()  # This would find a variable in your scope named 'pokemon' that is a DataFrame, and then replace the one in the GUI with the same name
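
A rough sketch of how such name-based lookup could work, using the standard `inspect` module to read the caller's variables. `FakeGui` is a hypothetical stand-in for the object returned by show(), and the bodies of `replace` and `update_all` are assumptions about the proposed behavior, not pandasgui's implementation.

```python
import inspect

import pandas as pd


class FakeGui:
    """Hypothetical stand-in for the object returned by show(); not pandasgui code."""

    def __init__(self, **dataframes):
        self.dataframes = dict(dataframes)

    def replace(self, name, df):
        if name not in self.dataframes:
            raise KeyError(f"no dataframe named {name!r} in the GUI")
        self.dataframes[name] = df

    def update_all(self):
        # Look at the caller's variables and pick up DataFrames whose names match
        caller = inspect.currentframe().f_back
        scope = {**caller.f_globals, **caller.f_locals}
        for name in self.dataframes:
            candidate = scope.get(name)
            if isinstance(candidate, pd.DataFrame):
                self.replace(name, candidate)


pokemon = pd.DataFrame({"Name": ["Bulbasaur", "Snorlax"], "HP": [45, 160]})
gui = FakeGui(pokemon=pokemon)

pokemon = pokemon[pokemon.HP > 100]  # rebinds the name; the GUI still holds the old object
gui.update_all()
print(gui.dataframes["pokemon"])
```

The lookup depends entirely on the variable keeping the same name, which matches the caveat above.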

@JinchengWang

I'm worried that if this is a recommended workflow, it would not be compatible with editing data in the GUI, since the changes made in the GUI are not synced with the original dataframe.

For example, if I run

gui = show(pokemon)

then change the HP of Bulbasaur to 200 in the GUI, I would expect Bulbasaur to show up after executing

pokemon = pokemon[pokemon.HP > 100]
gui.update_all() 
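
The mismatch can be reproduced in plain pandas. This sketch assumes the GUI works on its own copy of the data, simulated here with `.copy()`; the two-row table is made up for illustration.

```python
import pandas as pd

pokemon = pd.DataFrame({"Name": ["Bulbasaur", "Snorlax"], "HP": [45, 160]})

# Assumption: the GUI edits its own copy, so the caller's DataFrame is untouched
gui_copy = pokemon.copy()
gui_copy.loc[gui_copy.Name == "Bulbasaur", "HP"] = 200  # the edit made in the GUI

from_original = pokemon[pokemon.HP > 100]    # what gets pushed if nothing is pulled first
from_gui_copy = gui_copy[gui_copy.HP > 100]  # what the user expects to see

print(from_original.Name.tolist())  # ['Snorlax']
print(from_gui_copy.Name.tolist())  # ['Bulbasaur', 'Snorlax']
```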

@adamerose
Owner

adamerose commented Dec 10, 2020

> I'm worried that if this is a recommended workflow, it would not be compatible with editing data in the GUI, since the changes made in the GUI are not synced with the original dataframe.
>
> For example, if I run
>
> gui = show(pokemon)
>
> then change the HP of Bulbasaur to 200 in the GUI, I would expect Bulbasaur to show up after executing
>
> pokemon = pokemon[pokemon.HP > 100]
> gui.update_all()

Yeah, you'll always need a method call to sync in either direction, because I don't want to automatically overwrite the original DataFrame, for the reasons in the thread I linked. So there is .get_dataframes() to get your GUI changes back into IPython, and my proposed .replace() and .update_all() to get your IPython changes back into an existing GUI.

Your example would look like this

gui = show(pokemon)
# then change the HP of Bulbasaur to 200 in the GUI
pokemon = gui.get_dataframes()['pokemon']
pokemon = pokemon[pokemon.HP > 100]
gui.update_all() 

This is the least verbose API I can think of

@JinchengWang

> Your example would look like this
>
> gui = show(pokemon)
> # then change the HP of Bulbasaur to 200 in the GUI
> pokemon = gui.get_dataframes()['pokemon']
> pokemon = pokemon[pokemon.HP > 100]
> gui.update_all()
>
> This is the least verbose API I can think of

Just thought of another idea:

Provide an IPython magic command to wrap this together. For the example above, allow the user to instead do something like

gui = show(pokemon)
# then change the HP of Bulbasaur to 200 in the GUI
%pdgui pokemon = pokemon[pokemon.HP > 100]

This could also make the history-tracking better, since both GUI operations and magic commands can be recorded.
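
A minimal sketch of what the body of such a magic might do: pull the GUI's dataframes into the session namespace, run the statement, then push matching names back. `pdgui` is written as a plain function so it runs outside IPython, and `FakeGui` is a hypothetical stand-in for the GUI object; neither is pandasgui code. In a real IPython session it could presumably be registered with `register_line_magic` from `IPython.core.magic`, using `get_ipython().user_ns` as the namespace.

```python
import pandas as pd


class FakeGui:
    """Hypothetical stand-in for the GUI object; not pandasgui code."""

    def __init__(self, **dataframes):
        self.dataframes = dict(dataframes)

    def get_dataframes(self):
        return dict(self.dataframes)


def pdgui(line, gui, user_ns):
    """Pull GUI dataframes into user_ns, run `line`, push matching names back."""
    user_ns.update(gui.get_dataframes())  # GUI -> session
    exec(line, user_ns)                   # run the user's statement
    for name in gui.dataframes:           # session -> GUI
        if isinstance(user_ns.get(name), pd.DataFrame):
            gui.dataframes[name] = user_ns[name]


gui = FakeGui(pokemon=pd.DataFrame({"Name": ["Bulbasaur", "Snorlax"], "HP": [45, 160]}))
gui.dataframes["pokemon"].loc[0, "HP"] = 200  # simulate the edit made in the GUI

session = {}
pdgui("pokemon = pokemon[pokemon.HP > 100]", gui, session)
print(gui.dataframes["pokemon"])
```

Because the pull happens first, Bulbasaur's edited HP of 200 is visible to the filter and the row survives.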

3 participants