Grapher redesign #121

Open
7 of 16 tasks
adamerose opened this issue Mar 18, 2021 · 11 comments

@adamerose
Owner

adamerose commented Mar 18, 2021

Tracking major changes to the Grapher here; I will edit this OP as things change.

How the Grapher works

  • The functions that generate Plotly figures live in pandasgui.jotly
  • The Grapher widget is defined in pandasgui.grapher and imports all those Jotly functions
  • The Grapher holds a list of 'schemas', one per type of graph you can make in the Grapher, each containing an icon, a name, and the Jotly function that draws it. The Grapher also provides the UI for switching between graph types and displaying the resulting figures.
  • The actual drag-n-drop UI inside the Grapher is a FuncUi, defined in pandasgui.widgets.func_ui. It takes a list of schemas and auto-generates the UI from the type hints of the function in each schema.
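
The schema-driven design above can be sketched in plain Python. This is a minimal illustration under assumptions, not pandasgui's actual code: the Schema dataclass, its field names, the render helper, and the scatter/histogram stand-ins (which return dicts instead of Plotly figures) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Schema:
    """Hypothetical stand-in for a Grapher schema: one entry per graph type."""
    name: str
    icon: str  # assumption: a file or resource name for the graph-type button
    function: Callable[..., Any]  # Jotly-style function that builds the figure


# Hypothetical Jotly-style functions; real ones would return Plotly figures.
def scatter(data: Any, x: Optional[str] = None, y: Optional[str] = None) -> Dict[str, Any]:
    return {"type": "scatter", "x": x, "y": y}


def histogram(data: Any, x: Optional[str] = None) -> Dict[str, Any]:
    return {"type": "histogram", "x": x}


SCHEMAS: List[Schema] = [
    Schema(name="scatter", icon="scatter.svg", function=scatter),
    Schema(name="histogram", icon="histogram.svg", function=histogram),
]


def render(schemas: List[Schema], graph_type: str, data: Any, kwargs: Dict[str, Any]) -> Any:
    """Find the selected schema by name and call its function with the UI's kwargs."""
    for schema in schemas:
        if schema.name == graph_type:
            return schema.function(data, **kwargs)
    raise KeyError(f"no schema named {graph_type!r}")
```

For example, render(SCHEMAS, "scatter", None, {"x": "age", "y": "fare"}) returns {'type': 'scatter', 'x': 'age', 'y': 'fare'}. Keeping the graph types in a list of plain records means adding a new graph type only requires a new function and a new schema entry.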

Here's how FuncUi maps the Jotly function type hints to PyQt widgets:

  • ColumnName (which is just a str) -> A textbox that you can type in or drop columns onto
  • Literal -> A combobox dropdown with options for the valid values defined in the type hint
  • bool -> A checkbox
  • ColumnNameList (which is List[str]) -> A list of text boxes where you can drop column names and add or remove rows.
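
A FuncUi-style builder could read these hints with the standard typing and inspect modules. The sketch below is hypothetical: widget_for, build_ui, the returned widget labels (standing in for real PyQt widgets), and the example signature are all invented for illustration. It reads raw annotations through inspect.signature because typing.get_type_hints on Python versions before 3.11 wraps a hint whose default is None in Optional, which would hide ColumnName.

```python
import inspect
from typing import Callable, Dict, List, Literal, get_args, get_origin

# Hypothetical aliases mirroring the ones described above.
ColumnName = str
ColumnNameList = List[str]


def widget_for(hint) -> str:
    """Name the widget a FuncUi-like builder might create for one type hint."""
    if get_origin(hint) is Literal:
        # Combobox listing the valid values declared in the hint
        return "combobox: " + ", ".join(map(str, get_args(hint)))
    if hint is bool:
        return "checkbox"
    if hint == ColumnNameList:
        return "column list"
    if hint is ColumnName:
        return "column textbox"
    return "unsupported"


def build_ui(func: Callable) -> Dict[str, str]:
    """Map each annotated parameter of func to a widget name, in signature order."""
    return {
        name: widget_for(param.annotation)
        for name, param in inspect.signature(func).parameters.items()
        if param.annotation is not inspect.Parameter.empty
    }


# Hypothetical Jotly-style signature used only to exercise build_ui.
def scatter(data,
            x: ColumnName = None,
            y: ColumnName = None,
            dimensions: ColumnNameList = None,
            trendline: Literal["none", "ols", "lowess"] = "none",
            marginal: bool = False):
    ...
```

Here build_ui(scatter) maps x and y to column textboxes, dimensions to a column list, trendline to a combobox with none, ols and lowess, and marginal to a checkbox. The unannotated data parameter is skipped.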

Todo

  • Added a console to display errors raised while generating plots
  • Added a Preview Kwargs button that shows your args, as defined by the UI, as a dict in a text box
  • Automatically re-render the plot any time the arguments are changed
  • Re-add the old logic, such as sorting and aggregating for line/bar plots, and add more args to Jotly
  • Implement ColumnNameList for args that accept a list of column names
  • Heuristics to distinguish categorical from continuous numeric columns. Should add a color_mode arg like this, but with a third auto option that uses a heuristic.
  • Sync selections made between the Grapher and DataFrameExplorer. JMP does this and it's very cool: https://www.screencast.com/t/yKmLTaFaP9
  • Add argument facet_tab: interactive tabs you can click between
  • Add argument facet_wrap: splits into subplots like facet_row and facet_col, but using a single variable, and wraps them all into a square grid. This should be easy because plotly already provides facet_col_wrap; I will just auto-calculate that number to give a grid.
  • Add argument facet_page: like facet_col but each subplot is full size and you can scroll through them. These were partly inspired by JMP. (page / wrap)
  • Allow opening the Grapher to a specific state. So maybe you could type x = pandasgui.show_grapher(df, type='scatter', args={'x': 'age', 'y': 'fare', 'color': 'survived'}) and it would pop open like this
  • Make Grapher a dialog instead of a Tab so you can have multiple Graphers open at once for a single DataFrame
  • Add checkbox to disable automatic re-rendering
  • Fix the GroupBox title styling
  • Automatic title generation
  • Default Grapher settings stored in preferences
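
Two of the items above reduce to small calculations. The sketch below is hedged: facet_col_wrap_for picks a value for Plotly's facet_col_wrap so that n facets form a roughly square grid (the ceiling of the square root), and looks_categorical is one hypothetical heuristic for the proposed auto color_mode, treating a numeric column as categorical when it is integer-valued with only a handful of distinct values. Both function names and the threshold of 10 are assumptions.

```python
import math
from typing import Iterable


def facet_col_wrap_for(n_facets: int) -> int:
    """Columns per row so n_facets subplots fill a roughly square grid."""
    if n_facets < 1:
        raise ValueError("need at least one facet")
    return math.ceil(math.sqrt(n_facets))


def looks_categorical(values: Iterable[float], max_unique: int = 10) -> bool:
    """Hypothetical heuristic: integer-valued columns with few distinct values are categorical."""
    # Drop missing values: None, and NaN (the only float not equal to itself)
    present = [v for v in values if v is not None and v == v]
    if not present:
        return False
    all_integral = all(float(v).is_integer() for v in present)
    return all_integral and len(set(present)) <= max_unique
```

With 10 facets, facet_col_wrap_for(10) returns 4, giving three rows of up to four subplots. A 0/1 column such as survived would be treated as categorical, while a column such as fare with fractional values would not.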
adamerose added a commit that referenced this issue Mar 18, 2021
@fdion
Contributor

fdion commented Mar 18, 2021

The auto render_mode doesn't detect whether you have WebGL support or not; it just switches to webgl above 1000 points. So on locked-down corporate laptops, or inside virtual machines, you get a WebGL error the moment you try to display more than 1000 points. Setting render_mode to svg allows rendering all plots, even above 1000 points, albeit with reduced performance. Up to about 50,000 displayed points, the svg engine seems fine.
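
One way to act on this is to resolve the render mode explicitly before calling Plotly, so that a preference of svg always wins. The helper below is a hypothetical sketch: resolve_render_mode and AUTO_WEBGL_THRESHOLD are invented names, and the function only mirrors the 1000-point behavior described in this comment; it does not detect WebGL support.

```python
AUTO_WEBGL_THRESHOLD = 1000  # point count above which "auto" picks WebGL, per the comment above


def resolve_render_mode(preference: str, n_points: int) -> str:
    """Hypothetical helper: turn a user preference into an explicit render_mode."""
    if preference not in ("auto", "svg", "webgl"):
        raise ValueError(f"unknown render mode {preference!r}")
    if preference == "auto":
        # Mirror the described behavior: WebGL only for larger plots.
        return "webgl" if n_points > AUTO_WEBGL_THRESHOLD else "svg"
    # An explicit choice always wins, so locked-down machines can force "svg".
    return preference
```

The returned string would then be passed as render_mode= to the plotly.express functions that accept it, such as px.scatter and px.line.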

As for the rest, all the changes have my head spinning. I'll have to digest this some.

adamerose added a commit that referenced this issue Mar 19, 2021
@fdion
Contributor

fdion commented Mar 27, 2021

@adamerose I need to know when you think the major refactoring will settle down, so I can address the automated title/render mode if it is still a regression. I'd like to get to the point where a new PyPI release can be made that includes the grapher splitter change.

And more generally, when it makes sense to address some of the other things, as I was waiting for the dust to settle a bit! :)

@fdion
Contributor

fdion commented Mar 28, 2021

One way to handle all Plotly settings would be to allow a kwargs dict in show/pandasgui and pass it all the way to Jotly if it comes from the initial call. Internally, you can handle that with either method you proposed. Once in Jotly, globally for all plots, any kwargs that start with layout_ are applied via update_layout, and everything else via update_traces. And like you said, this would skip the UI entirely.

This would be extremely flexible. New option in Plotly? No problem, it's already supported.
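
The layout_ prefix routing proposed here could look roughly like this. The function names are hypothetical, and stripping the layout_ prefix (so layout_title_text becomes title_text) is an assumption about how the convention would work; update_layout and update_traces are real Plotly Figure methods that accept these underscore-style keys.

```python
from typing import Any, Dict, Tuple

LAYOUT_PREFIX = "layout_"


def split_plotly_kwargs(kwargs: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split kwargs into (layout, traces); the layout_ prefix is stripped from layout keys."""
    layout: Dict[str, Any] = {}
    traces: Dict[str, Any] = {}
    for key, value in kwargs.items():
        if key.startswith(LAYOUT_PREFIX):
            layout[key[len(LAYOUT_PREFIX):]] = value
        else:
            traces[key] = value
    return layout, traces


def apply_plotly_kwargs(fig, kwargs: Dict[str, Any]):
    """Apply the user's extra kwargs to a Plotly figure (or any object with the same methods)."""
    layout, traces = split_plotly_kwargs(kwargs)
    if layout:
        fig.update_layout(**layout)
    if traces:
        fig.update_traces(**traces)
    return fig
```

For example, {'layout_title_text': 'Titanic', 'marker_size': 4} would result in update_layout(title_text='Titanic') followed by update_traces(marker_size=4). Because the kwargs are forwarded untouched, a new Plotly option needs no Grapher changes.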

@adamerose
Owner Author

adamerose commented Mar 29, 2021

Pushed my changes so far. Things are mostly working, and I put the title generation back in, but it needs fixing up since I renamed apply_mean to aggregation and removed apply_sort (I don't think it's needed). Render mode is no longer an option inside the Grapher; it's just an option in Preferences and is applied to the Grapher automatically where needed.

I deleted my previous comment since I changed my mind on each point I gave 😅

@fdion
Contributor

fdion commented Mar 31, 2021

I just had to remove apply_sort from my settings, rename apply_mean to aggregation, and pass 'none' instead of False, and it started up.
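
Settings saved before the rename could be upgraded automatically rather than edited by hand. This is a hypothetical sketch: the function name is invented, and mapping apply_mean=True to 'mean' is an assumption, since the thread only confirms that 'none' replaces False.

```python
from typing import Any, Dict


def migrate_grapher_settings(old: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical migration for Grapher settings saved before the redesign."""
    new = dict(old)  # copy so the caller's dict is left untouched
    new.pop("apply_sort", None)  # apply_sort was removed
    if "apply_mean" in new:
        apply_mean = new.pop("apply_mean")
        # Assumption: True meant "take the mean"; False becomes the string 'none'.
        new["aggregation"] = "mean" if apply_mean else "none"
    return new
```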

Some feedback:

  • dragging and dropping onto the plot setup boxes seems easier than dropping onto the trees in the previous dragger setup
  • I haven't dropped a variable outside the designated spots (this would happen randomly with the tree setup)
  • nice touch adding the marginal, cumulative and trendline as UI options
  • dragging and dropping multiple variables onto x, y, etc. is definitely needed, and it is on the TODO (ColumnNameList)
  • I see you don't have to click Finish before it generates a plot. For small data sets that's not a problem, but for large ones (> 200,000 data points) it slows the process down a good bit. I see you also have that in the TODO (auto render checkbox)
  • x and y are inverted in the dragger
  • stuff like text, markers, hover_name, etc. is missing
  • I would still need apply_sort for some sequential experiment data. Another approach: what if apply_sort were instead sort, taking one or more column names and sorting by them?
  • I haven't had the chance to fully test the automated title
  • code export is completely missing
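
Code export could start as simply as rendering the current function name and UI kwargs back into a Python call. This hypothetical sketch assumes the export targets plotly.express (imported as px) with the DataFrame named df, and skips arguments left unset (None) in the UI; export_code and its parameters are invented names.

```python
from typing import Any, Dict


def export_code(function_name: str, kwargs: Dict[str, Any],
                module: str = "px", data_name: str = "df") -> str:
    """Hypothetical code export: render the current Grapher call as one line of Python."""
    args = [data_name]
    for key, value in kwargs.items():
        if value is None:
            continue  # unset UI fields are left out of the exported call
        args.append(f"{key}={value!r}")  # repr() quotes strings and keeps lists readable
    return f"fig = {module}.{function_name}({', '.join(args)})"
```

For example, export_code("scatter", {"x": "age", "y": "fare", "color": "survived", "facet_row": None}) returns "fig = px.scatter(df, x='age', y='fare', color='survived')". Using repr() keeps strings, lists and booleans valid as Python literals.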

@fdion
Contributor

fdion commented Apr 12, 2021

In case you are wondering, the number 1 hurdle at this time is:

  • Implement ColumnNameList for args that accept a list of column names

@adamerose
Owner Author

Yeah, I still haven't had time for that; I will probably do it by this weekend.

@adamerose
Owner Author

adamerose commented Apr 20, 2021

Done. Only enabled it on Splom (scatter_matrix) and Word Cloud so far.


It'll probably take months to finish all the ideas in this thread. Can you give a shortlist of what else you think needs to be done before putting out another release makes sense? I'm never in a hurry to put out new releases, but I know you mentioned it above, and I can prioritize some things.

@fdion
Contributor

fdion commented Apr 20, 2021

Hey that's great. Let me look at this, I'll let you know.

@fdion
Contributor

fdion commented May 3, 2021

I ended up packaging a wheel file with the pre-redesign code plus the graph splitter to buy some time, so doing a release is not super urgent ATM.

I've been looking into ColumnNameList and haven't had much success getting it to work for X and Y on the line plot. I looked at Word Cloud and SPLOM, and it should just work, but I get a NoneType error on set_names (func_ui).

@fdion
Contributor

fdion commented May 17, 2021

Another PR for this ticket: #137
