
n legend with single value 1 when using geom_bar(stat = "sum") #4848

Open
stragu opened this issue May 17, 2022 · 4 comments · May be fixed by #6281

Comments

@stragu
Contributor

stragu commented May 17, 2022

Using stat = "sum" in geom_bar() displays a legend titled "n" with a single grey square labelled 1. It is particularly out of place when the bars are filled. The plot then seems to have two fill legends, and the grey square matches nothing on the plot.

The issue has been brought up on Stack Overflow, with a workaround: https://stackoverflow.com/questions/50378718/what-is-the-n-1-box-in-my-r-geom-bar-legend-and-how-do-i-remove

library(ggplot2)
exp_df <- data.frame(x = c("A", "B", "B", "C"),
                     value = 1:4,
                     group = c("Z", "Z", "Y", "Y"))
ggplot(exp_df, aes(x, value)) +
  geom_bar(stat = "sum")

# with fill
ggplot(exp_df, aes(x, value, fill = group)) +
  geom_bar(stat = "sum")

Created on 2022-05-17 by the reprex package (v2.0.1)

Should geom_bar() not display this legend at all?

@yutannihilation
Member

The reason is that after_stat(n) is mapped to size by default (probably because stat_sum() was primarily intended to be used with geom_point()). I'm not sure whether we can drop n by default in this case, but I agree it would be nice. Overriding the mapping with size = NULL removes the legend:

library(ggplot2)
exp_df <- data.frame(x = c("A", "B", "B", "C"),
                     value = 1:4,
                     group = c("Z", "Z", "Y", "Y"))
ggplot(exp_df, aes(x, value, size = NULL)) +
  geom_bar(stat = "sum")

Created on 2022-05-18 by the reprex package (v2.0.1)
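To see why the legend shows only the value 1, here is a rough base-R sketch of the counting described above. Each row gets a weight of 1, and the weights are summed within every unique combination of the mapped aesthetics to give n. The helper count_like_stat_sum() is hypothetical and is only a simplified analogue for illustration, not ggplot2's actual stat_sum() code.

```r
# Hypothetical helper: simplified analogue of the counting stat_sum() does.
# Every row gets weight 1; weights are summed per unique combination of `cols`.
count_like_stat_sum <- function(df, cols) {
  df$n <- 1
  f <- as.formula(paste("n ~", paste(cols, collapse = " + ")))
  aggregate(f, data = df, FUN = sum)
}

exp_df <- data.frame(x = c("A", "B", "B", "C"),
                     value = 1:4,
                     group = c("Z", "Z", "Y", "Y"))

# aes(x, value): every (x, value) pair is unique, so n is 1 for every row,
# which is why the legend titled "n" only ever shows the value 1.
counts <- count_like_stat_sum(exp_df, c("x", "value"))
print(counts)

# aes(x, value, fill = group): adding the group column keeps every
# combination unique, so n is still 1 everywhere.
counts_fill <- count_like_stat_sum(exp_df, c("x", "value", "group"))
print(counts_fill)
```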

@stragu
Contributor Author

stragu commented Jun 1, 2022

Thanks, @yutannihilation. I now realise that a "summed" bar chart gives the illusion of doing what one might want it to do, i.e. pre-processing the data by summing the values within each group, then drawing a col chart with those sums as the y values.

In reality, it does nothing of the sort. It just creates a stacked col chart:

library(ggplot2)
exp_df <- data.frame(x = c("A", "B", "B", "C"),
                     value = 1:4,
                     group = c("Z", "Z", "Y", "Y"))
# "summed" bar chart is not actually summed...
ggplot(exp_df, aes(x, value)) +
  geom_bar(stat = "sum", colour = "red")

# ... it's just a stacked col chart (with thicker outlines)
ggplot(exp_df, aes(x, value)) +
  geom_col(colour = "red")

However, if an x/y pair is repeated, the outline width does change. This is expected once one knows that the sum stat maps the number of repeats of each x/y pair to the size aesthetic:

exp_df2 <- data.frame(x = c("A", "B", "B", "C"),
                      value = c(1, 2, 2, 3),
                      group = c("Z", "Z", "Y", "Y"))
ggplot(exp_df2, aes(x, value)) +
  geom_bar(stat = "sum", colour = "red")
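Applying the same kind of simplified counting to exp_df2 shows where the thicker outline comes from. The pair (B, 2) occurs twice, so its n is 2. In this sketch the two duplicated rows are also collapsed into one. If stat_sum() collapses rows the same way, the B bar would reach 2 rather than 2 + 2 = 4. The helper count_like_stat_sum() is hypothetical and only approximates what stat_sum() does.

```r
# Hypothetical helper: simplified analogue of stat_sum()'s counting.
# Every row gets weight 1; weights are summed per unique combination of `cols`.
count_like_stat_sum <- function(df, cols) {
  df$n <- 1
  f <- as.formula(paste("n ~", paste(cols, collapse = " + ")))
  aggregate(f, data = df, FUN = sum)
}

exp_df2 <- data.frame(x = c("A", "B", "B", "C"),
                      value = c(1, 2, 2, 3),
                      group = c("Z", "Z", "Y", "Y"))

# (B, 2) appears twice, so it gets n = 2 and the two rows collapse into one,
# leaving three rows: A/1 (n = 1), B/2 (n = 2), C/3 (n = 1).
counts2 <- count_like_stat_sum(exp_df2, c("x", "value"))
print(counts2)
```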

What I am actually after is using the stat_summary() function with fun = "sum":

ggplot(exp_df, aes(x, value)) +
  stat_summary(geom = "bar", fun = "sum", colour = "red")

Created on 2022-06-01 by the reprex package (v2.0.1)

My understanding now is that the current behaviour is expected; however, the name of the stat is misleading. I assume this is unlikely to change (and I suspect the clash with the name of the count stat has been discussed before), so perhaps some additions to the documentation could help?

Something like:

In the geom_count Description

(or in "Details")

[...] The "sum" stat computes a variable n equal to the number of rows that share the same values for all aesthetics other than size. By default, n is mapped to the size aesthetic of the geometry. This stat may be mistaken for one that computes sums of grouped x or y values, especially when used with a geometry other than "point" (for example, when replacing stat = "count" with stat = "sum" in geom_bar()).

In the geom_bar Details

Using the "sum" stat with a "bar" or "col" geometry will not sum the grouped x or y values. This stat is intended for the "point" geometry, where it varies the size aesthetic when there are exact overlaps.
To create a bar chart of summed values, pre-process the data frame before passing it to ggplot2 functions, or use stat_summary() with fun = "sum".
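As a sketch of the pre-processing route, base R's aggregate() can sum value within each x before anything is plotted. That produces the "summed" heights that geom_bar(stat = "sum") does not compute. Only the base-R summing step is executed here. The plotting call is left as a comment and assumes ggplot2 is loaded.

```r
exp_df <- data.frame(x = c("A", "B", "B", "C"),
                     value = 1:4,
                     group = c("Z", "Z", "Y", "Y"))

# Sum `value` within each level of `x`: A stays 1, B becomes 2 + 3 = 5,
# C stays 4. These are the bar heights of a genuinely "summed" bar chart.
summed <- aggregate(value ~ x, data = exp_df, FUN = sum)
print(summed)

# The pre-summed data could then be drawn with, for example:
# ggplot(summed, aes(x, value)) + geom_col()
```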

What do you think?

@yutannihilation
Member

Sorry for the late reply. I'm not sure. I also think the documentation could be improved here, but I'm not sure it belongs on each function's help page. It feels like a higher-level topic, but I can't think of the right place for it...

@teunbrand
Collaborator

In my opinion, the size aesthetic shouldn't be displayed for a bar geom, since bars use linewidth. It is displayed here because size is automatically translated to linewidth by the renaming fallback introduced in ggplot2 3.4.0. I think this fallback should be turned off for geom_bar(), as it causes trouble like this.

@teunbrand teunbrand linked a pull request Jan 13, 2025 that will close this issue

3 participants