Rounding

Overplotting due to rounding

If your overplotting is due to rounding, you can get a better picture of the data by making each point semi-transparent. For example, you could set the alpha aesthetic in the plot below to a value between zero (fully transparent) and one (fully opaque).

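Try this now. Set the points to an alpha of 0.25, which makes each point 25% opaque. Where points overlap, their opacity accumulates, so locations with more stacked points appear darker (four stacked points reach roughly 68% opacity, and the color approaches solid black as more points pile up).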

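Hint: Make sure you set alpha = 0.25 outside of aes(). Inside aes(), alpha would be treated as a mapping from the data rather than a fixed setting applied to every point.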

ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), alpha = 0.25)

Good job! You can now identify which locations contain more observations: the darker spots contain several points stacked on top of each other.
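Why do darker spots mean more points? A short calculation helps. Assuming the graphics device uses standard alpha compositing, each point drawn with alpha 0.25 lets 75% of whatever lies beneath it show through, so n stacked points have a combined opacity of 1 - 0.75^n. The helper stacked_opacity() below is a hypothetical name used only for this illustration; it is not part of ggplot2.

```r
# Hypothetical helper (not part of ggplot2): combined opacity of n points
# stacked at one location, each drawn with the same alpha.
# Assumes standard alpha compositing, where each point lets (1 - alpha)
# of what lies beneath it show through.
stacked_opacity <- function(n, alpha = 0.25) {
  1 - (1 - alpha)^n
}

# Opacity for 1 to 8 stacked points at alpha = 0.25
data.frame(points = 1:8, opacity = round(stacked_opacity(1:8), 3))
```

Four stacked points reach about 68% opacity, and each additional point moves the color closer to solid black without ever quite reaching it. The darkness of a location therefore tells you which spots hold more points, rather than giving an exact count.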

Adjust the position

A second strategy for dealing with rounding is to adjust the position of each point. position = "jitter" adds a small amount of random noise to the location of each point. Since the noise is random, it is unlikely that two points rounded to the same location will also be jittered to the same location.

The result is a jittered plot that displays more of the data. Jittering has both limitations and benefits. Because each point is shifted slightly away from its true location, you cannot use a jittered plot to read the exact values of individual points, but you can use it to perceive the global relationship between the variables, something that is hard to do in the presence of overplotting.

ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), position = "jitter")
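The mechanism behind position = "jitter" can be sketched in base R with jitter(), which adds uniform random noise to a numeric vector. This is an illustration of the idea, not ggplot2's internal code. The data below are made up, and the noise amount (40% of the spacing between neighboring values) is an assumption chosen to mirror the default amount documented for ggplot2's position_jitter().

```r
# Base-R sketch of jittering (illustrative; not ggplot2's implementation).
set.seed(1)  # make the random noise reproducible

# Made-up rounded data: 30 points that fall on only 6 distinct (x, y) locations
x <- rep(1:3, each = 10)              # values 1 unit apart
y <- rep(c(20, 25), length.out = 30)  # values 5 units apart

# Add uniform noise of up to +/- 40% of the spacing between values
# (assumed to mirror ggplot2's default jitter amount)
jx <- jitter(x, amount = 0.4 * 1)
jy <- jitter(y, amount = 0.4 * 5)

# Count distinct plotting locations before and after jittering
nrow(unique(data.frame(x, y)))    # 6 locations for 30 points
nrow(unique(data.frame(jx, jy)))  # every point now has its own location

# The noise is smaller than half the spacing, so each point stays
# closest to its original value
all(round(jx) == x)
all(round(jy / 5) * 5 == y)
```

This is why jittering separates points that share a location while keeping each one near its true value: the noise is large enough to break ties but too small to push a point onto a neighboring value.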

Review: jitter

In the Scatterplots tutorial, you learned about a geom that is equivalent to geom_point() with a position = "jitter" adjustment.

Rewrite the previous plot to use that geom. Do you obtain similar results?

ggplot(data = mpg) +
  geom_jitter(mapping = aes(x = displ, y = hwy))
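geom_jitter() also accepts width and height arguments that limit how far points may move horizontally and vertically. Because the noise is random, each render of a jittered plot looks slightly different; passing position_jitter() with a seed to geom_point() makes the result reproducible. The particular width, height, and seed values below are arbitrary choices for illustration, not recommended settings.

```r
library(ggplot2)

# geom_jitter() with explicit limits on the noise:
# up to +/- 0.1 on the x axis (displ) and +/- 0.5 on the y axis (hwy)
p_jitter <- ggplot(data = mpg) +
  geom_jitter(mapping = aes(x = displ, y = hwy), width = 0.1, height = 0.5)

# The same adjustment written as geom_point() with position_jitter();
# a fixed seed makes the random noise identical on every render
p_seeded <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy),
    position = position_jitter(width = 0.1, height = 0.5, seed = 1)
  )

p_jitter
p_seeded
```

Keeping width and height small relative to the gaps between rounded values separates overlapping points without suggesting values the data do not contain.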

Good job! Now let’s look at ways to handle overplotting due to large datasets.
