Histograms

Introduction

How to make a histogram

To make a histogram with {ggplot2}, add geom_histogram() to the ggplot2 template. For example, you can plot a histogram of the carat variable in the diamonds dataset, which comes with {ggplot2}.
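A minimal sketch of that plot, following the same template the solutions in this tutorial use. It assumes only that {ggplot2} is installed; the object name p is chosen here for illustration.

```r
library(ggplot2)  # also provides the diamonds dataset

# Map carat to the x axis; geom_histogram() computes the y values itself
p <- ggplot(data = diamonds) +
  geom_histogram(mapping = aes(x = carat))

p  # printing the object draws the histogram
```

Because no binwidth is given, {ggplot2} prints a message suggesting that you pick a value yourself.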

The \(y\) variable

As with geom_bar(), you do not need to give geom_histogram() a \(y\) variable. geom_histogram() will construct its own \(y\) variable by counting the number of observations that fall into each bin on the \(x\) axis. geom_histogram() will then map the counts to the \(y\) axis.
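The counting step can be imitated by hand with base R. This is only a sketch of the idea, not the code geom_histogram() runs: the bin edges below are an assumption made for illustration, and {ggplot2} may place its own bins differently.

```r
library(ggplot2)  # used here only for the diamonds dataset

# Assumed bin edges for illustration: bins 0.5 carats wide, from 0 to 5.5
edges <- seq(0, 5.5, by = 0.5)

# cut() assigns each carat value to a bin; table() counts the values per bin
bin_counts <- table(cut(diamonds$carat, breaks = edges))

bin_counts
```

Each count corresponds to the height of one bar in a histogram drawn with the same bins.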

As a result, you can glance at a bar to determine how many observations fall within a bin. Bins with tall bars highlight common values of the \(x\) variable.

Exercise 1: Interpretation

According to the chart, which is the most common carat size in the data?

`binwidth`

By default, {ggplot2} uses 30 bins and chooses the binwidth to match. You can set the binwidth manually with the binwidth argument, which is interpreted in the units of the \(x\) axis.
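As a sketch, the call below sets the binwidth by hand. The value 1 is an arbitrary choice made for illustration; since carat is on the \(x\) axis, it means each bin spans one carat.

```r
library(ggplot2)

# binwidth is in the units of the x variable: here, 1 carat per bin
p <- ggplot(data = diamonds) +
  geom_histogram(mapping = aes(x = carat), binwidth = 1)

p
```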

`bins`

Alternatively, you can set the binwidth indirectly with the bins argument, which takes the total number of bins to use.
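A short sketch of the bins argument. The value 10 is an arbitrary choice made for illustration.

```r
library(ggplot2)

# Ask for 10 bins in total and let ggplot2 work out how wide each one is
p <- ggplot(data = diamonds) +
  geom_histogram(mapping = aes(x = carat), bins = 10)

p
```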

It can be hard to determine what the actual binwidths are when you use bins, since they may not be round numbers.
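One way to check the binwidth that bins produced is to inspect the data {ggplot2} computed for the layer. The sketch below uses layer_data() from {ggplot2}; the choice of 10 bins and the object names are assumptions made for illustration.

```r
library(ggplot2)

p <- ggplot(data = diamonds) +
  geom_histogram(mapping = aes(x = carat), bins = 10)

# layer_data() returns the values computed for the first layer,
# including the left (xmin) and right (xmax) edge of every bin
bin_data <- layer_data(p)

# The difference between the two edges is the width of each bin
widths <- unique(round(bin_data$xmax - bin_data$xmin, 4))
widths
```

If the result is not a round number, setting binwidth directly may give a chart that is easier to read.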

`boundary`

You can move the bins left and right along the \(x\) axis with the boundary argument. boundary takes an \(x\) value to use as the boundary between two bins, and {ggplot2} aligns the rest of the bins accordingly.
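A sketch of boundary in use. The values are chosen for illustration: with a binwidth of 1 and a boundary at 0, the edges of the bins should fall on whole numbers of carats.

```r
library(ggplot2)

# Place one bin edge at x = 0; with binwidth = 1 the other edges follow at 1, 2, 3, ...
p <- ggplot(data = diamonds) +
  geom_histogram(mapping = aes(x = carat), binwidth = 1, boundary = 0)

p
```

Changing boundary to another value, such as 0.5, shifts every bin by the same amount.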

Exercise 2: `binwidth`

When you use geom_histogram(), you should always experiment with different binwidths, because bins of different sizes reveal different types of information.

To see an example of this, make a histogram of the carat variable in the diamonds dataset. Use a bin size of 0.5 carats. What does the overall shape of the distribution look like?

Solution

ggplot(data = diamonds) +
  geom_histogram(mapping = aes(x = carat), binwidth = 0.5)

Good job! The most common diamond size is about 0.5 carats, and larger sizes become progressively less frequent. This accords with general knowledge about diamonds, so you may be tempted to stop exploring the distribution of carat size. But should you?

Exercise 3: another `binwidth`

Recreate your histogram of carat but this time use a binwidth of 0.1. Does your plot reveal new information? Look closely. Is there more than one peak? Where do the peaks occur?

Solution

ggplot(data = diamonds) +
  geom_histogram(mapping = aes(x = carat), binwidth = 0.1)

Good job! The new binwidth reveals a new phenomenon: carat sizes like 0.5, 0.75, 1, 1.5, and 2 are much more common than carat sizes that do not fall near a common fraction. Why might this be?

Exercise 4: another `binwidth`

Recreate your histogram of carat a final time, but this time use a binwidth of 0.01 and set the first boundary to zero. Try to find one new pattern in the results.

Solution

ggplot(data = diamonds) +
  geom_histogram(mapping = aes(x = carat), binwidth = 0.01, boundary = 0)

Good job! The new binwidth reveals another phenomenon: each peak is strongly right skewed. In other words, diamonds that are 1.01 carats are much more common than diamonds that are 0.99 carats. Why would that be?

Aesthetics

Visually, histograms are very similar to bar charts. As a result, they use the same aesthetics: alpha, color, fill, linetype, and size (named linewidth in {ggplot2} 3.4.0 and later).

They also behave in the same odd way when you use the color aesthetic. Do you remember what happens?

Which aesthetic would you use to color the interior fill of each bar in a histogram?
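To check your answer, you can draw the same histogram twice and compare the results. This is a minimal sketch: the binwidth and the color name "steelblue" are arbitrary choices made for illustration, and both aesthetics are set to a fixed value outside of aes().

```r
library(ggplot2)

# fill controls the interior of each bar
p_fill <- ggplot(data = diamonds) +
  geom_histogram(mapping = aes(x = carat), binwidth = 0.5, fill = "steelblue")

# color controls only the outline of each bar
p_color <- ggplot(data = diamonds) +
  geom_histogram(mapping = aes(x = carat), binwidth = 0.5, color = "steelblue")

p_fill
p_color
```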

Exercise 5: Putting it all together

Recreate the histogram below.

Solution

ggplot(data = diamonds) +
  geom_histogram(mapping = aes(x = price, fill = cut), binwidth = 1000, boundary = 0)

Good job! Did you ensure that the binwidth is 1000 and that the first boundary is zero?
