The most popular names

Deriving information

Every data frame that you meet implies more information than it displays. For example, babynames does not display the total number of children who had your name, but it certainly implies that number. To find it, you only need to do a calculation, shown here for boys named Andrew:

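library(babynames)
library(dplyr)

# Keep only the rows for boys named Andrew, then add up the yearly counts
babynames |>
  filter(name == "Andrew", sex == "M") |>
  summarize(total = sum(n))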
# A tibble: 1 × 1
    total
    <int>
1 1283910

Useful functions

{dplyr} provides three functions that can help you reveal the information implied by your data:

  • summarize()
  • group_by()
  • mutate()

Like select(), filter() and arrange(), these functions all take a data frame as their first argument and return a new data frame as their output, which makes them easy to use in pipes.
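To make the "data frame in, data frame out" pattern concrete, here is a minimal sketch that chains all three functions in one pipe. It assumes a small made-up data frame called toy, which borrows column names from babynames but holds invented counts; the names totals and share are also just illustrative.

```r
library(dplyr)

# Toy data: column names mirror babynames, but the counts are invented.
toy <- tibble(
  year = c(2000, 2000, 2001, 2001),
  sex  = c("M", "F", "M", "F"),
  name = c("Andrew", "Emma", "Andrew", "Emma"),
  n    = c(10L, 20L, 30L, 40L)
)

# Each step takes a data frame and returns a new one, so the steps chain with |>.
totals <- toy |>
  group_by(name, sex) |>                          # one group per name/sex pair
  summarize(total = sum(n), .groups = "drop") |>  # collapse each group to one row
  mutate(share = total / sum(total))              # add a column built from existing ones

totals
```

The result has one row per name and sex, with a total column and a share column. Each function is covered in detail in the sections that follow.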

Let’s master each function and use them to analyze popularity as we go.
