select()

select() extracts columns of a data frame and returns the columns as a new data frame. To use select(), pass it the name of a data frame to extract columns from, and then the names of the columns to extract. The column names do not need to appear in quotation marks or be prefixed with a $; select() knows to find them in the data frame that you supply.

Exercise: select()

Use the example below to get a feel for select(). Can you extract just the name column? How about the name and year columns? How about all of the columns except prop?

select(babynames, name)
select(babynames, name, year)
select(babynames, year, sex, name, n)

select() helpers

You can also use a series of helpers with select(). For example, if you place a minus sign before a column name, select() will return every column but that column. Can you predict how the minus sign will work here?

The table below summarizes the other select() helpers that are available in {dplyr}. Study it, and then click “Continue” to test your understanding.

Helper function Use Example
- Columns except select(babynames, -prop)
: Columns between (inclusive) select(babynames, year:n)
contains() Columns that contains a string select(babynames, contains("n"))
ends_with() Columns that ends with a string select(babynames, ends_with("n"))
matches() Columns that matches a regex select(babynames, matches("n"))
num_range() Columns with a numerical suffix in the range Not applicable with babynames
one_of() Columns whose name appear in the given set select(babynames, one_of(c("sex", "gender")))
starts_with() Columns that starts with a string select(babynames, starts_with("n"))

select() quiz

Which of these is not a way to select the name and n columns together?





Next topic