filter()

filter() extracts rows from a data frame and returns them as a new data frame. As with select(), the first argument of filter() should be a data frame to extract rows from. The arguments that follow should be logical tests; filter() will return every row for which the tests return TRUE.

filter() in action

For example, the code chunk below returns every row with the name “Sea” in babynames.

filter(babynames, name == "Sea")
# A tibble: 4 Ă— 5
   year sex   name      n       prop
  <dbl> <chr> <chr> <int>      <dbl>
1  1982 F     Sea       5 0.00000276
2  1985 M     Sea       6 0.00000312
3  1986 M     Sea       5 0.0000026 
4  1998 F     Sea       5 0.00000258

Logical tests

To get the most from filter, you will need to know how to use R’s logical test operators, which are summarized below.

Logical operator tests Example
> Is x greater than y? x > y
>= Is x greater than or equal to y? x >= y
< Is x less than y? x < y
<= Is x less than or equal to y? x <= y
== Is x equal to y? x == y
!= Is x not equal to y? x != y
is.na() Is x an NA? is.na(x)
!is.na() Is x not an NA? !is.na(x)

Exercise: Logical operators

See if you can use the logical operators to manipulate our code below to show:

  • All of the names where prop is greater than or equal to 0.08
  • All of the children named “Khaleesi”
  • All of the names that have a missing value for n (Hint: this should return an empty data set).
filter(babynames, prop >= 0.08)
filter(babynames, name == "Khaleesi")
filter(babynames, is.na(n))

Two common mistakes

When you use logical tests, be sure to look out for two common mistakes. One appears in each code chunk below. Can you find them? When you spot a mistake, fix it and then run the chunk to confirm that it works.

filter(babynames, name == "Sea")

Good job! Remember to use == instead of = when testing for equality.

filter(babynames, name == "Sea")

Good job! As written this code would check that name is equal to the contents of the object named Sea, which does not exist.

Two mistakes: Recap

When you use logical tests, be sure to look out for these two common mistakes:

  1. Using = instead of == to test for equality.
  2. Forgetting to use quotation marks when comparing strings, e.g. name == Abby, instead of name == "Abby"

Combining tests

If you provide more than one test to filter(), filter() will combine the tests with an and statement (&): it will only return the rows that satisfy all of the tests.

To combine multiple tests in a different way, use R’s Boolean operators. For example, the code below will return all of the children named Sea or Anemone.

filter(babynames, name == "Sea" | name == "Anemone")
# A tibble: 5 Ă— 5
   year sex   name        n       prop
  <dbl> <chr> <chr>   <int>      <dbl>
1  1982 F     Sea         5 0.00000276
2  1985 M     Sea         6 0.00000312
3  1986 M     Sea         5 0.0000026 
4  1998 F     Sea         5 0.00000258
5  2012 F     Anemone     6 0.0000031 

Boolean operators

You can find a complete list or base R’s Boolean operators in the table below.

Boolean operator represents Example
& Are both A and B true? A & B
| Are one or both of A and B true? A | B
! Is A not true? !A
xor() Is one and only one of A and B true? xor(A, B)
%in% Is x in the set of a, b, and c? x %in% c(a, b, c)
any() Are any of A, B, or C true? any(A, B, C)
all() Are all of A, B, or C true? all(A, B, C)

Exercise: Combining tests

Use Boolean operators to alter the code chunk below to return only the rows that contain:

  • Girls named Sea
  • Names that were used by exactly 5 or 6 children in 1880
  • Names that are one of Acura, Lexus, or Yugo
filter(babynames, name == "Sea", sex == "F")
filter(babynames, n == 5 | n == 6, year == 1880)
filter(babynames, name %in% c("Acura", "Lexus", "Yugo"))

Two more common mistakes

Logical tests also invite two common mistakes that you should look out for. Each is displayed in a code chunk below, one produces an error and the other is needlessly verbose. Diagnose the chunks and then fix the code.

filter(babynames, 10 < n, n < 20)

Good job! You cannot combine two logical tests in R without using a Boolean operator (or at least a comma between filter arguments).

filter(babynames, n %in% c(5, 6, 7, 8, 9))

Good job! Although the first code works, you should make your code more concise by collapsing multiple or statements into an %in% statement when possible.

Two more common mistakes: Recap

When you combine multiple logical tests, be sure to look out for these two common mistakes:

  1. Collapsing multiple logical tests into a single test without using a boolean operator
  2. Using repeated | instead of %in%, e.g. x == 1 | x == 2 | x == 3 instead of x %in% c(1, 2, 3)

Next topic