filter() extracts rows from a data frame and returns them as a new data frame. As with select(), the first argument of filter() should be a data frame to extract rows from. The arguments that follow should be logical tests; filter() will return every row for which the tests return TRUE.
filter() in action
For example, the code chunk below returns every row with the name “Sea” in babynames.
filter(babynames, name =="Sea")
# A tibble: 4 Ă— 5
year sex name n prop
<dbl> <chr> <chr> <int> <dbl>
1 1982 F Sea 5 0.00000276
2 1985 M Sea 6 0.00000312
3 1986 M Sea 5 0.0000026
4 1998 F Sea 5 0.00000258
Logical tests
To get the most from filter, you will need to know how to use R’s logical test operators, which are summarized below.
Logical operator
tests
Example
>
Is x greater than y?
x > y
>=
Is x greater than or equal to y?
x >= y
<
Is x less than y?
x < y
<=
Is x less than or equal to y?
x <= y
==
Is x equal to y?
x == y
!=
Is x not equal to y?
x != y
is.na()
Is x an NA?
is.na(x)
!is.na()
Is x not an NA?
!is.na(x)
Exercise: Logical operators
See if you can use the logical operators to manipulate our code below to show:
All of the names where prop is greater than or equal to 0.08
All of the children named “Khaleesi”
All of the names that have a missing value for n (Hint: this should return an empty data set).
filter(babynames, prop >=0.08)filter(babynames, name =="Khaleesi")filter(babynames, is.na(n))
Two common mistakes
When you use logical tests, be sure to look out for two common mistakes. One appears in each code chunk below. Can you find them? When you spot a mistake, fix it and then run the chunk to confirm that it works.
Good job! As written this code would check that name is equal to the contents of the object named Sea, which does not exist.
Two mistakes: Recap
When you use logical tests, be sure to look out for these two common mistakes:
Using = instead of == to test for equality.
Forgetting to use quotation marks when comparing strings, e.g. name == Abby, instead of name == "Abby"
Combining tests
If you provide more than one test to filter(), filter() will combine the tests with an and statement (&): it will only return the rows that satisfy all of the tests.
To combine multiple tests in a different way, use R’s Boolean operators. For example, the code below will return all of the children named Sea or Anemone.
filter(babynames, name =="Sea"| name =="Anemone")
# A tibble: 5 Ă— 5
year sex name n prop
<dbl> <chr> <chr> <int> <dbl>
1 1982 F Sea 5 0.00000276
2 1985 M Sea 6 0.00000312
3 1986 M Sea 5 0.0000026
4 1998 F Sea 5 0.00000258
5 2012 F Anemone 6 0.0000031
Boolean operators
You can find a complete list or base R’s Boolean operators in the table below.
Boolean operator
represents
Example
&
Are bothA and B true?
A & B
|
Are one or both of A and B true?
A | B
!
Is Anot true?
!A
xor()
Is one and only one of A and B true?
xor(A, B)
%in%
Is x in the set of a, b, and c?
x %in% c(a, b, c)
any()
Are any of A, B, or C true?
any(A, B, C)
all()
Are all of A, B, or C true?
all(A, B, C)
Exercise: Combining tests
Use Boolean operators to alter the code chunk below to return only the rows that contain:
Girls named Sea
Names that were used by exactly 5 or 6 children in 1880
filter(babynames, name =="Sea", sex =="F")filter(babynames, n ==5| n ==6, year ==1880)filter(babynames, name %in%c("Acura", "Lexus", "Yugo"))
Two more common mistakes
Logical tests also invite two common mistakes that you should look out for. Each is displayed in a code chunk below, one produces an error and the other is needlessly verbose. Diagnose the chunks and then fix the code.
Good job! Although the first code works, you should make your code more concise by collapsing multiple or statements into an %in% statement when possible.
Two more common mistakes: Recap
When you combine multiple logical tests, be sure to look out for these two common mistakes:
Collapsing multiple logical tests into a single test without using a boolean operator
Using repeated | instead of %in%, e.g. x == 1 | x == 2 | x == 3 instead of x %in% c(1, 2, 3)