As I've spent time learning about different approaches to working with data, I've seen several subtle, but important, differences in how to do things. This very short post presents how one can perform conditional assignment: if something meets a condition, do this; else, do that.

Why do this? Well, we are often making or adjusting a variable based on some condition. This is often done using base::ifelse(), dplyr::if_else(), dplyr::case_when(), or data.table::fifelse(). But it turns out there is another way to do this in data.table.

For this post, we will use:

    # Core
    library(dplyr)
    library(data.table)
    library(bench)
    # Helpers
    library(tidyr)
    library(ggbeeswarm)

We'll also create a fictitious data set with four variables: grp, x, y, and z.

    dt <- data.table(
      grp = factor(sample(1L:3L, 1e6, replace = TRUE)),
      x = rnorm(1e6),
      y = rnorm(1e6),
      z = sample(c(1:10, NA), 1e6, replace = TRUE)
    )
    dt

Let's say we want to create a new variable that categorizes our x variable. Below we walk through each approach to doing this.

base::ifelse(), dplyr::if_else(), and data.table::fifelse()

All three of base::ifelse(), dplyr::if_else(), and data.table::fifelse() work the same way, but if_else() and fifelse() are more careful about variable types and fifelse() is super fast. They use the following general syntax:

    if_else(condition, true, false)

condition is something that can be true or false. For example, we could do x > median(x) to test if each individual point of x is greater than the median of x.
true is what is supposed to happen when the condition is true.
false is what is supposed to happen when the condition is false.

So with our example data, we can do:

    dt %>% mutate(x_cat = if_else(x > median(x), "high", "low"))

We could do the same with data.table using fifelse().

dplyr::case_when()

A newer, but fantastic, approach is using dplyr::case_when(). It uses a unique syntax, but one that can avoid some issues. If there are more than 2 levels of the new variable (e.g., not just a "high" and "low" but there is also a "moderate" level), then we use what are called nested if_else() statements, where the second is in the false place of the first. These can get messy, with many parentheses. For example, with just three levels, we now need to use two if_else() statements:

    dt %>%
      mutate(x_cat = if_else(x ...)) %>%
      as_tibble()

case_when() instead relies on the following syntax:

    case_when(
      condition1 ~ if_condition1_true,
      condition2 ~ if_condition2_true,
      condition3 ~ if_condition3_true,
      condition4 ~ if_condition4_true
    )

with no real limit to the number of conditions that can be used. Values that don't meet any of the conditions are assigned NA (so if someone in the data doesn't meet conditions 1-4, their value for this new variable we are making is NA).

data.table filter-mutate-keep

This approach is unique to data.table and functions very similarly to case_when() in terms of syntax. It filters by the condition and then assigns values to x_cat. This results in the same thing as the case_when() example.

The only added burden of this approach is that it doesn't move from one condition to the next in the same way case_when() does. That is, case_when() will find the rows that fit the first condition, and then will look at the next condition (and ignore the rows that already met the first condition). The filter-mutate-keep approach requires that you are very explicit since it won't ignore the rows that met the other conditions.
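
As a rough illustration of the approaches discussed in this post, the sketch below builds a three-level x_cat in two ways using data.table only: nested fifelse() calls, and filtering rows and then assigning by reference with `:=`. The cutoffs (-1 and 1), the seed, the smaller sample size, and the column names x_cat_nested and x_cat_filter are all assumptions made for illustration, not values taken from the post.

```r
library(data.table)

set.seed(1)                       # assumed seed, only for reproducibility
dt <- data.table(x = rnorm(1e4))  # smaller than the post's 1e6 rows, for speed

# Approach 1: nested fifelse(); the second call sits in the "no" slot
# of the first, so each extra level adds another layer of parentheses.
dt[, x_cat_nested := fifelse(x < -1, "low",
                     fifelse(x <= 1, "moderate", "high"))]

# Approach 2: filter rows in `i`, then assign by reference with `:=`.
# Every condition is spelled out in full, because a later line does not
# skip rows that an earlier line already matched.
dt[x < -1,           x_cat_filter := "low"]
dt[x >= -1 & x <= 1, x_cat_filter := "moderate"]
dt[x > 1,            x_cat_filter := "high"]

# Check whether the two columns agree row by row
identical(dt$x_cat_nested, dt$x_cat_filter)
```

If the filter conditions overlapped, the order of the lines would matter: a later assignment overwrites an earlier one, whereas case_when() keeps the first match. Writing mutually exclusive conditions, as above, sidesteps that difference.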