This function replaces specific values of variables with NA.
set_na(x, ..., na, drop.levels = TRUE, as.tag = FALSE)
x: A vector or data frame.
...: Optional, unquoted names of variables that should be selected for further processing. Required if x is a data frame (rather than a vector) and only selected variables from x should be processed. You may also use functions like : or tidyselect's select-helpers. See 'Examples'.
na: Numeric vector with values that should be replaced with NA values, or a character vector if values of factors or character vectors should be replaced. For labelled vectors, may also be the name of a value label. In this case, the values associated with that value label in each vector will be replaced with NA. na can also be a named vector. If as.tag = FALSE, values will be replaced only in those variables that are indicated by the value names (see 'Examples').
drop.levels: Logical, if TRUE, factor levels of values that have been replaced with NA are dropped. See 'Examples'.
as.tag: Logical, if TRUE, values in x will be replaced by tagged_na() values, otherwise by regular NA values. Use a named vector to assign the value label to the tagged NA value (see 'Examples').
x, with all values in na replaced by NA. If x is a data frame, the complete data frame x is returned, with NAs set for the variables specified in ...; if ... is not specified, the replacement applies to all variables in the data frame.
set_na() replaces all values defined in na with a related NA or tagged NA value (see tagged_na()). Tagged NAs work exactly like regular R missing values, except that they store one additional byte of information: a tag, which is usually a letter ("a" to "z") or a digit ("0" to "9").
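The following is a minimal sketch of how tagged NAs behave, assuming the haven package is installed. The vector x is made up for illustration; tagged_na(), is_tagged_na(), na_tag() and print_tagged_na() are haven functions.

```r
library(haven)

# illustrative vector: one regular NA and two tagged NAs
x <- c(1, NA, tagged_na("a"), tagged_na("z"))

# regular and tagged NAs are all treated as missing
is.na(x)

# only the tagged NAs carry a tag
is_tagged_na(x)
na_tag(x)

# print the vector with the tags made visible
print_tagged_na(x)
```

Because is.na() does not distinguish between the two kinds of missing values, functions that handle NA in the usual way should treat tagged NAs like any other missing value.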
Different NA values for different variables
If na is a named vector and as.tag = FALSE, the names indicate variable names, and the associated values indicate those values that should be replaced by NA in the related variable. For instance, set_na(x, na = c(v1 = 4, v2 = 3)) would replace all 4 in v1 with NA and all 3 in v2 with NA.
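As a small sketch of the named-vector form, assuming the sjmisc package is installed: the data frame x and its values below are invented for illustration, with column names chosen to match v1 and v2 in the text.

```r
library(sjmisc)

# hypothetical data frame, invented for this illustration
x <- data.frame(
  v1 = c(1, 4, 3, 4),
  v2 = c(3, 4, 3, 1)
)

# names in 'na' select the variable, values select what becomes NA:
# 4 is replaced in v1 only, 3 is replaced in v2 only
res <- set_na(x, na = c(v1 = 4, v2 = 3))

# positions of the missing values in each variable
is.na(res$v1)
is.na(res$v2)
```

Note that the 3 in v1 and the 4 in v2 are expected to stay untouched, since each value is tied to the variable it is named after.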
If na is a named list and as.tag = FALSE, it is possible to replace several different values by NA separately for each variable. For example, set_na(x, na = list(v1 = c(1, 4), v2 = 5:7)) would replace all 1 and 4 in v1 with NA and all 5 to 7 in v2 with NA.
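The named-list form can be sketched in the same way, again assuming sjmisc is installed and using a made-up data frame x with columns v1 and v2.

```r
library(sjmisc)

# hypothetical data frame, invented for this illustration
x <- data.frame(
  v1 = c(1, 2, 4, 3),
  v2 = c(5, 8, 6, 7)
)

# each list element holds all values to replace in that variable:
# 1 and 4 in v1, 5 to 7 in v2
res <- set_na(x, na = list(v1 = c(1, 4), v2 = 5:7))

# positions of the missing values in each variable
is.na(res$v1)
is.na(res$v2)
```

A list is needed here because a named vector can carry only one value per name, whereas each list element can hold a vector of any length.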
See also 'Details' in get_na.
Labels of values that are replaced with NA and are no longer used will be removed from x; other value and variable label attributes are preserved. For more details on labelled data, see the vignette "Labelled Data and the sjlabelled-Package".
if (require("sjmisc") && require("dplyr") && require("haven")) {
# create random variable
dummy <- sample(1:8, 100, replace = TRUE)
# show value distribution
table(dummy)
# set value 1 and 8 as missings
dummy <- set_na(dummy, na = c(1, 8))
# show value distribution, including missings
table(dummy, useNA = "always")
# add named vector as further missing value
set_na(dummy, na = c("Refused" = 5), as.tag = TRUE)
# see different missing types
print_tagged_na(set_na(dummy, na = c("Refused" = 5), as.tag = TRUE))
# create sample data frame
dummy <- data.frame(var1 = sample(1:8, 100, replace = TRUE),
var2 = sample(1:10, 100, replace = TRUE),
var3 = sample(1:6, 100, replace = TRUE))
# set value 2 and 4 as missings
dummy %>% set_na(na = c(2, 4)) %>% head()
dummy %>% set_na(na = c(2, 4), as.tag = TRUE) %>% get_na()
dummy %>% set_na(na = c(2, 4), as.tag = TRUE) %>% get_values()
data(efc)
dummy <- data.frame(
var1 = efc$c82cop1,
var2 = efc$c83cop2,
var3 = efc$c84cop3
)
# check original distribution of categories
lapply(dummy, table, useNA = "always")
# set 3 to NA for two variables
lapply(set_na(dummy, var1, var3, na = 3), table, useNA = "always")
# if 'na' is a named vector *and* 'as.tag = FALSE', different NA-values
# can be specified for each variable
set.seed(1)
dummy <- data.frame(
var1 = sample(1:8, 10, replace = TRUE),
var2 = sample(1:10, 10, replace = TRUE),
var3 = sample(1:6, 10, replace = TRUE)
)
dummy
# Replace "3" in var1 with NA, "5" in var2 and "6" in var3
set_na(dummy, na = c(var1 = 3, var2 = 5, var3 = 6))
# if 'na' is a named list *and* 'as.tag = FALSE', for each
# variable different multiple NA-values can be specified
set_na(dummy, na = list(var1 = 1:3, var2 = c(7, 8), var3 = 6))
# drop unused factor levels when values are set to NA
x <- factor(c("a", "b", "c"))
x
set_na(x, na = "b", as.tag = TRUE)
set_na(x, na = "b", drop.levels = FALSE, as.tag = TRUE)
# set_na() can also set values to NA by giving the value label of the
# value that should be replaced. This is particularly helpful if a
# certain category should be set to NA, but this category is assigned
# different values across variables
x1 <- sample(1:4, 20, replace = TRUE)
x2 <- sample(1:7, 20, replace = TRUE)
x1 <- set_labels(x1, labels = c("Refused" = 3, "No answer" = 4))
x2 <- set_labels(x2, labels = c("Refused" = 6, "No answer" = 7))
tmp <- data.frame(x1, x2)
get_labels(tmp)
table(tmp, useNA = "always")
get_labels(set_na(tmp, na = "No answer"))
table(set_na(tmp, na = "No answer"), useNA = "always")
# show values
tmp
set_na(tmp, na = c("Refused", "No answer"))
}
#> [1] NA(5) 3 6 6 2 4 7 4 NA(5) NA(5) NA(5) NA
#> [13] NA(5) 4 NA 2 4 3 NA 6 6 2 3 7
#> [25] NA 3 NA NA NA NA(5) NA 4 3 NA 2 NA
#> [37] NA NA NA 6 2 6 NA 3 NA(5) NA NA 6
#> [49] NA(5) 2 NA 6 NA 4 6 6 NA 3 3 3
#> [61] NA NA 3 NA 6 2 2 NA(5) NA(5) NA 2 2
#> [73] NA NA 4 NA 3 NA NA(5) NA(5) NA 2 3 2
#> [85] NA NA NA NA(5) 6 7 NA(5) 2 4 7 2 NA(5)
#> [97] 4 7 2 2
#> x1 x2
#> 1 NA 4
#> 2 NA 1
#> 3 1 NA
#> 4 NA NA
#> 5 NA 2
#> 6 2 NA
#> 7 2 3
#> 8 2 2
#> 9 2 NA
#> 10 NA NA
#> 11 NA 2
#> 12 NA 5
#> 13 NA 2
#> 14 2 NA
#> 15 NA NA
#> 16 1 NA
#> 17 NA 1
#> 18 2 3
#> 19 1 3
#> 20 NA NA