This function adds value labels as an attribute (named "labels")
to a variable or vector x, or to a set of variables in a
data frame or list object. One use case is the
sjPlot package, which supports labelled data and automatically
uses these labels for axes and legends in plots and for tables.
val_labels() is intended for use in pipe workflows and has a
tidyverse-consistent syntax, including support for quasi-quotation
(see 'Examples').
set_labels(
  x,
  ...,
  labels,
  force.labels = FALSE,
  force.values = TRUE,
  drop.na = TRUE
)
val_labels(x, ..., force.labels = FALSE, force.values = TRUE, drop.na = TRUE)
x: A vector or data frame.
...: For set_labels(), optional, unquoted names of variables that should be selected for
further processing. Required if x is a data frame (rather than a
vector) and only selected variables from x should be processed.
You may also use functions like : or tidyselect's
select-helpers.
For val_labels(),
pairs of named vectors, where the name is the name of the variable
to be labelled and the value is the vector of new value labels. val_labels()
also supports quasi-quotation (see 'Examples').
labels: (Named) character vector of labels that will be added to x as
"labels" or "value.labels" attribute.
if labels is not a named vector, its length must equal the value range of x, i.e. if x has values from 1 to 3, labels should have a length of 3;
if the length of labels differs from the number of unique values of x, a warning is given. You can still add missing labels with the force.labels or force.values arguments; see 'Note'.
if labels is a named vector, value labels will be set accordingly, even if x has a different number of unique values. See 'Note' and 'Examples'.
if x is a data frame, labels may also be a list of (named) character vectors;
if labels is a list, it must have the same length as the number of columns of x;
if labels is a vector and x is a data frame, labels will be applied to each column of x.
Use labels = "" to remove the labels attribute from x.
force.labels: Logical; if TRUE, all labels are added as value label
attribute, even if x has fewer unique values than the length of labels
or if x has a smaller range than the length of labels. See 'Examples'.
This parameter will be ignored if labels is a named vector.
force.values: Logical; if TRUE (default) and labels has fewer
elements than unique values of x, additional values not covered
by labels will be added as labels as well. See 'Examples'.
This parameter will be ignored if labels is a named vector.
drop.na: Logical; whether existing value labels of tagged NA values
(see tagged_na) should be removed (drop.na = TRUE,
the default) or preserved (drop.na = FALSE).
See get_na for more details on tagged NA values.
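The argument rules above can be tried on a small data frame. The following is a minimal sketch, assuming the sjlabelled package is installed; the data frame df and its columns a and b are made up for illustration and are not part of the package.

```r
library(sjlabelled)

# hypothetical toy data, not part of the package
df <- data.frame(a = c(1, 2, 1), b = c(1, 2, 3))

# a list of label vectors: one element per column of df
df_list <- set_labels(df, labels = list(c("no", "yes"), c("low", "mid", "high")))
get_labels(df_list)

# the same labels via val_labels(), using name = labels pairs
df_pairs <- val_labels(df, a = c("no", "yes"), b = c("low", "mid", "high"))
get_labels(df_pairs)

# labels = "" removes the "labels" attribute again
a_plain <- set_labels(df_list$a, labels = "")
attr(a_plain, "labels")
```

Inspecting the result with get_labels() or attr(x, "labels") is a quick way to check which rule was applied.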
x with value label attributes, or with label attributes removed if
labels = "". If x is a data frame, the complete data
frame x is returned, with labels removed from or added to the variables
specified in ...; if ... is not specified, this applies
to all variables in the data frame.
if labels is a named vector, force.labels and force.values will be ignored, and only values defined in labels will be labelled;
if x has fewer unique values than labels, redundant labels will be dropped, see force.labels;
if x has more unique values than labels, only matching values will be labelled, other values remain unlabelled, see force.values;
If you only want to change some of the value labels, use add_labels instead.
Furthermore, see 'Note' in get_labels.
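As a rough illustration of the points above, the sketch below first sets labels with a named vector, so that only the listed values are labelled, and then uses add_labels to extend them without replacing the existing ones. The vector x is made up for illustration, and sjlabelled is assumed to be installed.

```r
library(sjlabelled)

# hypothetical example vector
x <- c(1, 2, 3, 4, 2, 1)

# named vector: only values 1 and 2 get a label;
# force.labels and force.values are ignored here
x <- set_labels(x, labels = c("one" = 1, "two" = 2))
get_labels(x, values = "p")

# add a label for value 4, keeping the labels for 1 and 2
x <- add_labels(x, labels = c("four" = 4))
get_labels(x, values = "p")
```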
See vignette Labelled Data and the sjlabelled-Package
for more details; set_label to manually set variable labels or
get_label to get variable labels; add_labels to
add additional value labels without replacing the existing ones.
library(sjlabelled)
library(sjmisc) # assumed source of frq()
dummy <- sample(1:4, 40, replace = TRUE)
frq(dummy)
#> x <integer>
#> # total N=40 valid N=40 mean=2.75 sd=1.10
#>
#> Value | N | Raw % | Valid % | Cum. %
#> -------------------------------------
#> 1 | 8 | 20 | 20 | 20
#> 2 | 6 | 15 | 15 | 35
#> 3 | 14 | 35 | 35 | 70
#> 4 | 12 | 30 | 30 | 100
#> <NA> | 0 | 0 | <NA> | <NA>
dummy <- set_labels(dummy, labels = c("very low", "low", "mid", "hi"))
frq(dummy)
#> x <integer>
#> # total N=40 valid N=40 mean=2.75 sd=1.10
#>
#> Value | Label | N | Raw % | Valid % | Cum. %
#> ------------------------------------------------
#> 1 | very low | 8 | 20 | 20 | 20
#> 2 | low | 6 | 15 | 15 | 35
#> 3 | mid | 14 | 35 | 35 | 70
#> 4 | hi | 12 | 30 | 30 | 100
#> <NA> | <NA> | 0 | 0 | <NA> | <NA>
# assign labels with named vector
dummy <- sample(1:4, 40, replace = TRUE)
dummy <- set_labels(dummy, labels = c("very low" = 1, "very high" = 4))
frq(dummy)
#> x <integer>
#> # total N=40 valid N=40 mean=2.48 sd=1.04
#>
#> Value | Label | N | Raw % | Valid % | Cum. %
#> -------------------------------------------------
#> 1 | very low | 8 | 20.00 | 20.00 | 20.00
#> 2 | 2 | 13 | 32.50 | 32.50 | 52.50
#> 3 | 3 | 11 | 27.50 | 27.50 | 80.00
#> 4 | very high | 8 | 20.00 | 20.00 | 100.00
#> <NA> | <NA> | 0 | 0.00 | <NA> | <NA>
# force using all labels, even if not all labels
# have associated values in vector
x <- c(2, 2, 3, 3, 2)
# only two value labels
x <- set_labels(x, labels = c("1", "2", "3"))
#> More labels than values of "x". Using first 2 labels.
x
#> [1] 2 2 3 3 2
#> attr(,"labels")
#> 1 2
#> 2 3
frq(x)
#> x <numeric>
#> # total N=5 valid N=5 mean=2.40 sd=0.55
#>
#> Value | Label | N | Raw % | Valid % | Cum. %
#> --------------------------------------------
#> 2 | 1 | 3 | 60 | 60 | 60
#> 3 | 2 | 2 | 40 | 40 | 100
#> <NA> | <NA> | 0 | 0 | <NA> | <NA>
# all three value labels
x <- set_labels(x, labels = c("1", "2", "3"), force.labels = TRUE)
x
#> [1] 2 2 3 3 2
#> attr(,"labels")
#> 1 2 3
#> 1 2 3
frq(x)
#> x <numeric>
#> # total N=5 valid N=5 mean=2.40 sd=0.55
#>
#> Value | Label | N | Raw % | Valid % | Cum. %
#> --------------------------------------------
#> 1 | 1 | 0 | 0 | 0 | 0
#> 2 | 2 | 3 | 60 | 60 | 60
#> 3 | 3 | 2 | 40 | 40 | 100
#> <NA> | <NA> | 0 | 0 | <NA> | <NA>
# create vector
x <- c(1, 2, 3, 2, 4, NA)
# add fewer labels than values
x <- set_labels(x, labels = c("yes", "maybe", "no"), force.values = FALSE)
#> "x" has more values than "labels", hence not all values are labelled.
x
#> [1] 1 2 3 2 4 NA
#> attr(,"labels")
#> yes maybe no
#> 1 2 3
# add all necessary labels
x <- set_labels(x, labels = c("yes", "maybe", "no"), force.values = TRUE)
#> More values in "x" than length of "labels". Additional values were added to labels.
x
#> [1] 1 2 3 2 4 NA
#> attr(,"labels")
#> yes maybe no 4
#> 1 2 3 4
# set labels and missings
x <- c(1, 1, 1, 2, 2, -2, 3, 3, 3, 3, 3, 9)
x <- set_labels(x, labels = c("Refused", "One", "Two", "Three", "Missing"))
x
#> [1] 1 1 1 2 2 -2 3 3 3 3 3 9
#> attr(,"labels")
#> Refused One Two Three Missing
#> -2 1 2 3 9
set_na(x, na = c(-2, 9))
#> [1] 1 1 1 2 2 NA 3 3 3 3 3 NA
#> attr(,"labels")
#> One Two Three
#> 1 2 3
library(haven) # assumed source of labelled() and tagged_na()
x <- labelled(
  c(1:3, tagged_na("a", "c", "z"), 4:1),
  c("Agreement" = 1, "Disagreement" = 4, "First" = tagged_na("c"),
    "Refused" = tagged_na("a"), "Not home" = tagged_na("z"))
)
# get current NA values
x
#> <labelled<double>[10]>
#> [1] 1 2 3 NA(a) NA(c) NA(z) 4 3 2 1
#>
#> Labels:
#> value label
#> 1 Agreement
#> 4 Disagreement
#> NA(c) First
#> NA(a) Refused
#> NA(z) Not home
get_na(x)
#> First Refused Not home
#> NA NA NA
# lose value labels from tagged NA by default, if not specified
set_labels(x, labels = c("New Three" = 3))
#> <labelled<double>[10]>
#> [1] 1 2 3 NA(a) NA(c) NA(z) 4 3 2 1
#>
#> Labels:
#> value label
#> 3 New Three
# do not drop na
set_labels(x, labels = c("New Three" = 3), drop.na = FALSE)
#> <labelled<double>[10]>
#> [1] 1 2 3 NA(a) NA(c) NA(z) 4 3 2 1
#>
#> Labels:
#> value label
#> 3 New Three
#> NA(c) First
#> NA(a) Refused
#> NA(z) Not home
# set labels via named vector,
# not using all possible values
data(efc)
get_labels(efc$e42dep)
#> [1] "independent" "slightly dependent" "moderately dependent"
#> [4] "severely dependent"
x <- set_labels(
  efc$e42dep,
  labels = c(`independent` = 1,
             `severe dependency` = 2,
             `missing value` = 9)
)
get_labels(x, values = "p")
#> [1] "[1] independent" "[2] severe dependency" "[9] missing value"
get_labels(x, values = "p", non.labelled = TRUE)
#> [1] "[1] independent" "[2] severe dependency" "[3] 3"
#> [4] "[4] 4" "[9] missing value"
# labels can also be set for tagged NA value
# create numeric vector
x <- c(1, 2, 3, 4)
# set 2 and 3 as missing, which will automatically be set as
# tagged NA by 'set_na()'
x <- set_na(x, na = c(2, 3))
x
#> [1] 1 NA NA 4
# set label via named vector just for tagged NA(3)
set_labels(x, labels = c(`New Value` = tagged_na("3")))
#> [1] 1 NA NA 4
#> attr(,"labels")
#> New Value
#> NA
# setting same value labels to multiple vectors
dummies <- data.frame(
  dummy1 = sample(1:4, 40, replace = TRUE),
  dummy2 = sample(1:4, 40, replace = TRUE),
  dummy3 = sample(1:4, 40, replace = TRUE)
)
# and set same value labels for two of three variables
test <- set_labels(
  dummies, dummy1, dummy2,
  labels = c("very low", "low", "mid", "hi")
)
# see result...
get_labels(test)
#> $dummy1
#> [1] "very low" "low" "mid" "hi"
#>
#> $dummy2
#> [1] "very low" "low" "mid" "hi"
#>
#> $dummy3
#> NULL
#>
# using quasi-quotation
if (require("rlang") && require("dplyr")) {
  dummies <- data.frame(
    dummy1 = sample(1:4, 40, replace = TRUE),
    dummy2 = sample(1:4, 40, replace = TRUE),
    dummy3 = sample(1:4, 40, replace = TRUE)
  )
  x1 <- "dummy1"
  x2 <- c("so low", "rather low", "mid", "very hi")
  dummies %>%
    val_labels(
      !!x1 := c("really low", "low", "a bit mid", "hi"),
      dummy3 = !!x2
    ) %>%
    get_labels()
  # ... and named vectors to explicitly set value labels
  x2 <- c("so low" = 4, "rather low" = 3, "mid" = 2, "very hi" = 1)
  dummies %>%
    val_labels(
      !!x1 := c("really low" = 1, "low" = 3, "a bit mid" = 2, "hi" = 4),
      dummy3 = !!x2
    ) %>% get_labels(values = "p")
}
#> $dummy1
#> [1] "[1] really low" "[2] a bit mid" "[3] low" "[4] hi"
#>
#> $dummy2
#> NULL
#>
#> $dummy3
#> [1] "[1] very hi" "[2] mid" "[3] rather low" "[4] so low"
#>