This function adds labels as an attribute (named "labels") to a variable or vector x, or to a set of variables in a data frame or a list object. One use case is the sjPlot package, which supports labelled data and automatically uses these labels for axes and legends in plots, and in tables.
val_labels() is intended for use within pipe workflows and has a tidyverse-consistent syntax, including support for quasi-quotation (see 'Examples').
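To make the "labels" attribute concrete, here is a base-R sketch that builds it by hand. It assumes the layout printed in the Examples (attr(,"labels") holding the labelled values, with the labels as names) and only illustrates the data structure; it is not how set_labels() is implemented.

```r
# Hand-built "labels" attribute: the elements are the labelled values,
# the names are the value labels (same layout as attr(,"labels") in the Examples)
x <- c(1, 2, 3, 2, 4)
attr(x, "labels") <- c("yes" = 1, "maybe" = 2, "no" = 3)

# Look up the label of each observation; values without a label give NA
labs <- attr(x, "labels")
observed_labels <- names(labs)[match(x, labs)]
print(observed_labels)
# [1] "yes"   "maybe" "no"    "maybe" NA
```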
set_labels(
x,
...,
labels,
force.labels = FALSE,
force.values = TRUE,
drop.na = TRUE
)
val_labels(x, ..., force.labels = FALSE, force.values = TRUE, drop.na = TRUE)
x: A vector or data frame.
...: For set_labels(), optional, unquoted names of variables that should be selected for further processing. Required if x is a data frame (and not a vector) and only selected variables from x should be processed. You may also use the : operator or tidyselect's select helpers.
For val_labels(), pairs of named vectors, where the name equals the name of the variable that should be labelled, and the value is the new set of value labels. val_labels() also supports quasi-quotation (see 'Examples').
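Quasi-quotation lets a variable name stored in a string be injected on the left-hand side of :=. The sketch below shows that mechanism with rlang::list2(), which implements rlang's dynamic dots, and assumes rlang is installed. show_pairs() is a hypothetical stand-in; whether val_labels() captures its dots exactly this way is an assumption.

```r
library(rlang)

# Hypothetical stand-in for a function that takes name = value pairs in `...`
show_pairs <- function(...) {
  pairs <- rlang::list2(...)  # dynamic dots: handles !! name injection with :=
  names(pairs)
}

x1 <- "dummy1"
# !!x1 := ... injects the *value* of x1 ("dummy1") as the argument name
show_pairs(!!x1 := c("really low", "low"), dummy3 = c("so low", "hi"))
# [1] "dummy1" "dummy3"
```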
labels: (Named) character vector of labels that will be added to x as "labels" or "value.labels" attribute.
If labels is not a named vector, its length must equal the value range of x, i.e. if x has values from 1 to 3, labels should have a length of 3. If the length of labels differs from the number of unique values of x, a warning is given. You can still add missing labels with the force.labels or force.values arguments; see 'Note'.
If labels is a named vector, value labels will be set accordingly, even if x has a different number of unique values. The names are the labels and the elements are the values to be labelled, e.g. c("very low" = 1, "very high" = 4). See 'Note' and 'Examples'.
If x is a data frame, labels may also be a list of (named) character vectors. If labels is a list, it must have the same length as the number of columns of x. If labels is a vector and x is a data frame, labels will be applied to each column of x.
Use labels = "" to remove the labels attribute from x.
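The difference between unnamed and named labels can be sketched in base R. build_labels_attr() is a hypothetical helper, not part of sjlabelled: it pairs unnamed labels positionally with the sorted unique values of x (dropping surplus labels, as in the "Using first 2 labels" example) and passes named vectors through unchanged. It deliberately ignores force.labels, force.values and the value-range rule, so treat it as a simplified model of the documented behaviour.

```r
# Hypothetical helper, NOT part of sjlabelled: simplified model of how
# unnamed vs. named 'labels' could become a "labels" attribute
build_labels_attr <- function(x, labels) {
  if (!is.null(names(labels))) {
    # Named vector: names are the labels, elements are the values to label
    return(labels)
  }
  # Unnamed vector: pair labels positionally with the sorted unique values,
  # keeping only as many labels as there are values
  values <- sort(unique(x[!is.na(x)]))
  n <- min(length(values), length(labels))
  stats::setNames(values[seq_len(n)], labels[seq_len(n)])
}

x <- c(2, 2, 3, 3, 2)
build_labels_attr(x, c("1", "2", "3"))          # c(`1` = 2, `2` = 3)
build_labels_attr(x, c("low" = 1, "high" = 4))  # c(low = 1, high = 4)
```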
force.labels: Logical; if TRUE, all labels are added as value label attribute, even if x has fewer unique values than the length of labels, or if x has a smaller range than the length of labels. See 'Examples'. This parameter will be ignored if labels is a named vector.
force.values: Logical; if TRUE (default) and labels has fewer elements than x has unique values, the additional values not covered by labels will be added as labels as well. See 'Examples'. This parameter will be ignored if labels is a named vector.
drop.na: Logical, whether existing value labels of tagged NA values (see tagged_na) should be removed (drop.na = TRUE, the default) or preserved (drop.na = FALSE). See get_na for more details on tagged NA values.
Returns x with value label attributes, or with the label attributes removed if labels = "". If x is a data frame, the complete data frame x will be returned, with labels removed from or added to the variables specified in ...; if ... is not specified, the change applies to all variables in the data frame.
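For data frames, the documented behaviour (labels added only to the variables selected in ..., all other columns returned untouched) can be pictured with a hypothetical base-R helper. label_columns() is not part of sjlabelled and skips tidyselect entirely by taking column names as a character vector.

```r
# Hypothetical helper, NOT part of sjlabelled: attach the same "labels"
# attribute to the named columns and return the complete data frame
label_columns <- function(df, cols, labs) {
  for (col in cols) {
    attr(df[[col]], "labels") <- labs
  }
  df
}

dummies <- data.frame(dummy1 = c(1, 2, 4), dummy2 = c(3, 3, 1), dummy3 = c(2, 4, 1))
labs <- c("very low" = 1, "low" = 2, "mid" = 3, "hi" = 4)
test <- label_columns(dummies, c("dummy1", "dummy2"), labs)

# dummy1 and dummy2 carry the attribute; dummy3 has none (NULL)
lapply(test, attr, which = "labels")
```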
If labels is a named vector, force.labels and force.values will be ignored, and only values defined in labels will be labelled.
If x has fewer unique values than labels, redundant labels will be dropped unless force.labels = TRUE; see force.labels.
If x has more unique values than labels, only matching values will be labelled and the other values remain unlabelled when force.values = FALSE; with the default force.values = TRUE, the remaining values are added as labels. See force.values.
If you only want to change some of the value labels, use add_labels instead. Furthermore, see 'Note' in get_labels.
See the vignette Labelled Data and the sjlabelled-Package for more details; set_label to manually set variable labels, or get_label to get variable labels; add_labels to add additional value labels without replacing the existing ones.
library(sjlabelled)
library(sjmisc)  # frq() is provided by the sjmisc package
dummy <- sample(1:4, 40, replace = TRUE)
frq(dummy)
#> x <integer>
#> # total N=40 valid N=40 mean=2.75 sd=1.10
#>
#> Value | N | Raw % | Valid % | Cum. %
#> -------------------------------------
#> 1 | 8 | 20 | 20 | 20
#> 2 | 6 | 15 | 15 | 35
#> 3 | 14 | 35 | 35 | 70
#> 4 | 12 | 30 | 30 | 100
#> <NA> | 0 | 0 | <NA> | <NA>
dummy <- set_labels(dummy, labels = c("very low", "low", "mid", "hi"))
frq(dummy)
#> x <integer>
#> # total N=40 valid N=40 mean=2.75 sd=1.10
#>
#> Value | Label | N | Raw % | Valid % | Cum. %
#> ------------------------------------------------
#> 1 | very low | 8 | 20 | 20 | 20
#> 2 | low | 6 | 15 | 15 | 35
#> 3 | mid | 14 | 35 | 35 | 70
#> 4 | hi | 12 | 30 | 30 | 100
#> <NA> | <NA> | 0 | 0 | <NA> | <NA>
# assign labels with named vector
dummy <- sample(1:4, 40, replace = TRUE)
dummy <- set_labels(dummy, labels = c("very low" = 1, "very high" = 4))
frq(dummy)
#> x <integer>
#> # total N=40 valid N=40 mean=2.48 sd=1.04
#>
#> Value | Label | N | Raw % | Valid % | Cum. %
#> -------------------------------------------------
#> 1 | very low | 8 | 20.00 | 20.00 | 20.00
#> 2 | 2 | 13 | 32.50 | 32.50 | 52.50
#> 3 | 3 | 11 | 27.50 | 27.50 | 80.00
#> 4 | very high | 8 | 20.00 | 20.00 | 100.00
#> <NA> | <NA> | 0 | 0.00 | <NA> | <NA>
# force using all labels, even if not all labels
# have associated values in vector
x <- c(2, 2, 3, 3, 2)
# only two value labels
x <- set_labels(x, labels = c("1", "2", "3"))
#> More labels than values of "x". Using first 2 labels.
x
#> [1] 2 2 3 3 2
#> attr(,"labels")
#> 1 2
#> 2 3
frq(x)
#> x <numeric>
#> # total N=5 valid N=5 mean=2.40 sd=0.55
#>
#> Value | Label | N | Raw % | Valid % | Cum. %
#> --------------------------------------------
#> 2 | 1 | 3 | 60 | 60 | 60
#> 3 | 2 | 2 | 40 | 40 | 100
#> <NA> | <NA> | 0 | 0 | <NA> | <NA>
# all three value labels
x <- set_labels(x, labels = c("1", "2", "3"), force.labels = TRUE)
x
#> [1] 2 2 3 3 2
#> attr(,"labels")
#> 1 2 3
#> 1 2 3
frq(x)
#> x <numeric>
#> # total N=5 valid N=5 mean=2.40 sd=0.55
#>
#> Value | Label | N | Raw % | Valid % | Cum. %
#> --------------------------------------------
#> 1 | 1 | 0 | 0 | 0 | 0
#> 2 | 2 | 3 | 60 | 60 | 60
#> 3 | 3 | 2 | 40 | 40 | 100
#> <NA> | <NA> | 0 | 0 | <NA> | <NA>
# create vector
x <- c(1, 2, 3, 2, 4, NA)
# add fewer labels than values
x <- set_labels(x, labels = c("yes", "maybe", "no"), force.values = FALSE)
#> "x" has more values than "labels", hence not all values are labelled.
x
#> [1] 1 2 3 2 4 NA
#> attr(,"labels")
#> yes maybe no
#> 1 2 3
# add all necessary labels
x <- set_labels(x, labels = c("yes", "maybe", "no"), force.values = TRUE)
#> More values in "x" than length of "labels". Additional values were added to labels.
x
#> [1] 1 2 3 2 4 NA
#> attr(,"labels")
#> yes maybe no 4
#> 1 2 3 4
# set labels and missings
x <- c(1, 1, 1, 2, 2, -2, 3, 3, 3, 3, 3, 9)
x <- set_labels(x, labels = c("Refused", "One", "Two", "Three", "Missing"))
x
#> [1] 1 1 1 2 2 -2 3 3 3 3 3 9
#> attr(,"labels")
#> Refused One Two Three Missing
#> -2 1 2 3 9
set_na(x, na = c(-2, 9))
#> [1] 1 1 1 2 2 NA 3 3 3 3 3 NA
#> attr(,"labels")
#> One Two Three
#> 1 2 3
library(haven)  # labelled() and tagged_na() are provided by haven
x <- labelled(
c(1:3, tagged_na("a", "c", "z"), 4:1),
c("Agreement" = 1, "Disagreement" = 4, "First" = tagged_na("c"),
"Refused" = tagged_na("a"), "Not home" = tagged_na("z"))
)
# get current NA values
x
#> <labelled<double>[10]>
#> [1] 1 2 3 NA(a) NA(c) NA(z) 4 3 2 1
#>
#> Labels:
#> value label
#> 1 Agreement
#> 4 Disagreement
#> NA(c) First
#> NA(a) Refused
#> NA(z) Not home
get_na(x)
#> First Refused Not home
#> NA NA NA
# lose value labels from tagged NA by default, if not specified
set_labels(x, labels = c("New Three" = 3))
#> <labelled<double>[10]>
#> [1] 1 2 3 NA(a) NA(c) NA(z) 4 3 2 1
#>
#> Labels:
#> value label
#> 3 New Three
# do not drop na
set_labels(x, labels = c("New Three" = 3), drop.na = FALSE)
#> <labelled<double>[10]>
#> [1] 1 2 3 NA(a) NA(c) NA(z) 4 3 2 1
#>
#> Labels:
#> value label
#> 3 New Three
#> NA(c) First
#> NA(a) Refused
#> NA(z) Not home
# set labels via named vector,
# not using all possible values
data(efc)
get_labels(efc$e42dep)
#> [1] "independent" "slightly dependent" "moderately dependent"
#> [4] "severely dependent"
x <- set_labels(
efc$e42dep,
labels = c(`independent` = 1,
`severe dependency` = 2,
`missing value` = 9)
)
get_labels(x, values = "p")
#> [1] "[1] independent" "[2] severe dependency" "[9] missing value"
get_labels(x, values = "p", non.labelled = TRUE)
#> [1] "[1] independent" "[2] severe dependency" "[3] 3"
#> [4] "[4] 4" "[9] missing value"
# labels can also be set for tagged NA value
# create numeric vector
x <- c(1, 2, 3, 4)
# set 2 and 3 as missing; set_na() automatically stores them
# as tagged NA values
x <- set_na(x, na = c(2, 3))
x
#> [1] 1 NA NA 4
# set label via named vector just for tagged NA(3)
set_labels(x, labels = c(`New Value` = tagged_na("3")))
#> [1] 1 NA NA 4
#> attr(,"labels")
#> New Value
#> NA
# setting same value labels to multiple vectors
dummies <- data.frame(
dummy1 = sample(1:4, 40, replace = TRUE),
dummy2 = sample(1:4, 40, replace = TRUE),
dummy3 = sample(1:4, 40, replace = TRUE)
)
# and set same value labels for two of three variables
test <- set_labels(
dummies, dummy1, dummy2,
labels = c("very low", "low", "mid", "hi")
)
# see result...
get_labels(test)
#> $dummy1
#> [1] "very low" "low" "mid" "hi"
#>
#> $dummy2
#> [1] "very low" "low" "mid" "hi"
#>
#> $dummy3
#> NULL
#>
# using quasi-quotation
if (require("rlang") && require("dplyr")) {
dummies <- data.frame(
dummy1 = sample(1:4, 40, replace = TRUE),
dummy2 = sample(1:4, 40, replace = TRUE),
dummy3 = sample(1:4, 40, replace = TRUE)
)
x1 <- "dummy1"
x2 <- c("so low", "rather low", "mid", "very hi")
dummies %>%
val_labels(
!!x1 := c("really low", "low", "a bit mid", "hi"),
dummy3 = !!x2
) %>%
get_labels()
# ... and named vectors to explicitly set value labels
x2 <- c("so low" = 4, "rather low" = 3, "mid" = 2, "very hi" = 1)
dummies %>%
val_labels(
!!x1 := c("really low" = 1, "low" = 3, "a bit mid" = 2, "hi" = 4),
dummy3 = !!x2
) %>% get_labels(values = "p")
}
#> $dummy1
#> [1] "[1] really low" "[2] a bit mid" "[3] low" "[4] hi"
#>
#> $dummy2
#> NULL
#>
#> $dummy3
#> [1] "[1] very hi" "[2] mid" "[3] rather low" "[4] so low"
#>
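The "set labels and missings" example can be approximated in base R to show what happens to both the data and the "labels" attribute. This is a simplified sketch that assumes plain NA is acceptable: set_na() itself may store tagged NA values instead (see the tagged NA example), which this code does not do.

```r
x <- c(1, 1, 1, 2, 2, -2, 3, 3, 3, 3, 3, 9)
attr(x, "labels") <- c(Refused = -2, One = 1, Two = 2, Three = 3, Missing = 9)

na_codes <- c(-2, 9)
labs <- attr(x, "labels")

# Replace the missing-value codes with plain NA; subassignment keeps attributes
x[x %in% na_codes] <- NA
# Drop the labels that belonged to the codes now treated as missing
attr(x, "labels") <- labs[!labs %in% na_codes]
x
```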