This function adds value labels as an attribute (named "labels") to a variable or vector x, or to a set of variables in a data frame or list object. One use case is the sjPlot package, which supports labelled data and automatically uses these labels for axes and legends in plots or for tables. val_labels() is intended for use within pipe workflows and has a tidyverse-consistent syntax, including support for quasi-quotation (see 'Examples').
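
As a minimal sketch of what the attribute looks like (assuming the sjlabelled package is installed and a small made-up vector), the labels can be inspected with base R's attr():

```r
library(sjlabelled)

# made-up numeric vector with the values 1 to 3
x <- c(1, 2, 3, 2, 1)
x <- set_labels(x, labels = c("low", "mid", "high"))

# the value labels are stored in a plain attribute: a named vector
# whose names are the labels and whose elements are the labelled values
lab <- attr(x, "labels")
names(lab)
unname(lab)
```

Because the labels are an ordinary attribute, code that does not know about labelled data still sees x as a regular numeric vector.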

set_labels(
  x,
  ...,
  labels,
  force.labels = FALSE,
  force.values = TRUE,
  drop.na = TRUE
)

val_labels(x, ..., force.labels = FALSE, force.values = TRUE, drop.na = TRUE)

Arguments

x

A vector or data frame.

...

For set_labels(): optional, unquoted names of variables that should be selected for further processing. Required if x is a data frame (and not a vector) and only selected variables from x should be processed. You may also use functions like : or tidyselect's select-helpers.

For val_labels(): pairs of named vectors, where the name is the name of the variable to be labelled and the value is the vector of new value labels for that variable. val_labels() also supports quasi-quotation (see 'Examples').
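
A minimal sketch of the name = labels form of val_labels() without quasi-quotation, assuming sjlabelled is installed; the data frame and its column names are made up for illustration:

```r
library(sjlabelled)

# hypothetical data frame with two numeric columns
df <- data.frame(a = c(1, 2, 2, 1), b = c(1, 2, 1, 2))

# argument name = column to label, value = value labels for that column
df <- val_labels(df, a = c("no", "yes"))

# column b was not named in the call, so it should stay unlabelled
labs <- get_labels(df)
labs
```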

labels

(Named) character vector of labels that will be added to x as "labels" or "value.labels" attribute.

  • if labels is not a named vector, its length must equal the value range of x, i.e. if x has values from 1 to 3, labels should have a length of 3;

  • if the length of labels differs from the number of unique values of x, a warning is given. You can still add missing labels with the force.labels or force.values arguments; see 'Note'.

  • if labels is a named vector, value labels will be set accordingly, even if x has a different length of unique values. See 'Note' and 'Examples'.

  • if x is a data frame, labels may also be a list of (named) character vectors;

  • if labels is a list, it must have the same length as number of columns of x;

  • if labels is a vector and x is a data frame, labels will be applied to each column of x.

Use labels = "" to remove labels-attribute from x.
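
The list form and the removal of labels described above can be sketched as follows. This assumes sjlabelled is installed; the data frame is made up for illustration:

```r
library(sjlabelled)

# hypothetical data frame: column a has values 1 to 3, column b has values 1 to 2
df <- data.frame(a = c(1, 2, 3, 2), b = c(1, 2, 1, 2))

# one label vector per column; the list length equals ncol(df)
df <- set_labels(df, labels = list(c("low", "mid", "high"), c("no", "yes")))
labs <- get_labels(df)
labs

# an empty string removes the "labels" attribute again
a_plain <- set_labels(df$a, labels = "")
attr(a_plain, "labels")
```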

force.labels

Logical; if TRUE, all labels are added as value label attribute, even if x has fewer unique values than the length of labels or if x has a smaller range than the length of labels. See 'Examples'. This parameter is ignored if labels is a named vector.

force.values

Logical; if TRUE (default) and labels has fewer elements than x has unique values, the values not covered by labels are added as labels as well. See 'Examples'. This parameter is ignored if labels is a named vector.

drop.na

Logical, whether existing value labels of tagged NA values (see tagged_na) should be removed (drop.na = TRUE, the default) or preserved (drop.na = FALSE). See get_na for more details on tagged NA values.

Value

x with value label attributes, or with label attributes removed if labels = "". If x is a data frame, the complete data frame x is returned, with labels added to or removed from the variables specified in ...; if ... is not specified, this applies to all variables in the data frame.

Note

  • if labels is a named vector, force.labels and force.values will be ignored, and only values defined in labels will be labelled;

  • if x has fewer unique values than labels, redundant labels will be dropped, see force.labels;

  • if x has more unique values than labels, only matching values will be labelled, other values remain unlabelled, see force.values;

If you only want to change some of the value labels, use add_labels instead. Furthermore, see 'Note' in get_labels.
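
The difference between replacing all value labels and changing only some of them can be sketched like this, assuming sjlabelled is installed; the vector and the label names are made up:

```r
library(sjlabelled)

# made-up vector with three labelled values
x <- set_labels(c(1, 2, 3, 2), labels = c("low", "mid", "high"))

# set_labels() with a named vector replaces the whole "labels" attribute,
# so only the value 3 stays labelled
replaced <- set_labels(x, labels = c("top" = 3))

# add_labels() keeps the labels for 1 and 2 and only changes the one for 3
updated <- add_labels(x, labels = c("top" = 3))

names(attr(replaced, "labels"))
names(attr(updated, "labels"))
```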

See also

See vignette Labelled Data and the sjlabelled-Package for more details; set_label to manually set variable labels or get_label to get variable labels; add_labels to add additional value labels without replacing the existing ones.

Examples

library(sjlabelled)
library(sjmisc)  # provides frq()

dummy <- sample(1:4, 40, replace = TRUE)
frq(dummy)
#> x <integer> 
#> # total N=40 valid N=40 mean=2.75 sd=1.10
#> 
#> Value |  N | Raw % | Valid % | Cum. %
#> -------------------------------------
#>     1 |  8 |    20 |      20 |     20
#>     2 |  6 |    15 |      15 |     35
#>     3 | 14 |    35 |      35 |     70
#>     4 | 12 |    30 |      30 |    100
#>  <NA> |  0 |     0 |    <NA> |   <NA>

dummy <- set_labels(dummy, labels = c("very low", "low", "mid", "hi"))
frq(dummy)
#> x <integer> 
#> # total N=40 valid N=40 mean=2.75 sd=1.10
#> 
#> Value |    Label |  N | Raw % | Valid % | Cum. %
#> ------------------------------------------------
#>     1 | very low |  8 |    20 |      20 |     20
#>     2 |      low |  6 |    15 |      15 |     35
#>     3 |      mid | 14 |    35 |      35 |     70
#>     4 |       hi | 12 |    30 |      30 |    100
#>  <NA> |     <NA> |  0 |     0 |    <NA> |   <NA>

# assign labels with named vector
dummy <- sample(1:4, 40, replace = TRUE)
dummy <- set_labels(dummy, labels = c("very low" = 1, "very high" = 4))
frq(dummy)
#> x <integer> 
#> # total N=40 valid N=40 mean=2.48 sd=1.04
#> 
#> Value |     Label |  N | Raw % | Valid % | Cum. %
#> -------------------------------------------------
#>     1 |  very low |  8 | 20.00 |   20.00 |  20.00
#>     2 |         2 | 13 | 32.50 |   32.50 |  52.50
#>     3 |         3 | 11 | 27.50 |   27.50 |  80.00
#>     4 | very high |  8 | 20.00 |   20.00 | 100.00
#>  <NA> |      <NA> |  0 |  0.00 |    <NA> |   <NA>

# force using all labels, even if not all labels
# have associated values in vector
x <- c(2, 2, 3, 3, 2)
# only two value labels
x <- set_labels(x, labels = c("1", "2", "3"))
#> More labels than values of "x". Using first 2 labels.
x
#> [1] 2 2 3 3 2
#> attr(,"labels")
#> 1 2 
#> 2 3 
frq(x)
#> x <numeric> 
#> # total N=5 valid N=5 mean=2.40 sd=0.55
#> 
#> Value | Label | N | Raw % | Valid % | Cum. %
#> --------------------------------------------
#>     2 |     1 | 3 |    60 |      60 |     60
#>     3 |     2 | 2 |    40 |      40 |    100
#>  <NA> |  <NA> | 0 |     0 |    <NA> |   <NA>

# all three value labels
x <- set_labels(x, labels = c("1", "2", "3"), force.labels = TRUE)
x
#> [1] 2 2 3 3 2
#> attr(,"labels")
#> 1 2 3 
#> 1 2 3 
frq(x)
#> x <numeric> 
#> # total N=5 valid N=5 mean=2.40 sd=0.55
#> 
#> Value | Label | N | Raw % | Valid % | Cum. %
#> --------------------------------------------
#>     1 |     1 | 0 |     0 |       0 |      0
#>     2 |     2 | 3 |    60 |      60 |     60
#>     3 |     3 | 2 |    40 |      40 |    100
#>  <NA> |  <NA> | 0 |     0 |    <NA> |   <NA>

# create vector
x <- c(1, 2, 3, 2, 4, NA)
# add fewer labels than values
x <- set_labels(x, labels = c("yes", "maybe", "no"), force.values = FALSE)
#> "x" has more values than "labels", hence not all values are labelled.
x
#> [1]  1  2  3  2  4 NA
#> attr(,"labels")
#>   yes maybe    no 
#>     1     2     3 
# add all necessary labels
x <- set_labels(x, labels = c("yes", "maybe", "no"), force.values = TRUE)
#> More values in "x" than length of "labels". Additional values were added to labels.
x
#> [1]  1  2  3  2  4 NA
#> attr(,"labels")
#>   yes maybe    no     4 
#>     1     2     3     4 

# set labels and missings
x <- c(1, 1, 1, 2, 2, -2, 3, 3, 3, 3, 3, 9)
x <- set_labels(x, labels = c("Refused", "One", "Two", "Three", "Missing"))
x
#>  [1]  1  1  1  2  2 -2  3  3  3  3  3  9
#> attr(,"labels")
#> Refused     One     Two   Three Missing 
#>      -2       1       2       3       9 
set_na(x, na = c(-2, 9))
#>  [1]  1  1  1  2  2 NA  3  3  3  3  3 NA
#> attr(,"labels")
#>   One   Two Three 
#>     1     2     3 


library(haven)  # labelled() is provided by haven
x <- labelled(
  c(1:3, tagged_na("a", "c", "z"), 4:1),
  c("Agreement" = 1, "Disagreement" = 4, "First" = tagged_na("c"),
    "Refused" = tagged_na("a"), "Not home" = tagged_na("z"))
)
# get current NA values
x
#> <labelled<double>[10]>
#>  [1]     1     2     3 NA(a) NA(c) NA(z)     4     3     2     1
#> 
#> Labels:
#>  value        label
#>      1    Agreement
#>      4 Disagreement
#>  NA(c)        First
#>  NA(a)      Refused
#>  NA(z)     Not home
get_na(x)
#>    First  Refused Not home 
#>       NA       NA       NA 
# lose value labels from tagged NA by default, if not specified
set_labels(x, labels = c("New Three" = 3))
#> <labelled<double>[10]>
#>  [1]     1     2     3 NA(a) NA(c) NA(z)     4     3     2     1
#> 
#> Labels:
#>  value     label
#>      3 New Three
# do not drop na
set_labels(x, labels = c("New Three" = 3), drop.na = FALSE)
#> <labelled<double>[10]>
#>  [1]     1     2     3 NA(a) NA(c) NA(z)     4     3     2     1
#> 
#> Labels:
#>  value     label
#>      3 New Three
#>  NA(c)     First
#>  NA(a)   Refused
#>  NA(z)  Not home


# set labels via named vector,
# not using all possible values
data(efc)
get_labels(efc$e42dep)
#> [1] "independent"          "slightly dependent"   "moderately dependent"
#> [4] "severely dependent"  

x <- set_labels(
  efc$e42dep,
  labels = c(`independent` = 1,
             `severe dependency` = 2,
             `missing value` = 9)
  )
get_labels(x, values = "p")
#> [1] "[1] independent"       "[2] severe dependency" "[9] missing value"    
get_labels(x, values = "p", non.labelled = TRUE)
#> [1] "[1] independent"       "[2] severe dependency" "[3] 3"                
#> [4] "[4] 4"                 "[9] missing value"    

# labels can also be set for tagged NA value
# create numeric vector
x <- c(1, 2, 3, 4)
# set 2 and 3 as missing; 'set_na()' automatically
# converts them to tagged NA
x <- set_na(x, na = c(2, 3))
x
#> [1]  1 NA NA  4
# set label via named vector just for tagged NA(3)
set_labels(x, labels = c(`New Value` = tagged_na("3")))
#> [1]  1 NA NA  4
#> attr(,"labels")
#> New Value 
#>        NA 

# setting same value labels to multiple vectors
dummies <- data.frame(
  dummy1 = sample(1:4, 40, replace = TRUE),
  dummy2 = sample(1:4, 40, replace = TRUE),
  dummy3 = sample(1:4, 40, replace = TRUE)
)

# and set same value labels for two of three variables
test <- set_labels(
  dummies, dummy1, dummy2,
  labels = c("very low", "low", "mid", "hi")
)
# see result...
get_labels(test)
#> $dummy1
#> [1] "very low" "low"      "mid"      "hi"      
#> 
#> $dummy2
#> [1] "very low" "low"      "mid"      "hi"      
#> 
#> $dummy3
#> NULL
#> 

# using quasi-quotation
if (require("rlang") && require("dplyr")) {
  dummies <- data.frame(
    dummy1 = sample(1:4, 40, replace = TRUE),
    dummy2 = sample(1:4, 40, replace = TRUE),
    dummy3 = sample(1:4, 40, replace = TRUE)
  )

  x1 <- "dummy1"
  x2 <- c("so low", "rather low", "mid", "very hi")

  dummies %>%
    val_labels(
      !!x1 := c("really low", "low", "a bit mid", "hi"),
      dummy3 = !!x2
    ) %>%
    get_labels()

  # ... and named vectors to explicitly set value labels
  x2 <- c("so low" = 4, "rather low" = 3, "mid" = 2, "very hi" = 1)
  dummies %>%
    val_labels(
      !!x1 := c("really low" = 1, "low" = 3, "a bit mid" = 2, "hi" = 4),
      dummy3 = !!x2
    ) %>% get_labels(values = "p")
}
#> $dummy1
#> [1] "[1] really low" "[2] a bit mid"  "[3] low"        "[4] hi"        
#> 
#> $dummy2
#> NULL
#> 
#> $dummy3
#> [1] "[1] very hi"    "[2] mid"        "[3] rather low" "[4] so low"    
#>