vignettes/quasiquotation.Rmd
quasiquotation.Rmd
Labelling data is typically a task for end-users and is applied in own scripts or functions rather than in packages. However, sometimes it can be useful for both end-users and package developers to have a flexible way to add variable and value labels to their data. In such cases, quasiquotation is helpful.
This vignette demonstrate how to use quasiquotation in sjlabelled to label your data.
Usually, set_labels()
can be used to add value labels to
variables. The syntax of this function is easy to use, and
set_labels()
allows to add value labels to multiple
variables at once, if these variables share the same value labels.
In the following examples, we will use the frq()
function, that shows an extra label-column containing
value labels, if the data is labelled. If the data has
no value labels, this column is not shown in the output.
library(sjlabelled)
library(sjmisc) # for frq()-function
library(rlang)
# unlabelled data
dummies <- data.frame(
dummy1 = sample(1:3, 40, replace = TRUE),
dummy2 = sample(1:3, 40, replace = TRUE),
dummy3 = sample(1:3, 40, replace = TRUE)
)
# set labels for all variables in the data frame
test <- set_labels(dummies, labels = c("low", "mid", "hi"))
attr(test$dummy1, "labels")
#> low mid hi
#> 1 2 3
frq(test, dummy1)
#> dummy1 <integer>
#> # total N=40 valid N=40 mean=2.08 sd=0.83
#>
#> Value | Label | N | Raw % | Valid % | Cum. %
#> ---------------------------------------------
#> 1 | low | 12 | 30.00 | 30.00 | 30.00
#> 2 | mid | 13 | 32.50 | 32.50 | 62.50
#> 3 | hi | 15 | 37.50 | 37.50 | 100.00
#> <NA> | <NA> | 0 | 0.00 | <NA> | <NA>
# and set same value labels for two of three variables
test <- set_labels(
dummies, dummy1, dummy2,
labels = c("low", "mid", "hi")
)
frq(test)
#> dummy1 <integer>
#> # total N=40 valid N=40 mean=2.08 sd=0.83
#>
#> Value | Label | N | Raw % | Valid % | Cum. %
#> ---------------------------------------------
#> 1 | low | 12 | 30.00 | 30.00 | 30.00
#> 2 | mid | 13 | 32.50 | 32.50 | 62.50
#> 3 | hi | 15 | 37.50 | 37.50 | 100.00
#> <NA> | <NA> | 0 | 0.00 | <NA> | <NA>
#>
#> dummy2 <integer>
#> # total N=40 valid N=40 mean=2.02 sd=0.77
#>
#> Value | Label | N | Raw % | Valid % | Cum. %
#> ---------------------------------------------
#> 1 | low | 11 | 27.50 | 27.50 | 27.50
#> 2 | mid | 17 | 42.50 | 42.50 | 70.00
#> 3 | hi | 12 | 30.00 | 30.00 | 100.00
#> <NA> | <NA> | 0 | 0.00 | <NA> | <NA>
#>
#> dummy3 <integer>
#> # total N=40 valid N=40 mean=1.80 sd=0.85
#>
#> Value | N | Raw % | Valid % | Cum. %
#> -------------------------------------
#> 1 | 19 | 47.50 | 47.50 | 47.50
#> 2 | 10 | 25.00 | 25.00 | 72.50
#> 3 | 11 | 27.50 | 27.50 | 100.00
#> <NA> | 0 | 0.00 | <NA> | <NA>
val_labels()
does the same job as
set_labels()
, but in a different way. While
set_labels()
requires variables to be specified in the
...
-argument, and labels in the
labels
-argument, val_labels()
requires both to
be specified in the ...
.
val_labels()
requires named vectors as
argument, with the left-hand side being the name of the
variable that should be labelled, and the right-hand side
containing the labels for the values.
test <- val_labels(dummies, dummy1 = c("low", "mid", "hi"))
attr(test$dummy1, "labels")
#> low mid hi
#> 1 2 3
# remaining variables are not labelled
frq(test)
#> dummy1 <integer>
#> # total N=40 valid N=40 mean=2.08 sd=0.83
#>
#> Value | Label | N | Raw % | Valid % | Cum. %
#> ---------------------------------------------
#> 1 | low | 12 | 30.00 | 30.00 | 30.00
#> 2 | mid | 13 | 32.50 | 32.50 | 62.50
#> 3 | hi | 15 | 37.50 | 37.50 | 100.00
#> <NA> | <NA> | 0 | 0.00 | <NA> | <NA>
#>
#> dummy2 <integer>
#> # total N=40 valid N=40 mean=2.02 sd=0.77
#>
#> Value | N | Raw % | Valid % | Cum. %
#> -------------------------------------
#> 1 | 11 | 27.50 | 27.50 | 27.50
#> 2 | 17 | 42.50 | 42.50 | 70.00
#> 3 | 12 | 30.00 | 30.00 | 100.00
#> <NA> | 0 | 0.00 | <NA> | <NA>
#>
#> dummy3 <integer>
#> # total N=40 valid N=40 mean=1.80 sd=0.85
#>
#> Value | N | Raw % | Valid % | Cum. %
#> -------------------------------------
#> 1 | 19 | 47.50 | 47.50 | 47.50
#> 2 | 10 | 25.00 | 25.00 | 72.50
#> 3 | 11 | 27.50 | 27.50 | 100.00
#> <NA> | 0 | 0.00 | <NA> | <NA>
Unlike set_labels()
, val_labels()
allows
the user to add different value labels to different variables
in one function call. Another advantage, or difference, of
val_labels()
is it’s flexibility in defining variable names
and value labels by using quasiquotation.
To use quasiquotation, we need the rlang package to
be installed and loaded. Now we can have labels in a character vector,
and use !!
to unquote this vector.
labels <- c("low_quote", "mid_quote", "hi_quote")
test <- val_labels(dummies, dummy1 = !! labels)
attr(test$dummy1, "labels")
#> low_quote mid_quote hi_quote
#> 1 2 3
The same can be done with the names of variables that should
get new value labels. We then need !!
to unquote the
variable name and :=
as assignment.
variable <- "dummy2"
test <- val_labels(dummies, !! variable := c("lo_var", "mid_var", "high_var"))
# no value labels
attr(test$dummy1, "labels")
#> NULL
# value labels
attr(test$dummy2, "labels")
#> lo_var mid_var high_var
#> 1 2 3
Finally, we can combine the above approaches to be flexible regarding both variable names and value labels.
variable <- "dummy3"
labels <- c("low", "mid", "hi")
test <- val_labels(dummies, !! variable := !! labels)
attr(test$dummy3, "labels")
#> low mid hi
#> 1 2 3
set_label()
is the equivalent to
set_labels()
to add variable labels to a variable. The
equivalent to val_labels()
is var_labels()
,
which works in the same way as val_labels()
. In case of
variable labels, a label
-attribute is added to a
vector or factor (instead of a labels
-attribute, which is
used for value labels).
The following examples show how to use var_labels()
to
add variable labels to the data. We demonstrate this function without
further explanation, because it is actually very similar to
val_labels()
.
dummy <- data.frame(
a = sample(1:4, 10, replace = TRUE),
b = sample(1:4, 10, replace = TRUE),
c = sample(1:4, 10, replace = TRUE)
)
# simple usage
test <- var_labels(dummy, a = "first variable", c = "third variable")
attr(test$a, "label")
#> [1] "first variable"
attr(test$b, "label")
#> NULL
attr(test$c, "label")
#> [1] "third variable"
# quasiquotation for labels
v1 <- "First variable"
v2 <- "Second variable"
test <- var_labels(dummy, a = !! v1, b = !! v2)
attr(test$a, "label")
#> [1] "First variable"
attr(test$b, "label")
#> [1] "Second variable"
attr(test$c, "label")
#> NULL
# quasiquotation for variable names
x1 <- "a"
x2 <- "c"
test <- var_labels(dummy, !! x1 := "First", !! x2 := "Second")
attr(test$a, "label")
#> [1] "First"
attr(test$b, "label")
#> NULL
attr(test$c, "label")
#> [1] "Second"
# quasiquotation for both variable names and labels
test <- var_labels(dummy, !! x1 := !! v1, !! x2 := !! v2)
attr(test$a, "label")
#> [1] "First variable"
attr(test$b, "label")
#> NULL
attr(test$c, "label")
#> [1] "Second variable"
As we have demonstrated, var_labels()
and
val_labels()
are one of the most flexible and easy-to-use
ways to add value and variable labels to our data. Another advantage is
the consistent design of all functions in sjlabelled,
which allows seamless integration into pipe-workflows.