This function converts wide data into long format. It can gather multiple groups of columns (multiple key-value pairs) from wide to long format in a single step.

to_long(data, keys, values, ..., labels = NULL, recode.key = FALSE)

Arguments

data

A data.frame that should be transformed from wide to long format.

keys

Character vector with name(s) of key column(s) to create in output. Either one key name per column group that should be gathered, or a single string. In the latter case, this name is used for the key column, and only one key column is created. See 'Examples'.

values

Character vector with names of value columns (variable names) to create in output. Must be of same length as number of column groups that should be gathered. See 'Examples'.

...

Specification of columns that should be gathered. Must be one character vector with variable names per column group, or a numeric vector with column indices indicating those columns that should be gathered. See 'Examples'.

labels

Character vector of same length as values with variable labels for the new variables created from gathered columns. See 'Examples' and 'Details'.

recode.key

Logical, if TRUE, the values of the key column will be recoded to numeric values, in sequential ascending order.

Details

This function reshapes data from wide to long format and can gather multiple column groups at once. Value and variable labels for non-gathered variables are preserved. Attributes of gathered variables, such as variable labels, are lost during reshaping. Hence, the newly created variables from gathered columns don't have any variable label attributes. In such cases, use the labels argument to restore variable label attributes.
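The following minimal sketch illustrates the label handling described above. It assumes that sjmisc is installed and that variable labels are stored in the "label" attribute, as the str() output in 'Examples' suggests. The data frame wide and its label texts are hypothetical and only serve as illustration.

```r
library(sjmisc)

# hypothetical wide data with one group of gathered columns
wide <- data.frame(
  id = 1:2,
  score_t1 = c(30, 35),
  score_t2 = c(33, 34)
)

# attach variable labels as plain "label" attributes (assumed convention)
attr(wide$id, "label") <- "Participant ID"
attr(wide$score_t1, "label") <- "Score at time 1"
attr(wide$score_t2, "label") <- "Score at time 2"

# gather the score columns; "labels" sets the label of the new value column
long <- to_long(
  data = wide,
  keys = "time",
  values = "score",
  c("score_t1", "score_t2"),
  labels = "Test Score"
)

# label set via the "labels" argument
attr(long$score, "label")

# label of a non-gathered variable, which is described as preserved
attr(long$id, "label")
```

Without the labels argument, the gathered column score would carry no variable label, because the labels of score_t1 and score_t2 are not transferred.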

Examples

# create sample
mydat <- data.frame(age = c(20, 30, 40),
                    sex = c("Female", "Male", "Male"),
                    score_t1 = c(30, 35, 32),
                    score_t2 = c(33, 34, 37),
                    score_t3 = c(36, 35, 38),
                    speed_t1 = c(2, 3, 1),
                    speed_t2 = c(3, 4, 5),
                    speed_t3 = c(1, 8, 6))

# gather multiple columns. both score and speed are gathered.
to_long(
  data = mydat,
  keys = "time",
  values = c("score", "speed"),
  c("score_t1", "score_t2", "score_t3"),
  c("speed_t1", "speed_t2", "speed_t3")
)
#>   age    sex     time score speed
#> 1  20 Female score_t1    30     2
#> 2  30   Male score_t1    35     3
#> 3  40   Male score_t1    32     1
#> 4  20 Female score_t2    33     3
#> 5  30   Male score_t2    34     4
#> 6  40   Male score_t2    37     5
#> 7  20 Female score_t3    36     1
#> 8  30   Male score_t3    35     8
#> 9  40   Male score_t3    38     6

# alternative syntax, using "reshape_longer()"
reshape_longer(
  mydat,
  columns = list(
    c("score_t1", "score_t2", "score_t3"),
    c("speed_t1", "speed_t2", "speed_t3")
  ),
  names.to = "time",
  values.to = c("score", "speed")
)
#>   age    sex     time score speed .id
#> 1  20 Female score_t1    30     2   1
#> 2  30   Male score_t1    35     3   2
#> 3  40   Male score_t1    32     1   3
#> 4  20 Female score_t2    33     3   1
#> 5  30   Male score_t2    34     4   2
#> 6  40   Male score_t2    37     5   3
#> 7  20 Female score_t3    36     1   1
#> 8  30   Male score_t3    35     8   2
#> 9  40   Male score_t3    38     6   3

# or ...
reshape_longer(
  mydat,
  list(3:5, 6:8),
  names.to = "time",
  values.to = c("score", "speed")
)
#>   age    sex     time score speed .id
#> 1  20 Female score_t1    30     2   1
#> 2  30   Male score_t1    35     3   2
#> 3  40   Male score_t1    32     1   3
#> 4  20 Female score_t2    33     3   1
#> 5  30   Male score_t2    34     4   2
#> 6  40   Male score_t2    37     5   3
#> 7  20 Female score_t3    36     1   1
#> 8  30   Male score_t3    35     8   2
#> 9  40   Male score_t3    38     6   3

# gather multiple columns, recode key values to numeric
to_long(
  data = mydat,
  keys = "time",
  values = c("score", "speed"),
  c("score_t1", "score_t2", "score_t3"),
  c("speed_t1", "speed_t2", "speed_t3"),
  recode.key = TRUE
)
#>   age    sex time score speed
#> 1  20 Female    1    30     2
#> 2  30   Male    1    35     3
#> 3  40   Male    1    32     1
#> 4  20 Female    2    33     3
#> 5  30   Male    2    34     4
#> 6  40   Male    2    37     5
#> 7  20 Female    3    36     1
#> 8  30   Male    3    35     8
#> 9  40   Male    3    38     6

# gather multiple columns by column names and column indices
to_long(
  data = mydat,
  keys = "time",
  values = c("score", "speed"),
  c("score_t1", "score_t2", "score_t3"),
  6:8,
  recode.key = TRUE
)
#>   age    sex time score speed
#> 1  20 Female    1    30     2
#> 2  30   Male    1    35     3
#> 3  40   Male    1    32     1
#> 4  20 Female    2    33     3
#> 5  30   Male    2    34     4
#> 6  40   Male    2    37     5
#> 7  20 Female    3    36     1
#> 8  30   Male    3    35     8
#> 9  40   Male    3    38     6

# gather multiple columns, use separate key-columns
# for each value-vector
to_long(
  data = mydat,
  keys = c("time_score", "time_speed"),
  values = c("score", "speed"),
  c("score_t1", "score_t2", "score_t3"),
  c("speed_t1", "speed_t2", "speed_t3")
)
#>   age    sex time_score score time_speed speed
#> 1  20 Female   score_t1    30   speed_t1     2
#> 2  30   Male   score_t1    35   speed_t1     3
#> 3  40   Male   score_t1    32   speed_t1     1
#> 4  20 Female   score_t2    33   speed_t2     3
#> 5  30   Male   score_t2    34   speed_t2     4
#> 6  40   Male   score_t2    37   speed_t2     5
#> 7  20 Female   score_t3    36   speed_t3     1
#> 8  30   Male   score_t3    35   speed_t3     8
#> 9  40   Male   score_t3    38   speed_t3     6

# gather multiple columns, label columns
mydat <- to_long(
  data = mydat,
  keys = "time",
  values = c("score", "speed"),
  c("score_t1", "score_t2", "score_t3"),
  c("speed_t1", "speed_t2", "speed_t3"),
  labels = c("Test Score", "Time needed to finish")
)

library(sjlabelled)
str(mydat$score)
#>  num [1:9] 30 35 32 33 34 37 36 35 38
#>  - attr(*, "label")= chr "Test Score"
get_label(mydat$speed)
#> [1] "Time needed to finish"