I've been trying to figure this out for a while, and thought I would ask here.
Say I have a data frame like the following:
df <- data.frame(participant = 1:6, group = c("adult", "adult", "child", "child", "NSS", "NSS"), RegProto = c(2, 3, 4, 2, 4, 3), RegInt = c(2, 3, 4, 6, 6, 5), RegDistant = c(3, 3, 4, 5, 4, 5), IrregProto = c(4, 5, 3, 4, 3, 1), IrregInt = c(4, 4, 4, 4, 4, 4), IrregDistant = c(4, 5, 6, 8, 9, 1))
The problem with this data frame is that each of the last six column names encodes two variables: one whose values are either Reg or Irreg, and another whose values are Proto, Int, or Distant. What I would like to do is split these columns and make the table long, preferably using tidyr. I thought I could do it like this:
library("tidyr")
library("dplyr")  # provides select()
df_long <- df %>%
  gather(index, n, -group, -participant) %>%
  select(participant, group, index, n) %>%
  separate(index, into = c("verb", "similarity"), sep = "\\.?=\\p{Upper}")
This does what I want up to the separate() step. There I get an error message saying that the values were not split, but no hint as to why. I'm new to regex, so I suspect the problem is in the pattern, but I can't figure out what the correct syntax would be.
You can use this regex:
(?<=.)(?=[A-Z])
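This matches the zero-length position that is preceded by any character (the lookbehind (?<=.)) and followed by an uppercase letter (the lookahead (?=[A-Z])). Because the match is zero-length, the split consumes no characters. The pattern in the question, "\\.?=\\p{Upper}", instead looks for an optional literal dot, then a literal = sign, then an uppercase letter. None of the column names contains =, so nothing is split.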
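To see where the pattern matches, here is a small base R sketch (not part of the tidyr pipeline). It inserts an underscore at the first matching position in each column name. The variable names cols, marked, and has_equals are only for illustration.

```r
# Column names from the question's data frame.
cols <- c("RegProto", "RegInt", "RegDistant",
          "IrregProto", "IrregInt", "IrregDistant")

# Mark the first zero-length match with "_" to show where the split would fall.
# perl = TRUE is needed in base R for lookbehind and lookahead.
marked <- sub("(?<=.)(?=[A-Z])", "_", cols, perl = TRUE)
print(marked)

# The pattern in the question requires a literal "=".
# No column name contains one, so that pattern can never match.
has_equals <- grepl("=", cols, fixed = TRUE)
print(has_equals)
```

The lookbehind is what keeps the pattern from matching at the very start of each name, where an uppercase letter follows but no character precedes.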
The command:
library(tidyr)  # gather(), separate()
library(dplyr)  # select(), %>%
df %>%
  gather(index, n, -group, -participant) %>%
  select(participant, group, index, n) %>%
  separate(index, into = c("verb", "similarity"), sep = "(?<=.)(?=[A-Z])")
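As a possible alternative, assuming tidyr 1.0.0 or later is installed, pivot_longer() can reshape and split the names in one step. This is a sketch rather than the method above: it uses names_pattern with two capture groups (a different regex from the lookaround one), each matching an uppercase letter followed by lowercase letters.

```r
library(tidyr)

# Data frame from the question.
df <- data.frame(
  participant  = 1:6,
  group        = c("adult", "adult", "child", "child", "NSS", "NSS"),
  RegProto     = c(2, 3, 4, 2, 4, 3),
  RegInt       = c(2, 3, 4, 6, 6, 5),
  RegDistant   = c(3, 3, 4, 5, 4, 5),
  IrregProto   = c(4, 5, 3, 4, 3, 1),
  IrregInt     = c(4, 4, 4, 4, 4, 4),
  IrregDistant = c(4, 5, 6, 8, 9, 1)
)

# Each capture group becomes one of the new columns named in names_to.
df_long <- pivot_longer(
  df,
  cols = -c(participant, group),
  names_to = c("verb", "similarity"),
  names_pattern = "^([A-Z][a-z]+)([A-Z][a-z]+)$",
  values_to = "n"
)
print(df_long)
```

This pattern assumes every column name consists of exactly two capitalized words, which holds for the six columns here but would need adjusting for other naming schemes.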