Convert column with pipe delimited data into dummy variables [duplicate]

I'm interested in taking a column of a data.frame where the values in the column are pipe delimited and creating dummy variables from the pipe-delimited values.

For example:

Let's say we start with

df = data.frame(a = c("Ben|Chris|Jim", "Ben|Greg|Jim", "Jim|Steve|Ben"))

> df
              a
1 Ben|Chris|Jim
2 Ben|Greg|Jim
3 Jim|Steve|Ben

I'm interested in ending up with:

df2 = data.frame(Ben = c(1, 1, 1), Chris = c(1, 0, 0), Jim = c(1, 1, 1), Greg = c(0, 1, 0), 
                 Steve = c(0, 0, 1))
> df2
  Ben Chris Jim Greg Steve
1   1     1   1    0     0
2   1     0   1    1     0
3   1     0   1    0     1

I don't know in advance how many distinct values the field can contain. In the example above, the variable "a" could hold anywhere from 1 to 10 values. Assume the number of possible values is reasonable (i.e., < 100).

Any good ways to do this?

Another way is to use cSplit_e from the splitstackshape package.

It splits the data frame on column a, fills the absent values with 0 (fill = 0), and drops the original column (drop = TRUE).

library(splitstackshape)
cSplit_e(df, "a", "|", type = "character", fill = 0, drop = TRUE)

#   a_Ben a_Chris a_Greg a_Jim a_Steve
#1     1       1      0     1       0
#2     1       0      1     1       0
#3     1       0      0     1       1
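The question asks for columns named Ben, Chris, and so on, whereas the output above carries an a_ prefix. A minimal sketch of stripping that prefix in base R follows; the object res is a hand-typed stand-in for the cSplit_e result shown above (an assumption, so that the snippet runs without splitstackshape installed).

```r
# Stand-in for the cSplit_e() output shown above, typed by hand
res <- data.frame(a_Ben = c(1, 1, 1), a_Chris = c(1, 0, 0),
                  a_Greg = c(0, 1, 0), a_Jim = c(1, 1, 1),
                  a_Steve = c(0, 0, 1))

# Remove the leading "a_" that comes from the source column name
names(res) <- sub("^a_", "", names(res))
names(res)
```

The pattern is anchored with ^ so that only a leading a_ is removed, not one that happens to occur inside a value.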

Here is a method in base R

# get unique set of names
myNames <- unique(unlist(strsplit(as.character(df$a), split="\\|")))
# get indicator data.frame
setNames(data.frame(lapply(myNames, function(i) as.integer(grepl(i, df$a)))), myNames)

which returns

  Ben Chris Jim Greg Steve
1   1     1   1    0     0
2   1     0   1    1     0
3   1     0   1    0     1

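The first line uses strsplit to split each string on the pipe "|" into a list of names; unlist and unique then reduce that list to a vector of unique names. The second line iterates over these names with lapply and uses grepl to test each row for the name, and as.integer converts the logical result to 0/1. The resulting list is converted to a data.frame and given column names with setNames. Note that grepl matches a pattern anywhere in the string, so a name such as "Ben" would also flag a row containing "Benjamin"; if names can overlap, match against the split values instead.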
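Since grepl does substring (regex) matching, overlapping names can produce false positives. Here is a minimal sketch of an exact-matching variant in base R. The helper name make_dummies and the sample row containing "Benjamin" are hypothetical, chosen only to illustrate the difference; they are not from the original answer.

```r
# Hypothetical helper: one 0/1 column per distinct delimited value
make_dummies <- function(x, sep = "|") {
  # Split on the literal separator (fixed = TRUE avoids regex escaping)
  parts <- strsplit(as.character(x), sep, fixed = TRUE)
  # Distinct values in order of first appearance, ignoring empty strings
  vals <- unique(unlist(parts))
  vals <- vals[nzchar(vals)]
  # Exact membership test per row, so "Ben" does not match "Benjamin"
  out <- lapply(vals, function(v) {
    as.integer(vapply(parts, function(p) v %in% p, logical(1)))
  })
  setNames(data.frame(out), vals)
}

# Illustrative data with an overlapping name
df_overlap <- data.frame(a = c("Ben|Chris|Jim", "Benjamin|Greg", "Jim|Steve|Ben"))
make_dummies(df_overlap$a)
```

With this input the second row gets Ben = 0 and Benjamin = 1, whereas the grepl version would set Ben = 1 for that row.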

Here is one option using dplyr and tidyr:

library(dplyr)
library(tidyr)
df %>% tibble::rownames_to_column(var = "id") %>%
       mutate(a = strsplit(as.character(a), "\\|")) %>%
       unnest(a) %>% table()

#    a
# id  Ben Chris Greg Jim Steve
#  1   1     1    0   1     0
#  2   1     0    1   1     0
#  3   1     0    0   1     1

The analogue in base R is:

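df$a <- as.character(df$a)
# split each string on the literal pipe
s    <- strsplit(df$a, "|", fixed=TRUE)
# repeat each row id once per extracted value, then cross-tabulate
table(id = rep(seq_len(nrow(df)), lengths(s)), v = unlist(s))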
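table() returns a contingency table rather than a data.frame. If a data.frame like df2 in the question is needed, one possible follow-up is as.data.frame.matrix(). The sketch below assumes the same data as the question; the names tab and dummies are illustrative.

```r
df <- data.frame(a = c("Ben|Chris|Jim", "Ben|Greg|Jim", "Jim|Steve|Ben"),
                 stringsAsFactors = FALSE)
s <- strsplit(df$a, "|", fixed = TRUE)

# Using a factor for id keeps rows that yield no values (e.g. empty strings)
ids <- factor(rep(seq_len(nrow(df)), lengths(s)), levels = seq_len(nrow(df)))
tab <- table(id = ids, v = unlist(s))

# as.data.frame() would give long format (id, v, Freq);
# as.data.frame.matrix() keeps the wide id-by-name layout
dummies <- as.data.frame.matrix(tab)

# Cap counts at 1 in case a name is repeated within a row
dummies[] <- lapply(dummies, function(col) as.integer(col > 0))
dummies
```

The columns come out in alphabetical order (Ben, Chris, Greg, Jim, Steve) because table() sorts the factor levels.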

Data:

df = data.frame(a = c("Ben|Chris|Jim", "Ben|Greg|Jim", "Jim|Steve|Ben"))

Tags: r, delimiter

dreww2

3 Answers: Ronak Shah, lmo, Psidom
