
Create a new column with non-null columns' names

Tags: r, data.table

My data set looks like this one:

library(data.table)

df <- data.table(a = c(1,2,3,4,5),
                 b = c(1,0,2,5,1),
                 c = c(0,1,1,0,0),
                 d = c(1,0,0,2,2))

df
#    a b c d
# 1: 1 1 0 1
# 2: 2 0 1 0
# 3: 3 2 1 0
# 4: 4 5 0 2
# 5: 5 1 0 2

I want to create a new column that holds, for each row, the names of the non-null (here, non-zero) columns joined by an underscore. The result should be:

df_result <- data.table(a = c(1,2,3,4,5),
                        z = c('b_d', 'c', 'b_c', 'b_d', 'b_d'))

df_result
#    a   z
# 1: 1 b_d
# 2: 2   c
# 3: 3 b_c
# 4: 4 b_d
# 5: 5 b_d
Vitaliy Radchenko asked Mar 30 '16 09:03



2 Answers

Assuming the table has far more rows than columns (nrow >> ncol), you could work column-wise:

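ff = function(x)
{
    # start with an empty string for every row
    ans = character(nrow(x))
    for(j in seq_along(x)) {
        # rows where column j holds a positive value
        i = x[[j]] > 0L
        # append "_<column name>" to those rows
        ans[i] = paste(ans[i], names(x)[[j]], sep = "_")
    }
    # drop the leading underscore
    return(gsub("^_", "", ans))
}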
ff(df[, -1L, with = FALSE]) #or, `df[, ff(.SD), .SDcols = -1L]` from David Arenburg
#[1] "b_d" "c"   "b_c" "b_d" "b_d"
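
The function above returns a plain character vector. As a minimal sketch of how it could be turned into the `z` column from the question, the example below adds it by reference with `:=`. The helper is repeated so the block runs on its own; the explicit `.SDcols` vector is an assumption about which columns should be inspected.

```r
library(data.table)

df <- data.table(a = c(1, 2, 3, 4, 5),
                 b = c(1, 0, 2, 5, 1),
                 c = c(0, 1, 1, 0, 0),
                 d = c(1, 0, 0, 2, 2))

# Same column-wise helper as in the answer
ff <- function(x) {
  ans <- character(nrow(x))
  for (j in seq_along(x)) {
    i <- x[[j]] > 0L
    ans[i] <- paste(ans[i], names(x)[[j]], sep = "_")
  }
  gsub("^_", "", ans)
}

# Add z by reference, looking only at columns b, c and d (assumed here)
df[, z := ff(.SD), .SDcols = c("b", "c", "d")]

# Keep only the id column and the new column
df_result <- df[, .(a, z)]
```

Note that `x[[j]] > 0L` treats only positive values as non-null; if negative values can occur, `!= 0` may be the better test.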
alexis_laz answered Sep 22 '22 04:09


One option is to convert the data from 'wide' to 'long' format using melt. Grouped by 'a', we paste together the 'variable' elements that correspond to non-zero entries in 'value' (supplied as the logical condition in 'i').

melt(df, id.vars = 'a')[value != 0,
      .(z = paste(variable, collapse = "_")), keyby = a]
#   a   z
#1: 1 b_d
#2: 2   c
#3: 3 b_c
#4: 4 b_d
#5: 5 b_d
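
One caveat with the melt approach: filtering on `value != 0` drops every row of an id whose columns are all zero, so that id would be missing from the result. The sample data has no such row, so the following is only a sketch on hypothetical data (`dt` is made up for illustration), joining the result back onto the full set of ids:

```r
library(data.table)

# Hypothetical data: the row with a == 2 is zero in every value column
dt <- data.table(a = 1:3,
                 b = c(1, 0, 2),
                 c = c(0, 0, 1))

long <- melt(dt, id.vars = "a")
z_by_a <- long[value != 0, .(z = paste(variable, collapse = "_")), keyby = a]

# Join onto all ids: ids without any non-zero column get z = NA
res <- z_by_a[dt[, .(a)], on = "a"]

# Replace NA with an empty string
res[is.na(z), z := ""]
```

Whether an empty string or `NA` is the right placeholder depends on how the column is used afterwards.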

Or, instead of melting, we can group by 'a', unlist the Subset of Data (.SD), and paste the names of the columns that correspond to non-zero elements ('i1').

df[, {i1 <- !!unlist(.SD)
       paste(names(.SD)[i1], collapse="_")} , by= a]
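
As written, the grouped expression is unnamed, so data.table gives the result column a default name (`V1`). A small sketch of the same idea with the column named `z`, assuming the sample `df` from the question:

```r
library(data.table)

df <- data.table(a = c(1, 2, 3, 4, 5),
                 b = c(1, 0, 2, 5, 1),
                 c = c(0, 1, 1, 0, 0),
                 d = c(1, 0, 0, 2, 2))

# Wrapping the expression in .(z = ...) names the result column z
res <- df[, .(z = {
  i1 <- unlist(.SD) != 0                  # TRUE where the row's value is non-zero
  paste(names(.SD)[i1], collapse = "_")   # join the matching column names
}), by = a]
```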

Benchmarks

set.seed(24)
df1 <- data.table(a = 1:1e6,
                  b = sample(0:5, 1e6, replace = TRUE),
                  c = sample(0:4, 1e6, replace = TRUE),
                  d = sample(0:3, 1e6, replace = TRUE))

akrun1 <- function() {
  melt(df1, id.vars = 'a')[value != 0,
       .(z = paste(variable, collapse = "_")), keyby = a]
}

akrun2 <- function() {
  df1[, {i1 <- !!unlist(.SD)
         paste(names(.SD)[i1], collapse = "_")}, by = a]
}

ronak <- function() {
  data.table(z = lapply(apply(df1, 1, function(x) which(x[-1] != 0)),
                        function(x) paste0(names(x), collapse = "_")))
}

# note: := adds newcol to df1 by reference
eddi <- function() {
  df1[, newcol := gsub("NA_|_NA|NA", "",
                       do.call(function(...) paste(..., sep = "_"),
                               Map(function(x, y) x[(y == 0) + 1], names(.SD), .SD))),
      .SDcols = b:d]
}

alexis = function(x)
{
  ans = character(nrow(x))
  for(j in seq_along(x)) {
    i = x[[j]] > 0L
    ans[i] = paste(ans[i], names(x)[[j]], sep = "_")
  }
  return(gsub("^_", "", ans))
}

system.time(akrun1())
#   user  system elapsed
#  22.04    0.15   22.36

system.time(akrun2())
#   user  system elapsed
#  26.33    0.00   26.41

system.time(ronak())
#   user  system elapsed
#  25.60    0.26   25.96

system.time(alexis(df1[, -1L, with = FALSE]))
#   user  system elapsed
#   1.92    0.06    2.09

system.time(eddi())
#   user  system elapsed
#   2.41    0.06    3.19
akrun answered Sep 26 '22 04:09