My data set looks like this one:
library(data.table)
df <- data.table(a = c(1,2,3,4,5),
b = c(1,0,2,5,1),
c = c(0,1,1,0,0),
d = c(1,0,0,2,2))
df
# a b c d
# 1: 1 1 0 1
# 2: 2 0 1 0
# 3: 3 2 1 0
# 4: 4 5 0 2
# 5: 5 1 0 2
I want to create a new column with non-null columns names. The result will be:
df_result <- data.table(a = c(1,2,3,4,5),
z = c('b_d', 'c', 'b_c', 'b_d', 'b_d'))
df_result
# a z
# 1: 1 b_d
# 2: 2 c
# 3: 3 b_c
# 4: 4 b_d
# 5: 5 b_d
Code Inspection: Insert NULL into NOT NULL column You cannot insert NULL values in col1 and col2 because they are defined as NOT NULL. If you run the script as is, you will receive an error. To fix this code, replace NULL in the VALUES part with some values (for example, 42 and 'bird' ).
To enforce NOT NULL for a column in SQL Server, use the ALTER TABLE .. ALTER COLUMN command and restate the column definition, adding the NOT NULL attribute.
Before starting let do the same in SQL: 1 Step 1: Adding a not null column in the table.#N#ALTER TABLE employee ADD mobile_number varchar (255) not null; 2 Step 2: When you create a NOT NULL column then you can not insert NULL value on that column. See the error. 3 Step 3: Add the Not Null value into the column. More ...
Alter the table to add the column as NULLable 2. SQL to update the new column in existing records 3. Alter the table again to add the NOT NULL constraint. Absolutely. Another possible option is to create the column as NOT NULL but with a default. Then when you are done you can drop the default. This method is somewhat limited though.
If you add a column with a default that allows NULLs it can just have NULL in any existing rows. However when you add a column that doesn’t allow NULLs then you have to have a value to put in it. In fact that brings up the point that you can’t add a NOT NULL column without a default if there are any rows in the table.
The following code shows how to create a pandas DataFrame with specific column names and no rows: import pandas as pd #create DataFrame df = pd.DataFrame(columns= ['A', 'B', 'C', 'D', 'E']) #view DataFrame df A B C D E
Assuming nrow >> ncol
, you could work columnwise
ff = function(x)
{
ans = character(nrow(x))
for(j in seq_along(x)) {
i = x[[j]] > 0L
ans[i] = paste(ans[i], names(x)[[j]], sep = "_")
}
return(gsub("^_", "", ans))
}
ff(df[, -1L, with = FALSE]) #or, `df[, ff(.SD), .SDcols = -1L]` from David Arenburg
#[1] "b_d" "c" "b_c" "b_d" "b_d"
One option would be to convert the format from 'wide' to 'long' using melt
. Grouped by 'a', we paste
the 'variable' elements that corresponds to non-zero elements in 'value' (provided as logical condition in 'i').
melt(df, id.var='a')[value!=0,
.(z=paste(variable, collapse="_")), keyby =a]
# a z
#1: 1 b_d
#2: 2 c
#3: 3 b_c
#4: 4 b_d
#5: 5 b_d
Or instead of melt
ing, we can group by 'a', unlist
the Subset of Data.table (.SD
) and paste
the names
of the columns that corresponds to non-zero elements ('i1').
df[, {i1 <- !!unlist(.SD)
paste(names(.SD)[i1], collapse="_")} , by= a]
set.seed(24)
df1 <- data.table(a=1:1e6, b = sample(0:5, 1e6,
replace=TRUE), c = sample(0:4, 1e6, replace=TRUE),
d = sample(0:3, 1e6, replace=TRUE))
akrun1 <- function() {
melt(df1, id.var='a')[value!=0,
.(z=paste(variable, collapse="_")), keyby =a]
}
akrun2 <- function() {
df1[, {i1 <- !!unlist(.SD)
paste(names(.SD)[i1], collapse="_")} , by= a]
}
ronak <- function() {
data.table(z = lapply(apply(df1, 1, function(x)
which(x[-1]!= 0)),
function(x) paste0(names(x), collapse = "_")))
}
eddi <- function(){
df1[, newcol := gsub("NA_|_NA|NA", "",
do.call(function(...) paste(..., sep = "_"),
Map(function(x, y) x[(y == 0) + 1], names(.SD), .SD)))
, .SDcols = b:d]
}
alexis = function(x)
{
ans = character(nrow(x))
for(j in seq_along(x)) {
i = x[[j]] > 0L
ans[i] = paste(ans[i], names(x)[[j]], sep = "_")
}
return(gsub("^_", "", ans))
}
system.time(akrun1())
# user system elapsed
# 22.04 0.15 22.36
system.time(akrun2())
# user system elapsed
# 26.33 0.00 26.41
system.time(ronak())
# user system elapsed
# 25.60 0.26 25.96
system.time(alexis(df1[, -1L, with = FALSE]))
# user system elapsed
# 1.92 0.06 2.09
system.time(eddi())
# user system elapsed
# 2.41 0.06 3.19
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With