Move NAs to the end of each column in a data frame

Tags: sorting, dataframe, r, na

I have a data frame like this:

df <- structure(list(a = c(NA, NA, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L), b = c(NA, NA, NA, 1L, 2L, 3L, 4L, 5L, 6L, 7L), d = c(NA, NA, NA, NA, 1L, 2L, 3L, 4L, 5L, 6L)), .Names = c("a", "b", "d"), row.names = c(NA, -10L), class = "data.frame")

> df
    a  b  d
1  NA NA NA
2  NA NA NA
3   1 NA NA
4   2  1 NA
5   3  2  1
6   4  3  2
7   5  4  3
8   6  5  4
9   7  6  5
10  8  7  6

In each column, I'd like to move the non-NA values up to the start, and move the NAs to the end:

> df.out
    a  b  d
1   1  1  1
2   2  2  2
3   3  3  3
4   4  4  4
5   5  5  5
6   6  6  6
7   7  7 NA
8   8 NA NA
9  NA NA NA
10 NA NA NA

Update to make my question clearer:

df <- structure(list(a = c(NA, NA, 1, 5, 34, 7, 3, 5, 8, 4), b = c(NA, 
NA, NA, 57, 2, 7, 9, 5, 12, 100), d = c(NA, NA, NA, NA, 5, 7, 
2, 8, 2, 5)), .Names = c("a", "b", "d"), row.names = c(NA, -10L
), class = "data.frame")

> df
    a   b  d
1  NA  NA NA
2  NA  NA NA
3   1  NA NA
4   5  57 NA
5  34   2  5
6   7   7  7
7   3   9  2
8   5   5  8
9   8  12  2
10  4 100  5

should result in:

    a   b  d
1   1  57  5
2   5   2  7
3  34   7  2
4   7   9  8
5   3   5  2
6   5  12  5
7   8 100 NA
8   4  NA NA
9  NA  NA NA
10 NA  NA NA

This seems like an easy task, but I am stuck on where to start. Can you help?


asked Sep 16 '14 12:09

erc

2 Answers

Another solution using lapply, which does not sort or reorder the non-NA values (per your comments):

df[] <- lapply(df, function(x) c(x[!is.na(x)], x[is.na(x)]))
df
#     a   b  d
# 1   1  57  5
# 2   5   2  7
# 3  34   7  2
# 4   7   9  8
# 5   3   5  2
# 6   5  12  5
# 7   8 100 NA
# 8   4  NA NA
# 9  NA  NA NA
# 10 NA  NA NA
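To see what the anonymous function does to a single column, here is a minimal sketch. The name `na_last` and the copy `df2` are made up for illustration and are not part of the answer; the `order(is.na(x))` variant is an alternative idiom that relies on `order()` leaving ties in their original order.

```r
# Hypothetical name for the anonymous function used above
na_last <- function(x) c(x[!is.na(x)], x[is.na(x)])

x <- c(NA, NA, 1, 5, 34, 7, 3, 5, 8, 4)   # column `a` from the question
res <- na_last(x)                          # 1 5 34 7 3 5 8 4 NA NA

# Equivalent one-liner: is.na(x) is FALSE for values and TRUE for NAs,
# and order() leaves ties in their original order
res2 <- x[order(is.na(x))]

# Assigning into df2[] keeps the data frame; a plain `df2 <- lapply(...)`
# would leave a list instead
df2 <- data.frame(a = x, b = c(NA, NA, NA, 57, 2, 7, 9, 5, 12, 100))
df2[] <- lapply(df2, na_last)
is.data.frame(df2)                         # TRUE
```

The non-NA values keep their original relative order in both variants, which is what distinguishes this from sorting the column.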

Or use data.table to update df by reference rather than creating a copy of it (this solution won't sort your data either):

library(data.table)
setDT(df)[, names(df) := lapply(.SD, function(x) c(x[!is.na(x)], x[is.na(x)]))]
df
#      a   b  d
#  1:  1  57  5
#  2:  5   2  7
#  3: 34   7  2
#  4:  7   9  8
#  5:  3   5  2
#  6:  5  12  5
#  7:  8 100 NA
#  8:  4  NA NA
#  9: NA  NA NA
# 10: NA  NA NA

On this small example, a benchmark shows that the base solution is by far the fastest:

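library("microbenchmark")
# Base R solution from this answer
david <- function() lapply(df, function(x) c(x[!is.na(x)], x[is.na(x)]))
# setDT() converts df itself to a data.table by reference
dt <- setDT(df)
# data.table solution from this answer
david.dt <- function() dt[, names(dt) := lapply(.SD, function(x) c(x[!is.na(x)], x[is.na(x)]))]

# beetroot() is defined in the other answer
microbenchmark(as.data.frame(lapply(df, beetroot)), david(), david.dt())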
# Unit: microseconds
#                                 expr      min       lq   median        uq      max neval
#  as.data.frame(lapply(df, beetroot)) 1145.224 1215.253 1274.417 1334.7870 4028.507   100
#                              david()  116.515  127.382  140.965  149.7185  308.493   100
#                           david.dt() 3087.335 3247.920 3330.627 3415.1460 6464.447   100
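The timings above come from the small example data, so they may not carry over to larger inputs. Below is a sketch for repeating the base R part of the comparison on a bigger data frame using only `system.time()`. The data, the size `n`, the seed, and the names `na_last`, `make_col` and `big` are all assumptions made for illustration, and no timings are claimed here.

```r
# Same two base R approaches as in the benchmark above
na_last  <- function(x) c(x[!is.na(x)], x[is.na(x)])
beetroot <- function(x) {
    num.na <- sum(is.na(x))
    c(x[!is.na(x)], rep(NA, num.na))
}

# Made-up test data: 3 columns of 100,000 values, 10% NA each
set.seed(1)
n <- 1e5
make_col <- function() {
    v <- rnorm(n)
    v[sample(n, n / 10)] <- NA
    v
}
big <- data.frame(a = make_col(), b = make_col(), d = make_col())

# Time each approach once; system.time() is coarse, so repeat for stable numbers
t_lapply   <- system.time(res_lapply   <- lapply(big, na_last))
t_beetroot <- system.time(res_beetroot <- as.data.frame(lapply(big, beetroot)))

# Both approaches should agree column by column
stopifnot(identical(res_lapply, as.list(res_beetroot)))
```

Printing `t_lapply` and `t_beetroot` shows the elapsed times on your machine. Note that one call returns a list and the other builds a full data frame, so the two are not doing exactly the same amount of work.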


answered Oct 11 '22 20:10

David Arenburg

After completely misunderstanding the question, here is my final answer:

# named after beetroot for being the first to ever need this functionality
beetroot <- function(x) {
    # count NA
    num.na <- sum(is.na(x))
    # remove NA
    x <- x[!is.na(x)]
    # glue the number of NAs at the end
    x <- c(x, rep(NA, num.na))
    return(x)
}

# apply beetroot over each column in the dataframe
as.data.frame(lapply(df, beetroot))

For each column in the data frame, it counts the NAs, removes them, and appends the same number of NAs at the bottom. The call returns a new data frame, so assign the result if you want to keep it.
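As a worked example, here is a sketch that applies the function to the second data frame from the question. The names `df.out` and `chr` are chosen for illustration. It also shows that `rep(NA, n)`, which is logical, gets coerced by `c()` to the type of the column.

```r
beetroot <- function(x) {
    num.na <- sum(is.na(x))        # count NA
    x <- x[!is.na(x)]              # remove NA
    c(x, rep(NA, num.na))          # glue the NAs back on at the end
}

# Second example data frame from the question
df <- data.frame(
    a = c(NA, NA, 1, 5, 34, 7, 3, 5, 8, 4),
    b = c(NA, NA, NA, 57, 2, 7, 9, 5, 12, 100),
    d = c(NA, NA, NA, NA, 5, 7, 2, 8, 2, 5)
)

# The call returns a new data frame; df itself is left unchanged
df.out <- as.data.frame(lapply(df, beetroot))

# Works for other column types too: the result here is a character vector
chr <- beetroot(c(NA, "x", NA, "y"))   # "x" "y" NA NA
```

If you would rather overwrite the original in place, `df[] <- lapply(df, beetroot)` is one way to do it.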

answered Oct 11 '22 19:10

PascalVKooten
