
Sort data by number of NA's in each line

I want to sort a data frame that has some missing values.

   name dist1   dist2     dist3  prop1 prop2      prop3  month2  month5 month10 month25 month50 issue
 1   A1 232.0 1462.91  232.0000 728.00 0.370 0.05633453  1188.1  1188.1  1188.1  1188.1  1188.1   Yes
 2   A2 142.0   58.26 2847.7690  17.10 0.080 0.07667063 14581.6 15382.0 19510.9 25504.0      NA   Yes
 3   A3 102.0 1160.94  102.0000  53.40 0.090 0.07667063   144.8   144.8   144.8   291.8   761.4   Yes
 4   A4 126.0 1377.23  126.0000  64.30 2.120 0.11040091   366.5   496.8   665.3      NA      NA   Yes
 5   A5 118.0  654.94  118.0000  16.50 0.030 0.05841914     0.0    10.2   198.4   733.7  1717.0   Yes
 6   A6 110.0 1084.63  110.0000 340.00 0.390 0.07405169  4635.0  4863.0  7725.0  8028.0      NA   Yes
 7   A7 123.0    0.00 1801.1811  83.40 0.030 0.06420000  4686.9  4803.6  5052.0  5418.5  7237.5   Yes
 8   A8 125.0    0.00 5557.7428   1.14 0.050 0.06604286  4932.0  8607.0 10827.0 13679.0      NA   Yes
 9   A9 108.0    0.00 6207.3491  92.30 0.070 0.08710000  3360.0  7440.0 10508.0 12571.0 16925.0   Yes
10  A10  60.0    0.00 2500.0000   0.73 0.020 0.06819053    15.1    19.9    19.9    19.9    19.9   Yes
11  A11 210.0  700.78  210.0000   7.78 0.290 0.07866589   182.4   182.4   182.4   298.0  1864.1    No
12  A12 155.0  530.48  155.0000   1.33 0.170 0.07578345     1.0     2.0     3.0     4.0     5.0    No
13  A13  21.0  840.00   21.0000 308.00 0.030 0.05508490  1008.7  1450.8  2439.8  4947.2  6818.9    No
14  A14 114.0 1083.24  114.0000 171.00 0.040 0.04670335   564.7   722.8   760.6   879.8   944.4    No
15  A15 109.0 1051.03  109.0000  20.30 0.070 0.05274389  5503.1  9127.9 11167.4 18226.1 20243.4    No
16  A16 107.0  922.80  107.0000   0.03 0.020 0.04403927   232.6  1016.5  2203.8  3844.9  4000.6    No
17  A17 100.0  278.10  100.0000   0.82 0.100 0.07270705  2754.0  4701.7  5311.9  9579.3 14651.3    No
18  A18 138.0  798.42  138.0000   1.04 0.100 0.07148773  3657.2  4014.0  4525.9  4674.7  4838.5    No
19  A19 105.0  695.02  105.0000   1.41 0.120 0.06716963  3530.2  4076.1 11517.0 18899.5 21073.0    No
20  A20  81.0   12.00  879.2651  16.70 0.120 0.08087098  6477.1  6788.8  7320.0  7947.7  8726.6    No
21  A21 102.0 1052.96  102.0000  66.40 0.010 0.02926897   181.7   294.0   355.5  1431.6      NA    No

Only month2, month5, month10, month25 and month50 contain NAs, and if one of the earlier columns is NA, then all the later ones are NA as well.

I.e., if month2 is NA, then month5, month10, month25 and month50 are all NA.

I want to sort the data based on the number of missing values in each row.

The sorted data frame should have all complete rows first, followed by rows with 1 missing value, then with 2, and so on.

Can anyone help me?

TYZ asked Feb 12 '23 19:02


2 Answers

You can use

dat[order(rowSums(is.na(dat))), ]

where dat is the name of your data frame.
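
As a rough illustration of why this works, here is a small sketch on made-up data (the toy data frame below is hypothetical, not the data from the question). `is.na(dat)` gives a logical matrix, `rowSums()` counts the `TRUE` values per row, and `order()` returns the row indices from fewest to most NAs; rows with the same count should keep their original relative order.

```r
# Hypothetical toy data standing in for `dat`
dat <- data.frame(
  name    = c("A1", "A2", "A4", "A5"),
  month10 = c(1188.1, 19510.9, 665.3, 198.4),
  month25 = c(1188.1, 25504.0, NA, 733.7),
  month50 = c(1188.1, NA, NA, 1717.0)
)

na_count <- rowSums(is.na(dat))     # number of NAs in each row: 0, 1, 2, 0
sorted   <- dat[order(na_count), ]  # complete rows first, then 1 NA, then 2

print(sorted)
```

If you want the rows with the most NAs first instead, `order(na_count, decreasing = TRUE)` reverses the direction.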

Sven Hohenstein answered Feb 15 '23 07:02


Is this what you want? Assume dat is your given sample data.

s <- sort(apply(is.na(dat), 1, sum))
dat[names(s), ]
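
A minimal sketch of how this variant behaves, again on hypothetical toy data. It selects rows by name, so it relies on `s` carrying the row names; the explicit `row.names` below are an assumption made so that this holds. If `names(s)` turns out to be `NULL` for your data, the index-based `order()` approach is the safer choice.

```r
# Hypothetical toy data with explicit (assumed) row names
dat <- data.frame(
  name    = c("A4", "A1", "A2"),
  month25 = c(NA, 1188.1, 25504.0),
  month50 = c(NA, 1188.1, NA),
  row.names = c("r1", "r2", "r3")
)

s <- sort(apply(is.na(dat), 1, sum))  # named vector of NA counts, ascending
stopifnot(!is.null(names(s)))         # guard: row names must have survived

by_name  <- dat[names(s), ]                     # index rows by row name
by_order <- dat[order(rowSums(is.na(dat))), ]   # index rows by position

print(by_name)
```

On this toy data both approaches select the rows in the same sequence: the complete row, then the row with one NA, then the row with two.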
Rich Scriven answered Feb 15 '23 07:02