I have read in some lengthy data with read.csv(), and to my surprise the data is coming out as factors rather than numbers, so I'm guessing there must be at least one non-numeric item in the data. How can I find where these items are?
For example, if I have the following data frame:
df <- data.frame(c(1,2,3,4,"five",6,7,8,"nine",10))
I would like to know that rows 5 and 9 have non-numeric data. How would I do that?
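To reproduce the situation, here is a minimal sketch that reads a small inline CSV (the column name `value` and its contents are hypothetical) with `stringsAsFactors = TRUE`. That was the default for `read.csv()` before R 4.0.0. Since R 4.0.0 the same column comes back as character instead, but the detection approach is the same either way.

```r
# Hypothetical CSV content with two non-numeric entries in the "value" column
csv_text <- "value
1
2
five
4
nine"

# stringsAsFactors = TRUE mimics the pre-R-4.0.0 default of read.csv()
dat <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# One non-numeric entry is enough to make the whole column non-numeric
str(dat)
sapply(dat, is.factor)
```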
df <- data.frame(x = c(1,2,3,4,"five",6,7,8,"nine",10))
The trick is knowing that converting to numeric via as.numeric(as.character(.)) will convert non-numbers to NA (with a "NAs introduced by coercion" warning).
which(is.na(as.numeric(as.character(df[[1]])))) ## 5 9
(Just using as.numeric(df[[1]]) doesn't work on a factor column: it drops the levels and returns the underlying integer codes.)
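The factor pitfall is easy to see on a tiny hypothetical factor. This sketch compares the integer codes with the properly converted values, and also shows the `as.numeric(levels(f))[f]` idiom recommended in the R FAQ (7.10), which converts each level once and then indexes by the factor's codes.

```r
# A factor whose labels look like numbers (plus one non-numeric label)
f <- factor(c("10", "20", "10", "five"))

# Wrong: returns the internal integer codes of the levels, not the values
codes <- as.numeric(f)

# Right: go through the labels first; "five" becomes NA (warning suppressed)
values <- suppressWarnings(as.numeric(as.character(f)))

# R FAQ idiom: convert the levels once, then index them by the factor
values2 <- suppressWarnings(as.numeric(levels(f)))[f]

print(codes)
print(values)
```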
You might choose to suppress the warnings:
which.nonnum <- function(x) {
  which(is.na(suppressWarnings(as.numeric(as.character(x)))))
}
which.nonnum(df[[1]])  ## 5 9
To be more careful, you should also check that the values weren't NA before conversion:
which.nonnum <- function(x) {
  badNum <- is.na(suppressWarnings(as.numeric(as.character(x))))
  which(badNum & !is.na(x))
}
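The two versions only differ when the input already contains missing values. Here is a minimal sketch on a hypothetical vector. The names `which.nonnum.simple` and `which.nonnum.careful` are made up here so both functions can sit side by side.

```r
# First version: flags every value that ends up NA after conversion
which.nonnum.simple <- function(x) {
  which(is.na(suppressWarnings(as.numeric(as.character(x)))))
}

# Careful version: ignores values that were already NA before conversion
which.nonnum.careful <- function(x) {
  badNum <- is.na(suppressWarnings(as.numeric(as.character(x))))
  which(badNum & !is.na(x))
}

x <- c("1", NA, "three", "4")

which.nonnum.simple(x)   # also reports the genuine NA at position 2
which.nonnum.careful(x)  # reports only the non-numeric entry at position 3
```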
Finally, lapply(df, which.nonnum) reports the positions of the 'bad' values in every column of the data frame.
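On a data frame with several columns, the result is a named list with one integer vector per column. In this sketch the data frame `df2` and its columns are hypothetical. `lengths()` (available since R 3.2.0) is used to keep only the columns that actually contain non-numeric entries.

```r
# Careful helper from the answer above
which.nonnum <- function(x) {
  badNum <- is.na(suppressWarnings(as.numeric(as.character(x))))
  which(badNum & !is.na(x))
}

# Hypothetical data: two text columns with problems, one clean numeric column
df2 <- data.frame(
  a = c("1", "2", "three", "4"),
  b = c("5", "six", "7", NA),
  c = c(1.5, 2.5, 3.5, 4.5)
)

bad <- lapply(df2, which.nonnum)
str(bad)

# Drop columns with no offending entries
bad[lengths(bad) > 0]
```

Because the helper goes through as.character(), it behaves the same whether the text columns are factors (older R defaults) or character vectors.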