Finding non-numeric data in a data frame or vector

I have read in some lengthy data with read.csv(), and to my surprise the data is coming out as factors rather than numbers, so I'm guessing there must be at least one non-numeric item in the data. How can I find where these items are?

For example, if I have the following data frame:

df <- data.frame(c(1,2,3,4,"five",6,7,8,"nine",10)) 

I would like to know that rows 5 and 9 have non-numeric data. How would I do that?

stackoverflowuser2010 asked Jan 17 '14 21:01


1 Answer

df <- data.frame(x = c(1,2,3,4,"five",6,7,8,"nine",10)) 

The trick is knowing that converting to numeric via as.numeric(as.character(.)) turns anything that can't be parsed as a number into NA (with a coercion warning).

which(is.na(as.numeric(as.character(df[[1]])))) ## 5 9 

(Calling as.numeric(df[[1]]) directly doesn't work when the column is a factor: it discards the level labels and returns the underlying integer codes.)
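A minimal sketch of why the as.character() step matters, assuming the column is a factor (as read.csv() produced by default when this was asked). The vector `f` and its levels are made up for illustration:

```r
# Build a factor explicitly, with fixed levels, so the example behaves
# the same regardless of R version or locale.
f <- factor(c("10", "20", "five", "30"),
            levels = c("10", "20", "30", "five"))

# as.numeric() on a factor returns the internal integer codes
# (positions in the levels vector), not the labels.
codes <- as.numeric(f)

# Converting to character first recovers the labels; labels that are
# not numbers become NA (the coercion warning is suppressed here).
values <- suppressWarnings(as.numeric(as.character(f)))

print(codes)
print(values)
```

Here `codes` holds level positions rather than the original numbers, while `values` holds the real numbers with NA where the label was text.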

You might choose to suppress the warnings:

which.nonnum <- function(x) {
    which(is.na(suppressWarnings(as.numeric(as.character(x)))))
}
which.nonnum(df[[1]])

To be more careful, you should also check that the values weren't NA before conversion:

which.nonnum <- function(x) {
    badNum <- is.na(suppressWarnings(as.numeric(as.character(x))))
    which(badNum & !is.na(x))
}
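To illustrate the difference the extra check makes, here is a small sketch on a hypothetical vector `x` that contains both a genuine missing value and a text entry. The helper is repeated so the snippet runs on its own:

```r
# Same helper as above, repeated so this snippet is self-contained.
which.nonnum <- function(x) {
  badNum <- is.na(suppressWarnings(as.numeric(as.character(x))))
  which(badNum & !is.na(x))
}

# Hypothetical data: entry 2 is a genuine NA, entry 4 is text.
x <- c("1", NA, "3", "four", "5")

# Only the text entry is flagged; the pre-existing NA is left alone.
bad <- which.nonnum(x)

# Indexing with the result shows the offending values themselves.
print(bad)
print(x[bad])
```

Without the `!is.na(x)` condition, position 2 would be reported as well, even though it was missing before any conversion took place.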

lapply(df, which.nonnum) will report the row positions of 'bad' values for every column of the data frame, as a list with one element per column.
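As a rough sketch of the per-column check, assuming a made-up two-column data frame `df2` (the column names and values are purely illustrative):

```r
# Helper from the answer, repeated so this snippet is self-contained.
which.nonnum <- function(x) {
  badNum <- is.na(suppressWarnings(as.numeric(as.character(x))))
  which(badNum & !is.na(x))
}

# Hypothetical data frame; stringsAsFactors = FALSE keeps the columns as
# character, but the helper handles factor columns the same way.
df2 <- data.frame(
  a = c("1", "2", "three", "4"),
  b = c("1.5", "x", "2.5", "y"),
  stringsAsFactors = FALSE
)

# One integer vector of row positions per column.
bad_rows <- lapply(df2, which.nonnum)

# Optionally drop columns with no bad values (length 0 counts as FALSE).
bad_rows <- Filter(length, bad_rows)

print(bad_rows)
```

The Filter() step is optional; it only trims the list when most columns are clean, which can make the result easier to scan for a wide data frame.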

Ben Bolker answered Sep 20 '22 03:09