
R remove numbers in data frame entries containing only numbers

I am reading a data frame from an online CSV file, but the person who created the file accidentally entered some numbers into a column that should contain only city names. Here is a sample of the cities.data table:

City        Population   Foo   Bar
Seattle     10           foo1  bar1
98125       20           foo2  bar2
Kent 98042  30           foo3  bar3
98042 Kent  30           foo4  bar4

Desired output after removing rows with only numbers in the city column:

City        Population   Foo   Bar
Seattle     10           foo1  bar1
Kent 98042  30           foo3  bar3
98042 Kent  30           foo4  bar4

I want to remove the rows that have ONLY numbers in the City column. "Kent 98042" and "98042 Kent" are both fine because they contain a city name, but 98125 is not a city, so that row should be removed.

I can't use is.numeric because the numbers are read as strings from the CSV file. I tried a regex:

cities.data <- cities.data[which(grepl("[0-9]+", cities.data) == FALSE)]

But this deletes rows containing any numbers rather than just the ones containing only numbers, e.g.

City        Population   Foo   Bar
Seattle     10           foo1  bar1

"Kent 98042" was deleted even though I wanted to keep that row. Suggestions? Please and thanks!

siushi asked Dec 01 '17 22:12

1 Answer

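# Sample data; City is read as character because it mixes text and digits
df <- read.table(text = "
City        Population   Foo   Bar
Seattle     10           foo1  bar1
98125       20           foo2  bar2
Kent98042   30           foo3  bar2
", header = TRUE, stringsAsFactors = FALSE)

library(dplyr)

# Keep rows whose City cannot be coerced to a number
# (as.numeric() warns "NAs introduced by coercion"; that is expected here)
df %>% filter(is.na(as.numeric(City)))

#        City Population  Foo  Bar
# 1   Seattle         10 foo1 bar1
# 2 Kent98042         30 foo3 bar2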

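The idea is that as.numeric applied to a character value returns a number only when the whole string is a number; otherwise it returns NA. Keeping the rows where the result is NA therefore drops the entries that contain only numbers.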
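To see the coercion on its own, here is a minimal sketch using a small made-up vector (the name `cities` is only for illustration). The warning is suppressed because the NA values are exactly what we are looking for:

```r
# Hypothetical example vector, mirroring the City column above
cities <- c("Seattle", "98125", "Kent98042")

# Strings that are not entirely numeric become NA
coerced <- suppressWarnings(as.numeric(cities))

# TRUE marks the entries to keep (not purely numeric)
keep <- is.na(coerced)
print(keep)
# [1]  TRUE FALSE  TRUE
```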

If you prefer base R, you can use: df[is.na(as.numeric(df$City)), ]
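If you would rather stay with a regex, as in the question, one possible sketch is to anchor the pattern with ^ and $ so it matches only when the entire string consists of digits. The unanchored "[0-9]+" matches any string that contains a digit anywhere, which is why "Kent 98042" was dropped. The data frame below is rebuilt by hand from the question's sample, and the names `only_digits` and `cleaned` are just illustrative. This assumes the city entries have no leading or trailing whitespace.

```r
# Sample data rebuilt from the question (cities with spaces included)
cities.data <- data.frame(
  City = c("Seattle", "98125", "Kent 98042", "98042 Kent"),
  Population = c(10, 20, 30, 30),
  Foo = c("foo1", "foo2", "foo3", "foo4"),
  Bar = c("bar1", "bar2", "bar3", "bar4"),
  stringsAsFactors = FALSE
)

# TRUE only when the whole string is digits, thanks to the ^ and $ anchors
only_digits <- grepl("^[0-9]+$", cities.data$City)

# Note the comma: the logical vector selects rows, not columns
cleaned <- cities.data[!only_digits, ]
print(cleaned$City)
# [1] "Seattle"    "Kent 98042" "98042 Kent"
```

Unlike the as.numeric approach, this keeps strings such as "1e5" that R would otherwise parse as numbers, which may or may not matter for your data.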

AntoniosK answered Sep 22 '22 16:09
