R remove numbers in data frame entries containing only numbers

Tags: regex, dataframe, r, filter, dplyr

I am reading in a data frame from an online CSV file, but the person who created the file accidentally entered some numbers into a column that should contain only city names. Here is a sample of the cities.data table:

City        Population   Foo   Bar
Seattle     10           foo1  bar1
98125       20           foo2  bar2
Kent 98042  30           foo3  bar3
98042 Kent  30           foo4  bar4

Desired output after removing rows with only numbers in the city column:

City        Population   Foo   Bar
Seattle     10           foo1  bar1
Kent 98042  30           foo3  bar3
98042 Kent  30           foo4  bar4

I want to remove the rows with ONLY numbers in the city column. "Kent 98042" and "98042 Kent" are both okay since they contain the city name, but since 98125 is not a city, I want to remove that row.

I can't use is.numeric because the number is being read as a string in the csv file. I tried using regex,

cities.data <- cities.data[which(grepl("[0-9]+", cities.data) == FALSE)]

But this deletes rows with any numbers rather than just the one containing only numbers, e.g.

City        Population   Foo   Bar
Seattle     10           foo1  bar1

"Kent 98042" was deleted even though I wanted to keep that row. Suggestions? Please and thanks!

asked Dec 01 '17 22:12

siushi

1 Answer

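# Sample data; "Kent98042" has no space because read.table splits on whitespace
df <- read.table(text = "
City        Population   Foo   Bar
Seattle     10           foo1  bar1
98125       20           foo2  bar2
Kent98042  30           foo3  bar2
", header = TRUE, stringsAsFactors = FALSE)

library(dplyr)

# Keep rows where City cannot be parsed as a number.
# as.numeric() warns "NAs introduced by coercion"; suppressWarnings() silences that.
df %>% filter(is.na(suppressWarnings(as.numeric(City))))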

#        City Population  Foo  Bar
# 1   Seattle         10 foo1 bar1
# 2 Kent98042         30 foo3 bar2

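The idea is that as.numeric applied to a character value returns NA (with a coercion warning) unless the entire string can be parsed as a number. So is.na(as.numeric(City)) is TRUE exactly for the rows whose City contains something other than a plain number, and those are the rows we keep.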
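To see this on the four city values from the question, here is a small sketch. The vector name x and the helper variables parsed and keep are made up for illustration; only as.numeric, is.na and suppressWarnings are base R functions.

```r
# City values from the question, including the ones with a space
x <- c("Seattle", "98125", "Kent 98042", "98042 Kent")

# Strings that are not entirely numeric become NA
# (suppressWarnings hides the "NAs introduced by coercion" warning)
parsed <- suppressWarnings(as.numeric(x))

# TRUE marks the entries to keep, i.e. those that are not purely numeric
keep <- is.na(parsed)
print(keep)
```

Only "98125" parses as a number, so it is the only entry flagged for removal.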

If you want to use base R you can use this: df[is.na(as.numeric(df$City)),]
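If you would rather stay with the regex approach from the question, one possible alternative is to anchor the pattern with ^ and $ so that it only matches when the whole string consists of digits. This is a sketch, not part of the original answer: the data frame below is rebuilt by hand from the question's sample, and the names only_digits and cleaned are made up for illustration.

```r
# Sample data rebuilt from the question (assumed layout)
cities.data <- data.frame(
  City = c("Seattle", "98125", "Kent 98042", "98042 Kent"),
  Population = c(10, 20, 30, 30),
  Foo = c("foo1", "foo2", "foo3", "foo4"),
  Bar = c("bar1", "bar2", "bar3", "bar4"),
  stringsAsFactors = FALSE
)

# ^ and $ force the digits to span the entire string,
# so "Kent 98042" and "98042 Kent" do not match
only_digits <- grepl("^[0-9]+$", cities.data$City)

# Index rows on the City column only, and keep the comma to select rows
cleaned <- cities.data[!only_digits, ]
print(cleaned)
```

The two approaches are not identical: as.numeric also treats strings such as "12.5", "-3" or "1e5" as numbers, while the anchored pattern only matches strings made purely of digits. Which one fits depends on what the stray entries in the file look like.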

answered Sep 22 '22 16:09

AntoniosK
