Removing Whitespace From a Whole Data Frame in R

Tags:

I've been trying to remove the white space that I have in a data frame (using R). The data frame is large (> 1 GB) and has multiple columns that contain white space in every data entry.

Is there a quick way to remove the white space from the whole data frame? I've been trying to do this on a subset of the first 10 rows of data using:


gsub( " ", "", mydata)

This didn't seem to work, although R returned an output which I have been unable to interpret.


str_replace( " ", "", mydata)

R returned 47 warnings and did not remove the white space.


erase_all(mydata, " ")

R returned an error saying 'Error: could not find function "erase_all"'

I would really appreciate some help with this as I've spent the last 24hrs trying to tackle this problem.

Thanks!

697

asked Dec 24 '13 12:12

3 Answers

A lot of the answers are older, so here in 2019 is a simple dplyr solution that will operate only on the character columns to remove trailing and leading whitespace.


library(dplyr)
library(stringr)

df %>%
  mutate_if(is.character, str_trim)

## ===== 2020 edit for dplyr (>= 1.0.0) =====
df %>% 
  mutate(across(where(is.character), str_trim))

You can switch out the str_trim() function for other ones if you want a different flavor of whitespace removal.


# for example, remove all spaces
df %>% 
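  # a lambda is used because passing extra arguments through across() is deprecated in dplyr >= 1.1.0
  mutate(across(where(is.character), ~ str_remove_all(.x, pattern = fixed(" "))))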
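To make the "different flavor" point concrete, here is a small sketch on made-up data (the data frame and its column names are illustrative, not from the question). It assumes dplyr (>= 1.0.0) and stringr are installed. str_squish() is another stringr function: it trims both ends and also collapses runs of internal whitespace to a single space.

```r
library(dplyr)
library(stringr)

# Illustrative data, not the asker's
df <- data.frame(
  name = c("  Ann  Lee ", "Bob"),
  n    = 1:2,
  stringsAsFactors = FALSE
)

# Trim leading/trailing whitespace only
trimmed <- df %>%
  mutate(across(where(is.character), str_trim))

# Trim the ends and collapse internal runs of whitespace
squished <- df %>%
  mutate(across(where(is.character), str_squish))

# Remove every literal space
no_space <- df %>%
  mutate(across(where(is.character), ~ str_remove_all(.x, fixed(" "))))

trimmed$name
squished$name
no_space$name
```

In all three cases the integer column n is left untouched, because where(is.character) selects only character columns.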

190

answered Oct 20 '22 00:10

Adam

Picking up on Fremzy and the comment from Stamper, this is now my handy routine for cleaning up whitespace in data:


df <- data.frame(lapply(df, trimws), stringsAsFactors = FALSE)

As others have noted, this changes all column types to character. In my work, I first determine the types present in the original and the conversions required. After trimming, I re-apply the types needed.

If your original types are OK, apply the solution from MarkusN: https://stackoverflow.com/a/37815274/2200542
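A rough sketch of that trim-then-restore workflow, on made-up data. Using type.convert() to restore types is my assumption here, not necessarily what the answerer does; it guesses each column's type from its contents, so check the result against the classes you recorded.

```r
# Illustrative data, not the asker's
df <- data.frame(
  id    = 1:3,
  score = c(1.5, 2, 3.25),
  name  = c(" abc", "kl m ", "dfsd"),
  stringsAsFactors = FALSE
)

# Record the original column classes before trimming
orig_class <- vapply(df, function(col) class(col)[1], character(1))

# Trimming every column turns all of them into character vectors
trimmed <- data.frame(lapply(df, trimws), stringsAsFactors = FALSE)

# Re-apply types column by column; as.is = TRUE keeps text as character
restored <- trimmed
restored[] <- lapply(trimmed, type.convert, as.is = TRUE)

# Compare the restored classes with the recorded ones
new_class <- vapply(restored, function(col) class(col)[1], character(1))
identical(new_class, orig_class)
```

If a guessed type does not match the recorded one (dates, for example), convert that column explicitly instead.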

Those working with Excel files may wish to explore the readxl package which defaults to trim_ws = TRUE when reading.

answered Oct 20 '22 00:10

If I understood you correctly, you want to remove all the white space from the entire data frame. I guess the code you are using is good for removing spaces in the column names. I think you should try this:


apply(myData, 2, function(x) gsub('\\s+', '', x))

Hope this works.

This will return a matrix, however. If you want to change it back to a data frame, then do:


as.data.frame(apply(myData, 2, function(x) gsub('\\s+', '', x)))
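A small sketch of what happens to the types here, on made-up data (myData below is illustrative). It also shows why calling gsub() directly on the data frame gives hard-to-read output: gsub() coerces its input with as.character(), which yields one deparsed string per column rather than one string per cell.

```r
# Illustrative data, not the asker's
myData <- data.frame(
  val = c(" abc", "kl m"),
  num = 1:2,
  stringsAsFactors = FALSE
)

# gsub() on the whole data frame: one string per column, not per cell
direct <- gsub(" ", "", myData)
length(direct)

# apply() first converts the data frame to a character matrix
m <- apply(myData, 2, function(x) gsub("\\s+", "", x))
is.matrix(m)

# Converting back gives a data frame, but the numeric column is now character
back <- as.data.frame(m, stringsAsFactors = FALSE)
class(back$num)
```

So the apply() route removes the whitespace but loses the original column types, which matters if the data frame has numeric columns.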

EDIT In 2020:

Using lapply with the trimws function (whose default is which = "both") removes leading and trailing spaces, but not spaces inside a string. Since the OP provided no input data, I am adding a dummy example to produce the results.

DATA:


df <- data.frame(
  val  = c(" abc", " kl m", "dfsd "),
  val1 = c("klm ", "gdfs", "123"),
  num  = 1:3,
  num1 = 2:4,
  stringsAsFactors = FALSE
)

# situation: 1 (Using Base R), when we want to remove spaces only at the leading and trailing ends, NOT inside the string values, we can use trimws


cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
# list-style indexing (no comma) keeps a data frame even when only one column matches
df[cols_to_be_rectified] <- lapply(df[cols_to_be_rectified], trimws)

# situation: 2 (Using Base R), when we want to remove spaces everywhere in the character columns of the data frame (inside a string as well as at the leading and trailing ends).

(This was the initial solution proposed using apply. Please note that a solution using apply seems to work but would be very slow. Also, it is not very clear from the question whether the OP wanted to remove only leading/trailing blanks or every blank in the data.)


cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
# list-style indexing (no comma) keeps a data frame even when only one column matches
df[cols_to_be_rectified] <- lapply(df[cols_to_be_rectified], function(x) gsub('\\s+', '', x))

## situation: 1 (Using data.table, removing only leading and trailing blanks)


library(data.table)
setDT(df)
cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
df[,c(cols_to_be_rectified) := lapply(.SD, trimws), .SDcols = cols_to_be_rectified]

Output from situation1:


    val val1 num num1
1:  abc  klm   1    2
2: kl m gdfs   2    3
3: dfsd  123   3    4

## situation: 2 (Using data.table, removing every blank inside as well as leading/trailing blanks)


cols_to_be_rectified <- names(df)[vapply(df, is.character, logical(1))]
df[,c(cols_to_be_rectified) := lapply(.SD, function(x)gsub('\\s+', '', x)), .SDcols = cols_to_be_rectified]

Output from situation2:


    val val1 num num1
1:  abc  klm   1    2
2:  klm gdfs   2    3
3: dfsd  123   3    4

Note the difference between the outputs of the two situations. In row 2 you can see that trimws removes only leading and trailing blanks, whereas the regex solution removes every blank.

I hope this helps. Thanks!
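For anyone who wants to check both behaviours side by side in base R, here is a minimal sketch on the same dummy data. The helper clean_chr_cols() is a hypothetical name introduced only for this illustration.

```r
# Dummy data as defined earlier in this answer
df <- data.frame(
  val  = c(" abc", " kl m", "dfsd "),
  val1 = c("klm ", "gdfs", "123"),
  num  = 1:3,
  num1 = 2:4,
  stringsAsFactors = FALSE
)

# Hypothetical helper: apply `f` to every character column, leave the rest alone
clean_chr_cols <- function(data, f) {
  is_chr <- vapply(data, is.character, logical(1))
  data[is_chr] <- lapply(data[is_chr], f)
  data
}

trimmed  <- clean_chr_cols(df, trimws)
stripped <- clean_chr_cols(df, function(x) gsub("\\s+", "", x))

trimmed$val   # the inner space of "kl m" survives
stripped$val  # every whitespace character is removed
```

Because only the character columns are touched, num and num1 keep their integer type in both results.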

answered Oct 19 '22 22:10

PKumar


Tags:

replace

r

whitespace

gsub


3 Answers

Adam

Anthony Simon Mielniczuk

PKumar
