What's the best way to replace missing values with NA when reading in a .csv?

I have a .csv dataset with many missing values, and I'd like R to recognize them all the same way (the "correct" way) when I read the table in. I've been using:

import <- read.csv("/Users/dataset.csv",
                   header = TRUE, na.strings = c(""))

This call fills all the empty cells with something, but it's not consistent. When I look at the data with head(import), some missing cells are filled with <NA> and some with NA. I fear that R treats these two ways of identifying missing values differently when I start analyzing the dataset, so I'd like the import to read in those missing values uniformly.

Finally, some of the missing values in my csv file are represented with a period only. I would also like those periods to be represented by the correct missing value notation when I import to R.

610

asked Dec 11 '12 15:12

Luke

1 Answer

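The <NA> vs NA just means that some of your columns are character (or factor) and some are numeric, that's all. R prints missing values in character and factor columns as <NA> and in numeric columns as NA, but is.na() treats both as the same missing value. Absolutely nothing is wrong with that.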
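
To see the printing difference for yourself, here is a minimal sketch using a made-up data frame (the column names score and group are hypothetical, not from the question's dataset):

```r
# Hypothetical data frame: one numeric column and one character column,
# each with a missing value in the second row
df <- data.frame(
  score = c(1.5, NA, 3),
  group = c("a", NA, "c"),
  stringsAsFactors = FALSE
)

# When printed, the numeric column shows NA and the character column shows <NA>
print(df)

# is.na() flags the missing entries the same way in both columns
num_missing <- is.na(df$score)
chr_missing <- is.na(df$group)
```

Both num_missing and chr_missing mark the same row as missing, so the difference is only in how the data frame is displayed.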

As Ben mentioned above, if some of your missing values in the csv are represented by a single period, ., then you can specify a vector of values that should be treated as NAs via:

na.strings=c("",".","NA")

as an argument to read.csv.
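
Put together, the call might look like the sketch below. It reads from an inline string through the text argument of read.csv so that it runs without a file on disk; the CSV content and column names are invented for illustration, and stringsAsFactors = FALSE is an assumption added so the text column stays character.

```r
# Hypothetical CSV content: empty cells, "." and "NA" all stand for missing values
csv_text <- "id,height,species
1,5.1,setosa
2,.,
3,,virginica
4,NA,."

import <- read.csv(
  text = csv_text,                 # swap for a file path when reading a real file
  header = TRUE,
  na.strings = c("", ".", "NA"),   # every one of these strings becomes NA
  stringsAsFactors = FALSE
)

# Count the missing values in each column
missing_per_column <- colSums(is.na(import))
```

With a real file, replace text = csv_text with the path to the .csv and keep the same na.strings vector.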

170

answered Oct 22 '22 10:10

joran
