Override column types when importing data using readr::read_csv() when there are many columns

Tags: file-io, dataframe, r, csv, dplyr

I am trying to read a CSV file using readr::read_csv in R. The file has about 150 columns; I am including only the first few in the example below. I want to override the type of the second column, which read_csv parses as a date by default, so that it is read as character (or as a date in a different format).


  GIS Join Match Code,Data File Year,State Name,State Code,County Name,County Code,Area Name,Persons: Total
  G0100010,2008-2012,Alabama,1,Autauga County,1,"Autauga County, Alabama",54590

  # check.names = FALSE keeps the spaces in the column names
  df <- data.frame("GIS Join Match Code" = "G0100010", "Data File" = "2008-2012",
                   "State" = "Alabama", "County" = "Autauga County",
                   "Population" = 54590, check.names = FALSE)

The issue is that when I use readr::read_csv, it seems I have to specify every column when overriding col_types (see the error below), i.e., list all ~150 columns individually. Is there a way to override the col_type of just specific columns, for example with a named list? In my case, I only need to override the column "Data File Year".

I understand that any omitted columns will be parsed automatically, which is fine for my analysis. It gets more complicated because the column names in the downloaded file contain spaces (e.g., "Data File Year", "State Code").


  tempdata <- read_csv(df, col_types = "cc")
  Error: You have 135 column names, but 2 columns

Another option, if possible, would be to skip reading the second column altogether.

430

asked Jul 22 '15 16:07

rajvijay

2 Answers

Here is a more general answer, for anyone who stumbles on this question in the future. Using "skip" to jump over columns by position is less advisable, because it breaks if the structure of the imported data source changes.
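To illustrate the difference, here is a minimal sketch comparing a positional skip ("_" in a compact col_types string) with skipping by name via col_skip(). The three-column sample file and the object names by_position and by_name are hypothetical stand-ins for the question's real file.

```r
library(readr)

# Hypothetical three-column sample; "Data File Year" is the column to drop.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "GIS Join Match Code,Data File Year,State Name",
  "G0100010,2008-2012,Alabama"
), path)

# Positional: one letter per column, "_" skips the second one.
# If a column is later added or reordered, this skips the wrong column.
by_position <- read_csv(path, col_types = "c_c")

# By name: only "Data File Year" is skipped; unlisted columns are guessed.
by_name <- read_csv(path, col_types = cols(`Data File Year` = col_skip()))
```

Both calls return the same two columns for this file, but only the named version keeps targeting the right column if the layout changes.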

In your example, it is simpler to set a default column type and then specify only the columns that differ from that default.

For example, if all columns are normally "d" (double) but the date column should be "D" (date), load the data as follows:


  read_csv(df, col_types = cols(.default = "d", date = "D"))

Or, if column date should be "D" and column xxx should be "i" (integer):


  read_csv(df, col_types = cols(.default = "d", date = "D", xxx = "i"))

Using ".default" as above is powerful when there are many columns and only a few exceptions (such as date and xxx).
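A self-contained sketch of the ".default" approach applied to the question's situation. The file contents, the tempfile() stand-in, and the object names census and census_guessed are hypothetical; column names containing spaces or other non-syntactic characters are written in backticks inside cols().

```r
library(readr)

# Hypothetical stand-in for the question's file: column names with spaces
# (and a colon), one data row, written to a temporary file.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "GIS Join Match Code,Data File Year,State Name,State Code,Persons: Total",
  "G0100010,2008-2012,Alabama,1,54590"
), path)

# Every column defaults to character; only the listed exceptions differ.
# Backticks allow non-syntactic names (spaces, colons) as argument names.
census <- read_csv(
  path,
  col_types = cols(
    .default = col_character(),
    `State Code` = col_integer(),
    `Persons: Total` = col_double()
  )
)

# Alternatively, keep readr's guessing for all other columns and override
# only the one column from the question.
census_guessed <- read_csv(
  path,
  col_types = cols(`Data File Year` = col_character())
)
```

With roughly 150 columns, the second form means writing a single entry instead of one per column.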

123

answered Sep 21 '22 23:09

Nick

Yes. For example, to force numeric data to be read as character:


  library(readr)

  examplecsv = "a,b,c\n1,2,a\n3,4,d"
  read_csv(examplecsv)
  # A tibble: 2 x 3
  #      a     b     c
  #  <int> <int> <chr>
  #1     1     2     a
  #2     3     4     d
  read_csv(examplecsv, col_types = cols(b = col_character()))
  # A tibble: 2 x 3
  #      a     b     c
  #  <int> <chr> <chr>
  #1     1     2     a
  #2     3     4     d

Choices are:

  col_character()
  col_date()
  col_time()
  col_datetime()
  col_double()
  col_factor()    # to enforce; will never be guessed
  col_integer()
  col_logical()
  col_number()
  col_skip()      # to force skipping the column
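Several of these parsers take arguments, which also covers the question's "other date format" case. A minimal sketch, assuming a small hypothetical file whose collected column uses a day/month/year layout; the column names and the object name parsed are illustrative only.

```r
library(readr)

# Hypothetical sample: a non-ISO date, a code column, and a column to drop.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,collected,state,notes",
  "1,22/07/2015,AL,first",
  "2,23/07/2015,AK,second"
), path)

parsed <- read_csv(
  path,
  col_types = cols(
    id = col_integer(),
    collected = col_date(format = "%d/%m/%Y"),   # explicit day/month/year format
    state = col_factor(levels = c("AL", "AK")),  # factors must be requested explicitly
    notes = col_skip()                           # column is not read at all
  )
)
```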

More: http://readr.tidyverse.org/articles/readr.html
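With a file of about 150 columns, it can help to check what readr guessed before deciding what to override. A minimal sketch, assuming a small hypothetical file in place of the real one: spec() returns the column specification readr attached to the result, and printing it shows a cols(...) call listing the parser chosen for each column.

```r
library(readr)

# Hypothetical sample written to a temporary file.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,a", "3,4,d"), path)

# Let readr guess every column, then inspect what it chose.
guessed <- read_csv(path)
spec(guessed)   # prints a cols(...) call with one parser per column

# Reread, overriding only the column whose guessed type is unwanted.
overridden <- read_csv(path, col_types = cols(b = col_character()))
```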

answered Sep 18 '22 23:09

Lukasz
