How can you read a CSV file in R with a different number of columns per row?

Tags: import, r, csv, read.table, sparse-columns

I have a sparse data set in CSV format whose rows have varying numbers of columns. Here is a sample of the file text:

12223, University
12227, bridge, Sky
12828, Sunset
13801, Ground
14853, Tranceamerica
14854, San Francisco
15595, shibuya, Shrine
16126, fog, San Francisco
16520, California, ocean, summer, golden gate, beach, San Francisco

When I use

read.csv("data.txt", header = F)

R interprets the data set as having 3 columns, because the number of columns is determined from the first 5 rows. Is there any way to force R to put the data in more columns?

849

asked Sep 20 '13 17:09

CompChemist

2 Answers

Deep in the ?read.table documentation there is the following:

The number of data columns is determined by looking at the first five lines of input (or the whole file if it has less than five lines), or from the length of col.names if it is specified and is longer. This could conceivably be wrong if fill or blank.lines.skip are true, so specify col.names if necessary (as in the ‘Examples’).
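To see why the question ends up with three columns, here is a small sketch comparing the field counts of the first five lines with those of the whole sample. It assumes the sample data is held in a character vector (the name `txt` is only for illustration); `count.fields()` is the same base R function used further down.

```r
# The sample data, one element per line of the file
txt <- c("12223, University", "12227, bridge, Sky", "12828, Sunset",
         "13801, Ground", "14853, Tranceamerica", "14854, San Francisco",
         "15595, shibuya, Shrine", "16126, fog, San Francisco",
         "16520, California, ocean, summer, golden gate, beach, San Francisco")

# read.table() sizes the table from the first five lines only
first5 <- max(count.fields(textConnection(head(txt, 5)), sep = ","))
first5
# [1] 3

# The widest line in the whole sample is much longer
all_lines <- max(count.fields(textConnection(txt), sep = ","))
all_lines
# [1] 7
```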

Therefore, let's define col.names to be of length X (where X is the maximum number of fields in your dataset) and set fill = TRUE:

dat <- textConnection("12223, University
12227, bridge, Sky
12828, Sunset
13801, Ground
14853, Tranceamerica
14854, San Francisco
15595, shibuya, Shrine
16126, fog, San Francisco
16520, California, ocean, summer, golden gate, beach, San Francisco")
read.table(dat, header = FALSE, sep = ",",
    col.names = paste0("V", seq_len(7)), fill = TRUE)
#      V1             V2             V3      V4           V5     V6             V7
# 1 12223     University
# 2 12227         bridge            Sky
# 3 12828         Sunset
# 4 13801         Ground
# 5 14853  Tranceamerica
# 6 14854  San Francisco
# 7 15595        shibuya         Shrine
# 8 16126            fog  San Francisco
# 9 16520     California          ocean  summer  golden gate  beach  San Francisco

If the maximum number of fields is unknown, you can use the nifty utility function count.fields (which I found in the read.table example code):

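# dat must be a fresh connection: a text connection can only be read once,
# and read.table() above has already consumed it
nf <- count.fields(dat, sep = ',')
nf
# [1] 2 3 2 2 2 2 3 3 7
max(nf)
# [1] 7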
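Putting the two steps together, here is a minimal sketch of a helper that counts the fields first and then reads the file with enough columns. The function name `read_ragged` is hypothetical, and `strip.white = TRUE` is an addition not used in the answer above; it drops the space that follows each comma in the sample data.

```r
# Hypothetical helper (not part of base R): read a comma-separated file
# whose rows have different numbers of fields.
read_ragged <- function(path, sep = ",") {
  # Count the fields on every line of the file, not just the first five
  n_fields <- max(count.fields(path, sep = sep))
  # Enough column names make read.table() allocate every column;
  # fill = TRUE pads shorter rows, strip.white = TRUE trims the
  # leading space left after each comma
  read.table(path, header = FALSE, sep = sep,
             col.names = paste0("V", seq_len(n_fields)),
             fill = TRUE, strip.white = TRUE)
}

# Usage: write the sample data to a temporary file and read it back
tmp <- tempfile(fileext = ".txt")
writeLines(c("12223, University", "12227, bridge, Sky", "12828, Sunset",
             "13801, Ground", "14853, Tranceamerica", "14854, San Francisco",
             "15595, shibuya, Shrine", "16126, fog, San Francisco",
             "16520, California, ocean, summer, golden gate, beach, San Francisco"),
           tmp)
dat <- read_ragged(tmp)
dim(dat)
# [1] 9 7
```

Because `count.fields()` is given a file path here rather than a connection, it reopens the file itself, so the counting pass does not interfere with the later `read.table()` call.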

Possibly helpful related reading: Only read limited number of columns in R

157

answered Oct 12 '22 01:10

Blue Magister

You could read the data like this:

dat <- textConnection("12223, University
12227, bridge, Sky
12828, Sunset
13801, Ground
14853, Tranceamerica
14854, San Francisco
15595, shibuya, Shrine
16126, fog, San Francisco
16520, California, ocean, summer, golden gate, beach, San Francisco")
dat <- readLines(dat)
dat <- strsplit(dat, ",")

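This results in a list with one character vector per line. The vectors have different lengths, and every field after the first keeps the space that followed its comma.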
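If a rectangular table is needed afterwards, one option (a sketch of an extra step, not part of the answer above) is to trim the fields, pad every row with `NA` up to the longest row, and stack the rows. The variable names are illustrative only.

```r
# Sample data as readLines() would return it; with the real file this
# would be readLines("data.txt")
txt_lines <- c("12223, University", "12227, bridge, Sky", "12828, Sunset",
               "13801, Ground", "14853, Tranceamerica", "14854, San Francisco",
               "15595, shibuya, Shrine", "16126, fog, San Francisco",
               "16520, California, ocean, summer, golden gate, beach, San Francisco")

fields <- strsplit(txt_lines, ",")
fields <- lapply(fields, trimws)   # drop the space that followed each comma
n <- max(lengths(fields))          # number of fields in the longest row

# Pad each row with NA up to n fields, then bind the rows into a matrix
padded <- lapply(fields, function(x) c(x, rep(NA_character_, n - length(x))))
df <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
dim(df)
# [1] 9 7
```

Unlike the read.table() approach, every column here stays character, so numeric columns such as the IDs would still need converting (for example with as.integer()).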

answered Oct 12 '22 01:10

Roland
