Ignore trailing delimiters in readr::read_csv


Tags: r, csv, tidyverse, readr

When I read a CSV file containing a trailing delimiter using readr::read_csv(), I get a message that a new name was created for the last column. Here are the contents of a short example file to show what I mean:

A,B,C,
2,1,1,
14,22,5,
9,-4,8,
17,9,-3,

Note the trailing comma at the end of each row. Now if I load this data with

readr::read_csv("A,B,C,\n2,1,1,\n14,22,5,\n9,-4,8,\n17,9,-3,")

I get the following message:

New names:
• `` -> `...4`

The resulting tibble has an extra fourth column named ...4 consisting of NA values in each row:

# A tibble: 4 × 4
      A     B     C ...4 
  <dbl> <dbl> <dbl> <lgl>
1     2     1     1 NA   
2    14    22     5 NA   
3     9    -4     8 NA   
4    17     9    -3 NA

Even if I explicitly load only the first three columns with

library(readr)

read_csv(
    "A,B,C,\n2,1,1,\n14,22,5,\n9,-4,8,\n17,9,-3,",
    col_types = cols_only(
        A = col_integer(),
        B = col_integer(),
        C = col_integer()
    )
)

I still get this message.

Is this the expected behavior or is there some way to tell readr::read_csv() that it is supposed to ignore all columns except the ones I specify? Or is there another way to tidy up this (apparently malformed) CSV so that trailing delimiters are deleted/ignored?
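A minimal sketch of the "tidy up the CSV first" route, using only base R. It assumes every line ends with at most one trailing delimiter and that no quoted field spans several lines; `strip_trailing_delim()` is a hypothetical helper, not part of readr.

```r
# Hypothetical helper (not part of readr): drop one trailing delimiter per line.
# Assumes no quoted field contains an embedded line break.
strip_trailing_delim <- function(lines, delim = ",") {
  has_trailing <- endsWith(lines, delim)
  lines[has_trailing] <- substr(
    lines[has_trailing], 1, nchar(lines[has_trailing]) - nchar(delim)
  )
  lines
}

# For a real file this would be readLines("file.csv")
raw_text <- "A,B,C,\n2,1,1,\n14,22,5,\n9,-4,8,\n17,9,-3,"
lines <- strsplit(raw_text, "\n", fixed = TRUE)[[1]]

# Parse the cleaned lines; only columns A, B and C remain
df <- read.csv(text = paste(strip_trailing_delim(lines), collapse = "\n"))
print(df)
```

The same cleaned string could be passed to `readr::read_csv()` instead of `read.csv()` if a tibble is preferred.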


asked Dec 22 '16 09:12

cbrnr

2 Answers

I don't think you can. From what I can see in the documentation, cols_only() is for R objects that you have already loaded in.

However, the fread() function from the data.table package lets you select specific columns by name as the file is read in:

DT <- fread("filename.csv", select = c("colA","colB"))
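Applied to the example from the question, a sketch might look like this. It assumes data.table is installed and uses `fread()`'s `text` argument to read the literal string instead of a file; the empty trailing column is simply not among the selected names.

```r
library(data.table)

csv_text <- "A,B,C,\n2,1,1,\n14,22,5,\n9,-4,8,\n17,9,-3,"

# select= restricts the result to the listed columns, so the unnamed
# fourth column produced by the trailing commas is not kept
DT <- fread(text = csv_text, select = c("A", "B", "C"))
print(DT)
```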

answered Oct 12 '22 11:10

Oliver Frost

Here's another example, along with the warning it produces:

> read_csv("1,2,3\n4,5,6", col_names = c("x", "y"))
Warning: 2 parsing failures.
# A tibble: 2 x 5
    row   col  expected    actual         file
  <int> <chr>     <chr>     <chr>        <chr>
1     1  <NA> 2 columns 3 columns literal data
2     2  <NA> 2 columns 3 columns literal data

# A tibble: 2 x 2
      x     y
  <int> <int>
1     1     2
2     4     5

Here is the fix/hack. See also the Stack Overflow question "Suppress reader parse problems in r".

> suppressWarnings(read_csv("1,2,3\n4,5,6", col_names = c("x", "y")))
# A tibble: 2 x 2
      x     y
  <int> <int>
1     1     2
2     4     5
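The `New names` notice from the question is signalled as a message rather than a warning, so `suppressWarnings()` on its own does not hide it. A sketch, assuming a recent readr 2.x (where literal data can be wrapped in `I()`), that wraps the question's own `cols_only()` call in `suppressMessages()` instead:

```r
library(readr)

csv_text <- "A,B,C,\n2,1,1,\n14,22,5,\n9,-4,8,\n17,9,-3,"

# cols_only() keeps A, B and C; the name-repair notice for the empty
# fourth header is a message, so suppressMessages() silences it
df <- suppressMessages(
  read_csv(
    I(csv_text),
    col_types = cols_only(
      A = col_integer(),
      B = col_integer(),
      C = col_integer()
    )
  )
)
print(df)
```

Note that this hides every message emitted during the read, not only the name-repair notice.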

answered Oct 12 '22 11:10

AG1
