I am trying to read a CSV file with R. I can read the file, but levels are printed when I access a variable. What are these levels, and how can I remove them? The file can be downloaded here.
> data=read.csv("Documents/bet/I1.csv",sep=",")
> data$HomeTeam
[1] Sampdoria Verona Cagliari Inter Lazio Livorno Napoli Parma
[9] Torino Fiorentina Chievo Juventus Atalanta Bologna Catania Genoa
[17] Milan Roma Sassuolo Udinese Inter Napoli Torino Fiorentina
[25] Lazio Livorno Sampdoria Udinese Verona Parma Cagliari Chievo
[33] Genoa Atalanta Bologna Catania Juventus Milan Roma Sassuolo
[41] Udinese Bologna Chievo Lazio Livorno Napoli Parma Sampdoria
[49] Torino Inter Genoa Milan Atalanta Cagliari Catania Roma
[57] Sassuolo Torino Verona Fiorentina Bologna Catania Napoli Parma
[65] Sampdoria Udinese Juventus Lazio Chievo Inter Roma Cagliari
[73] Milan Atalanta Fiorentina Genoa Livorno Sassuolo Verona Torino
[81] Inter Sampdoria Bologna Catania Chievo Juventus Lazio Napoli
[89] Parma Udinese Atalanta Cagliari Fiorentina Genoa Juventus Livorno
[97] Milan Sassuolo Verona Roma Milan Napoli Parma Lazio
[105] Livorno Sampdoria Torino Udinese Verona Bologna Catania Inter
[113] Atalanta Cagliari Chievo Genoa Parma Roma Fiorentina Juventus
[121] Milan Napoli Verona Bologna Livorno Sampdoria Sassuolo Torino
[129] Udinese Roma
20 Levels: Atalanta Bologna Cagliari Catania Chievo Fiorentina Genoa Inter Juventus ... Verona
When you use read.csv to read a file (see ?read.csv), the argument stringsAsFactors is set to TRUE by default in versions of R before 4.0.0. You just need to set it to FALSE to avoid this result. This should work:
data = read.csv("Documents/bet/I1.csv", sep=",", stringsAsFactors=FALSE)
With stringsAsFactors=TRUE, columns (variables) in the file that contain strings are converted to factors. A factor is a categorical variable that can take only one of a fixed, finite set of values. Those possible values are the levels. The R Intro manual covers factors in more detail.
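To make the idea of levels concrete, here is a minimal sketch using a short, made-up vector of team names (the vector `teams` is an assumption for illustration, not the contents of I1.csv). It shows what a factor stores and how to get back to plain strings:

```r
# Hypothetical stand-in for a few values of the HomeTeam column
teams <- c("Sampdoria", "Verona", "Inter", "Verona")

# This is the kind of object read.csv creates when stringsAsFactors = TRUE
f <- factor(teams)

levels(f)      # the distinct categories
nlevels(f)     # how many distinct categories there are
as.integer(f)  # the integer codes R stores internally, one per element

# Convert back to a plain character vector, which has no levels
s <- as.character(f)
is.factor(s)   # FALSE
```

Printing `f` shows the values followed by a "Levels:" line, which is the same thing you see at the bottom of your output.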
In addition, since you are using read.csv, adding sep="," is redundant. It doesn't harm anything, but the comma is taken as the separator by default.
The presence of levels for your variable HomeTeam indicates that it is a factor (with 20 levels). You can pass the stringsAsFactors=FALSE argument to the read.csv function to have the column read as plain character strings instead.
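As a rough illustration of the difference, the sketch below reads a small inline CSV both ways. The `csv_text` string and its column values are made up for the example (they are not the real I1.csv), and the `text` argument is used only so the snippet runs without a file on disk:

```r
# Hypothetical inline CSV standing in for I1.csv
csv_text <- "HomeTeam,AwayTeam\nSampdoria,Juventus\nVerona,Milan\nInter,Genoa"

# Same data read with and without conversion to factors
with_factors    <- read.csv(text = csv_text, stringsAsFactors = TRUE)
without_factors <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(with_factors$HomeTeam)     # "factor"
class(without_factors$HomeTeam)  # "character"

# If the data is already loaded as a factor, convert the column in place
with_factors$HomeTeam <- as.character(with_factors$HomeTeam)
class(with_factors$HomeTeam)     # "character"
```

The in-place conversion at the end is an alternative if re-reading the file is inconvenient.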