I am new to R programming and I want to read a text file in R.
One of the columns, let's say column 7, is numeric, and each number represents an ID. I want R to read the numbers as if they were strings and count the number of times each ID appears in the file (so that I can later assign each ID its frequency for later use). I have tried
mydata <- read.table("filename.txt")
ID=mydata[7]
freq=table(ID)
This works but it takes the IDs as numbers. Now I have tried
freq=table(as.character(ID))
But then it takes the whole ID column as a single string, and from
summary(freq)
I get
Number of cases in table: 1
Number of factors: 1
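The single-string result above can be reproduced in a small sketch. It assumes the cause is that single-bracket indexing (mydata[7]) returns a one-column data frame rather than a vector; the toy data frame df and its column name V7 are made up for illustration.

```r
# Toy stand-in for the file's seventh column (hypothetical values).
df <- data.frame(V7 = c(101, 102, 101, 103))

one_col_df <- df[1]     # single brackets keep a data frame (a list of columns)
id_vector  <- df[[1]]   # double brackets extract the column as a plain vector

# as.character() on a data frame deparses each whole column into one string,
# so table() would see a single value.
length(as.character(one_col_df))   # 1

# On the vector, every ID becomes its own string and is counted separately.
freq <- table(as.character(id_vector))
freq[["101"]]                      # 2
```

Under that assumption, replacing mydata[7] with mydata[[7]] or mydata[, 7] in the original attempt should give one count per ID.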
In Python, a related task (pulling the digits out of a string) can be handled with the split function to convert the string to a list, a list comprehension to iterate through the list, and the isdigit function to pick out the digits.
A string consists of one or more characters, which can include letters, numbers, and other types of characters. You can think of a string as plain text. A string represents alphanumeric data.
As the name suggests, a numeric string is a string of numbers, though it is not limited to the digits 0-9. A numeric string consists of an optional sign, any number of digits, an optional decimal part, and an optional exponential part. Thus "+0123.45e6" is a valid numeric string.
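As a rough illustration in R of the numeric-string definition above, and of why reading IDs as numbers can lose information, consider the following sketch. The ID value "00401" is only an example.

```r
# A numeric string with sign, leading zero, decimal part and exponent
# converts cleanly to a number.
x <- as.numeric("+0123.45e6")
isTRUE(all.equal(x, 123450000))   # TRUE

# Converting an ID-like string to a number drops its leading zeros ...
id_as_number <- as.numeric("00401")
id_as_number                      # 401

# ... while keeping it as a character string preserves all five characters.
nchar("00401")                    # 5
```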
When you read the data from the text file into your data frame, you can specify the type of each column using the colClasses argument. Here is an example with a file I have on my computer:
> head(read.csv("R/Data/ZipcodeCount.csv"))
    X zipcode stateabb countyno  countyname
1   1     401       NY      119 WESTCHESTER
2 391     501       NY      103     SUFFOLK
3 392     544       NY      103     SUFFOLK
4 393     601       PR        1    ADJUNTAS
5 630     602       PR        3      AGUADA
6 957     603       PR        5   AGUADILLA
> head(read.csv("R/Data/ZipcodeCount.csv",colClasses=c(rep("factor",5))))
    X zipcode stateabb countyno  countyname
1   1   00401       NY      119 WESTCHESTER
2 391   00501       NY      103     SUFFOLK
3 392   00544       NY      103     SUFFOLK
4 393   00601       PR      001    ADJUNTAS
5 630   00602       PR      003      AGUADA
6 957   00603       PR      005   AGUADILLA
> zip<-read.csv("R/Data/ZipcodeCount.csv",colClasses=c(rep("factor",5)))
> str(zip)
'data.frame': 53424 obs. of 5 variables:
 $ X         : Factor w/ 53424 levels "1","10000081",..: 1 36316 36333 36346 43638 52311 19581 23775 26481 26858 ...
 $ zipcode   : Factor w/ 41174 levels "00401","00501",..: 1 2 3 4 5 6 6 7 8 9 ...
 $ stateabb  : Factor w/ 60 levels ""," ","AK","AL",..: 41 41 41 46 46 46 46 46 46 46 ...
 $ countyno  : Factor w/ 380 levels "","000","001",..: 106 95 95 3 5 7 5 7 7 9 ...
 $ countyname: Factor w/ 1925 levels "","ABBEVILLE",..: 1844 1662 1662 9 10 11 10 11 11 12 ...
> head(table(zip[,"zipcode"]))
00401 00501 00544 00601 00602 00603
    1     1     1     1     1     2
As you can see, R is no longer treating the zip codes as numbers but as factors. In your case you need to specify the class of the first 6 columns and then choose factor as your seventh. So if the first 6 columns are numeric, it should be something like this: colClasses = c(rep("numeric",6),"factor").
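Applied to the question's setup, the idea might look like the sketch below. It assumes a whitespace-separated file with exactly 7 columns and no header; the sample rows are invented, and "character" is swapped in for "factor" because the question asks for strings (either class keeps the IDs from being read as numbers).

```r
# Write a small 7-column sample file (hypothetical data) so the sketch runs.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 2 3 4 5 6 00401",
             "1 2 3 4 5 6 00501",
             "1 2 3 4 5 6 00401"), tmp)

# Read columns 1-6 as numeric and column 7 as character (IDs kept as strings).
mydata <- read.table(tmp, colClasses = c(rep("numeric", 6), "character"))

# Count how often each ID occurs; the table names are the ID strings.
freq <- table(mydata[[7]])

# Attach each row's ID frequency as a new column for later use.
mydata$freq <- as.vector(freq[mydata[[7]]])
mydata$freq   # 2 1 2
```

Looking the counts up by the ID strings (rather than by position) is what ties each frequency back to its ID.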