
Reading numbers as strings

I am new to R programming, and I want to read a text file into R.

One of the columns, let's say column 7, is numeric, and each number represents an ID. I want R to read these numbers as if they were strings, and to count the number of times each ID appears in the file (so that later I can attach the frequency of each ID to that ID for further use). I have tried

mydata <- read.table("filename.txt")
ID <- mydata[7]
freq <- table(ID)

This works, but it treats the IDs as numbers. I then tried

freq=table(as.character(ID))

But then it treats the whole ID column as a single string, and from

summary(freq)

I get

Number of cases in table: 1 
Number of factors: 1 
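A likely cause of the single-string result: `mydata[7]` uses single brackets, so it returns a one-column data frame rather than a vector, and `as.character()` applied to a data frame (a list) deparses each whole column into one string. Double brackets, `mydata[[7]]`, extract the column itself. A minimal sketch, assuming a small made-up data frame whose column names and values are hypothetical stand-ins for the file:

```r
# Hypothetical stand-in for the question's file: 7 columns, IDs in column 7
mydata <- data.frame(a = 1:4, b = 1:4, c = 1:4, d = 1:4,
                     e = 1:4, f = 1:4, id = c(101, 205, 101, 101))

ID <- mydata[7]            # single brackets: a one-column data frame
class(ID)                  # "data.frame"
length(as.character(ID))   # 1: the whole column deparsed into one string

ids <- mydata[[7]]         # double brackets: the column itself, a numeric vector
freq <- table(as.character(ids))  # one count per distinct ID, names are strings
freq
```

Here `freq` should list "101" three times and "205" once, which is the per-ID count the question is after.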
asked Feb 27 '13 at 12:02 by user2115322




1 Answer

When reading the data from the text file into your data frame, you can specify the type of each column with the colClasses argument. Below is an example using a file I have on my computer:

> head(read.csv("R/Data/ZipcodeCount.csv"))
    X zipcode stateabb countyno  countyname
1   1     401       NY      119 WESTCHESTER
2 391     501       NY      103     SUFFOLK
3 392     544       NY      103     SUFFOLK
4 393     601       PR        1    ADJUNTAS
5 630     602       PR        3      AGUADA
6 957     603       PR        5   AGUADILLA
> head(read.csv("R/Data/ZipcodeCount.csv",colClasses=c(rep("factor",5))))
    X zipcode stateabb countyno  countyname
1   1   00401       NY      119 WESTCHESTER
2 391   00501       NY      103     SUFFOLK
3 392   00544       NY      103     SUFFOLK
4 393   00601       PR      001    ADJUNTAS
5 630   00602       PR      003      AGUADA
6 957   00603       PR      005   AGUADILLA

> zip<-read.csv("R/Data/ZipcodeCount.csv",colClasses=c(rep("factor",5)))
> str(zip)
'data.frame':   53424 obs. of  5 variables:
 $ X         : Factor w/ 53424 levels "1","10000081",..: 1 36316 36333 36346 43638 52311 19581 23775 26481 26858 ...
 $ zipcode   : Factor w/ 41174 levels "00401","00501",..: 1 2 3 4 5 6 6 7 8 9 ...
 $ stateabb  : Factor w/ 60 levels "","  ","AK","AL",..: 41 41 41 46 46 46 46 46 46 46 ...
 $ countyno  : Factor w/ 380 levels "","000","001",..: 106 95 95 3 5 7 5 7 7 9 ...
 $ countyname: Factor w/ 1925 levels "","ABBEVILLE",..: 1844 1662 1662 9 10 11 10 11 11 12 ...
> head(table(zip[,"zipcode"]))

00401 00501 00544 00601 00602 00603 
    1     1     1     1     1     2 

As you can see, R no longer treats the zip codes as numbers but as factors, so the leading zeros are kept. In your case you need to specify the classes of the first 6 columns and then choose "factor" for the seventh. So if the first 6 columns are numeric, it should be something like colClasses = c(rep("numeric", 6), "factor"). ("character" also works if you want plain strings instead of a factor.)
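A minimal, self-contained sketch of the same idea applied to the question's layout, assuming a whitespace-separated file with no header whose seventh column holds the IDs. The data are made up and passed inline through read.table's `text =` argument; the IDs are read as "character", counted with table(), and each row's count is then attached as a new column (the `freq` column name is hypothetical):

```r
# Hypothetical file contents, inlined with `text =` so the example is self-contained.
# Column 7 holds IDs; the leading zeros in "00401" would be lost if read as numbers.
txt <- "1 2 3 4 5 6 00401
1 2 3 4 5 6 00501
1 2 3 4 5 6 00401
1 2 3 4 5 6 00603"

# First 6 columns numeric, column 7 as character (use "factor" for a factor)
mydata <- read.table(text = txt, colClasses = c(rep("numeric", 6), "character"))

ids <- mydata[[7]]          # the ID column (V7) as a character vector
freq <- table(ids)          # how many times each ID appears
freq

# Look up each row's ID in the table by name and store its count
mydata$freq <- as.vector(freq[ids])
mydata
```

Indexing the table by the character IDs looks up counts by name, so the new column should read 2, 1, 2, 1 for these four rows.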

answered Sep 16 '22 at 21:09 by tepedizzle
