Reading numbers as strings

Tags: file, r, file-read, formal-languages

I am new to R programming and I want to read a text file into R.

One of the columns, let's say column 7, is numeric, and each number represents an ID. I want R to read the numbers as if they were strings, and then count the number of times each ID appears in the file (so that I can later assign each ID its frequency). I have tried

mydata<-(read.table("filename.txt"))
ID=mydata[7]
freq=table(ID)

This works, but it takes the IDs as numbers. Next I tried

freq=table(as.character(ID))

But then it treats the whole ID column as a single string, and from

summary(freq)

I get

Number of cases in table: 1 
Number of factors: 1

asked Feb 27 '13 12:02

user2115322

1 Answer

When you read the data from the text file into your data frame, you can specify the type of each column with the colClasses argument. Below is an example using a file I have on my computer:

> head(read.csv("R/Data/ZipcodeCount.csv"))
    X zipcode stateabb countyno  countyname
1   1     401       NY      119 WESTCHESTER
2 391     501       NY      103     SUFFOLK
3 392     544       NY      103     SUFFOLK
4 393     601       PR        1    ADJUNTAS
5 630     602       PR        3      AGUADA
6 957     603       PR        5   AGUADILLA
> head(read.csv("R/Data/ZipcodeCount.csv",colClasses=c(rep("factor",5))))
    X zipcode stateabb countyno  countyname
1   1   00401       NY      119 WESTCHESTER
2 391   00501       NY      103     SUFFOLK
3 392   00544       NY      103     SUFFOLK
4 393   00601       PR      001    ADJUNTAS
5 630   00602       PR      003      AGUADA
6 957   00603       PR      005   AGUADILLA

> zip<-read.csv("R/Data/ZipcodeCount.csv",colClasses=c(rep("factor",5)))
> str(zip)
'data.frame':   53424 obs. of  5 variables:
 $ X         : Factor w/ 53424 levels "1","10000081",..: 1 36316 36333 36346 43638 52311 19581 23775 26481 26858 ...
 $ zipcode   : Factor w/ 41174 levels "00401","00501",..: 1 2 3 4 5 6 6 7 8 9 ...
 $ stateabb  : Factor w/ 60 levels "","  ","AK","AL",..: 41 41 41 46 46 46 46 46 46 46 ...
 $ countyno  : Factor w/ 380 levels "","000","001",..: 106 95 95 3 5 7 5 7 7 9 ...
 $ countyname: Factor w/ 1925 levels "","ABBEVILLE",..: 1844 1662 1662 9 10 11 10 11 11 12 ...
> head(table(zip[,"zipcode"]))

00401 00501 00544 00601 00602 00603 
    1     1     1     1     1     2

As you can see, R is no longer treating the zip codes as numbers but as factors, so the leading zeros are preserved. In your case you need to specify the class of the first 6 columns and then choose factor for the seventh. So if the first 6 columns are numeric, it should be something like colClasses = c(rep("numeric",6),"factor"). If you want plain strings rather than factors, use "character" in place of "factor".
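Applied to the 7-column case in the question, the idea might look like the minimal sketch below. The file contents are hypothetical and written to a temporary path only so the example is self-contained. It uses "character" instead of "factor" because the question asks for strings. It also extracts the column with [[7]] rather than [7], since [7] returns a one-column data frame, which is likely why as.character(ID) collapsed the column into a single string.

```r
# Hypothetical 7-column, whitespace-separated file (made up for illustration).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "1 2 3 4 5 6 00401",
  "1 2 3 4 5 6 00501",
  "1 2 3 4 5 6 00401"
), path)

# Read columns 1-6 as numeric and column 7 as character,
# so the IDs stay strings and keep their leading zeros.
mydata <- read.table(path, colClasses = c(rep("numeric", 6), "character"))

# [[7]] extracts the column as a plain vector;
# [7] would give a one-column data frame instead.
ID <- mydata[[7]]

# Count how many times each ID appears.
freq <- table(ID)

# Look up each row's ID in the table by name to attach its frequency.
mydata$freq <- as.integer(freq[ID])
```

Indexing the table by the ID strings (freq[ID]) is one simple way to assign every row the count of its own ID.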

answered Sep 16 '22 21:09

tepedizzle
