I know this issue has been raised in several places, and I have spent hours looking for a good solution without success, which is why I'm asking here.
I have a huge data file (~5GB) and I used fread() to read it:
library(data.table)
df<- fread('output.txt', sep = "|", stringsAsFactors = TRUE)
head(df, 5)
       age            income homeowner_status_desc marital_status_cd gender
1:         $35,000 - $49,999
2: 35 - 44 $35,000 - $49,999                  Rent            Single      F
3:         $35,000 - $49,999
4:
5:         $50,000 - $74,999
str(df)
Classes ‘data.table’ and 'data.frame': 999 obs. of 5 variables:
$ age : chr "" "35 - 44" "" "" ...
$ income : chr "$35,000 - $49,999" "$35,000 - $49,999" "$35,000 - $49,999" "" ...
$ homeowner_status_desc: chr "" "Rent" "" "" ...
$ marital_status_cd : chr "" "Single" "" "" ...
$ gender : chr "" "F" "" "" ...
- attr(*, ".internal.selfref")=<externalptr>
There are missing data (where it's blank). The original data has many columns, so I need a way to convert every column that contains strings to a factor. Could anyone suggest the best practice for this? I was considering converting it to a data frame first, but is it possible to do this while it's still a data.table?
By using the apply() and sapply() functions, we were able to convert only the character columns to factor columns and leave all other columns unchanged.
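The same idea can be sketched directly on a data.table, without converting to a data frame first. This is only a minimal sketch: the toy table `dt` and its values are made up for illustration, and `chr_cols` is a hypothetical helper name.

```r
library(data.table)

# Hypothetical toy table standing in for the real file
dt <- data.table(
  age    = c("", "35 - 44", ""),
  income = c("$35,000 - $49,999", "$35,000 - $49,999", ""),
  n      = 1:3
)

# Names of the columns that are currently character
chr_cols <- names(dt)[sapply(dt, is.character)]

# Convert those columns to factor by reference; other columns are untouched
if (length(chr_cols) > 0) {
  dt[, (chr_cols) := lapply(.SD, as.factor), .SDcols = chr_cols]
}

sapply(dt, class)
```

Because `:=` updates by reference, the table is not copied, which matters for a file of several gigabytes. Note that blank strings are kept as their own factor level `""` rather than becoming NA.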
The argument 'stringsAsFactors' is an argument to the 'data.frame()' function in R. It is a logical that indicates whether strings in a data frame should be treated as factor variables or as just plain strings. The argument also appears in 'read.
Converting DataFrame Column To Factor Column
Similarly, a data frame column can be converted to factor type by referring to that particular column with the df$col-name syntax in R.
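A minimal sketch of that single-column conversion, assuming a made-up data frame `df` with a `gender` column:

```r
# Hypothetical data frame; strings are kept as character on creation
df <- data.frame(gender = c("F", "M", "F"), stringsAsFactors = FALSE)

# Convert one column to factor by referring to it with df$col-name
df$gender <- as.factor(df$gender)

levels(df$gender)
```

This works column by column, so for a table with many string columns a loop or lapply() over the character columns is usually more practical.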
Sometimes a string is just a string. It is often claimed Sigmund Freud said “Sometimes a cigar is just a cigar.” To avoid problems, delay re-encoding of strings by using stringsAsFactors = FALSE when creating data.
Just implemented the stringsAsFactors argument for fread in v1.9.6+.
From NEWS:
- Implemented stringsAsFactors argument for fread(). When TRUE, character columns are converted to factors. Default is FALSE. Thanks to Artem Klevtsov for filing #501, and to @hmi2015 for this SO post.
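As a rough illustration of the argument in use, assuming data.table 1.9.6 or later is installed; the temporary file and its contents are made up here to stand in for 'output.txt':

```r
library(data.table)

# Hypothetical pipe-delimited file written to a temporary path for illustration
path <- tempfile(fileext = ".txt")
writeLines(c(
  "age|income|marital_status_cd",
  "35 - 44|$35,000 - $49,999|Single",
  "|$50,000 - $74,999|Married"
), path)

# With stringsAsFactors = TRUE, character columns come back as factors
dt <- fread(path, sep = "|", header = TRUE, stringsAsFactors = TRUE)

sapply(dt, class)
```

On versions before 1.9.6 the argument is not implemented, which would explain the chr columns in the str() output in the question.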