data.table::fread's stringsAsFactors=TRUE argument doesn't convert character columns to factor type - what's the workaround?

I know this issue has been raised in several places, and I have spent hours trying to find a good solution without success. That's why I'm asking here.

I have a huge data file (~5GB), which I read with fread():

df <- fread('output.txt', sep = "|", stringsAsFactors = TRUE)
head(df, 5)
       age            income homeowner_status_desc marital_status_cd gender
1:         $35,000 - $49,999                                               
2: 35 - 44 $35,000 - $49,999                  Rent            Single      F
3:         $35,000 - $49,999                                               
5:         $50,000 - $74,999 
str(df)
Classes ‘data.table’ and 'data.frame':  999 obs. of  5 variables:
 $ age                  : chr  "" "35 - 44" "" "" ...
 $ income               : chr  "$35,000 - $49,999" "$35,000 - $49,999" "$35,000 - $49,999" "" ...
 $ homeowner_status_desc: chr  "" "Rent" "" "" ...
 $ marital_status_cd    : chr  "" "Single" "" "" ...
 $ gender               : chr  "" "F" "" "" ...
 - attr(*, ".internal.selfref")=<externalptr> 

Some values are missing (the blanks above). The original data has many columns, so I need a way to convert every column that contains strings to a factor. Could anyone suggest the best practice for this? I was considering converting it to a data frame and doing the conversion there, but is it possible to do this while it is still a data.table?

1 Answer

The stringsAsFactors argument has just been implemented for fread() in v1.9.6+.

From NEWS:

  1. Implemented stringsAsFactors argument for fread(). When TRUE, character columns are converted to factors. Default is FALSE. Thanks to Artem Klevtsov for filing #501, and to @hmi2015 for this SO post.
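
For versions where the argument has no effect, one possible workaround is to convert the character columns by reference after reading, without leaving data.table. The sketch below is only an illustration: the small table, the numeric column n, and the temporary file are made-up stand-ins for the real 5GB file, and the last part assumes data.table 1.9.6 or later is installed.

```r
library(data.table)

# Toy stand-in for the real data; 'n' is a hypothetical numeric column
# added only to show that non-character columns are left alone.
dt <- data.table(
  age    = c("", "35 - 44", ""),
  income = c("$35,000 - $49,999", "$35,000 - $49,999", "$50,000 - $74,999"),
  n      = c(1L, 2L, 3L)
)

# Workaround: pick out the character columns, then convert them to
# factors in place with := so the table is not copied.
chr_cols <- names(dt)[vapply(dt, is.character, logical(1))]
dt[, (chr_cols) := lapply(.SD, as.factor), .SDcols = chr_cols]
str(dt)

# With data.table 1.9.6+, fread() can do the conversion while reading.
# The file is a temporary stand-in written here for illustration.
tmp <- tempfile(fileext = ".txt")
writeLines(c("age|income|gender",
             "35 - 44|$35,000 - $49,999|F",
             "45 - 54|$50,000 - $74,999|M"), tmp)
df <- fread(tmp, sep = "|", header = TRUE, stringsAsFactors = TRUE)
str(df)
```

Note that as.factor() keeps the blank string as a level of its own, so the missing values in the question would show up as a "" level unless they are turned into NA first.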
