Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

F# Type Providers and Data Processing

In a previous question (Working with heterogenous data in a statically typed language), I asked about how F# handles standard tasks in data analysis, such as manipulating an untyped CSV file. Dynamic langauges excel at basic tasks like

data = load('income.csv')
data.log_income = log(income)

In F#, the most elegant approach seems to be the question mark (?) operator. Unfortunately in the process we lose static typing and still need type annotations here and there.

One of the most exciting future feature of F# are Type Providers. With minimal loss of type safety, a CSV type provider could provide types by dynamically examining the file.

But data analysis typically doesn't stop there. We often transform the data and create new datasets, via a pipeline of operations. My question is, can Type Providers help if we mostly manipulate data? For instance:

open CSV // Type provider
let data = CSV(file='income.csv') // Type provider magic (syntax?)
let log_income = log(data.income) // works!

This works but pollutes the global namespace. It is often more natural to think about adding a column, rather than creating a new variable. Is there some way to do?

let data.logIncome = log(data.income) // won't work, sadly.

Do type providers provide an escape from using the (?) operator when the goal is creating new derivative or cleaned-up datasets?

Perhaps something like:

let newdata = colBind data {logIncome = log(data.income)}  // ugly, does it work?

Other ideas?

like image 242
Tristan Avatar asked Dec 21 '10 21:12

Tristan


1 Answers

The short answer is no, the long answer is yes (but you wouldn't like the result). The key thing to remember is that F# is a statically typed language, full stop.

For the code that you provided, what type does newData have? If it cannot be pinned down at compile-time, then you need to resort to casting to/from Obj.

// newdata MUST have a static type, even if obj
let newdata = colBind data {logIncome = log(data.income)}  

Imagine colBind has the following sinature:

val colBind: Thingey<'a> -> 'b -> Thingey2<'a, 'b>

That would actually work for a ways, but it wouldn't work universally. Because eventually you would need a type that would not exist at compile time.

F# type providers allow you to statically type data originating from outside of the standard compile-time environment. However, the types are still static. There is no way to alter those types dynamically at runtime*.

*You can modify the object at runtime using shenanigans such as DynamicObject. However, once you start going down that path you lose all the benefits of a statically typed language such as Intellisense. (Which is a major reason for using F# in the first place.)

Conceptually, what you want to do is straight forward. The System.Data.DataTable type already has the notion of storing tabular data with the ability to dynamically add columns. But since the type information for added columns is not known at compile time it follows that things stored in those columns must be treated as Obj and casted at runtime.

like image 183
Chris Smith Avatar answered Sep 19 '22 15:09

Chris Smith