 

F# and "enterprise-level" reporting [closed]

Based on your actual experience, a whitepaper or other respected referenceable study, is F# currently a viable tool for corporate-/enterprise-level reporting?

Attention: Before voting to close this question as "not constructive", please read the bit at the bottom.

Background
I currently work at a large corporation which makes heavy use of many different reporting tools, including (but hardly limited to) SAS, Cognos, SSRS and even a good smattering of COBOL. Each tool has its rightful place and many of them are, in most respects, equivalent in feature set, etc. Most of our tools are able to output to PDF, Excel and databases relatively easily and in those cases work wonderfully.

Unfortunately, my organization, like many, makes use of Excel spreadsheets and, love it or hate it, we spend many hours writing .NET console applications to extract information from and insert information into Excel spreadsheets. (I'm not interested in arguing the merits or detriments of this approach. It is what it is and there's no way I can change it.)

As great as the reporting technologies listed above are, they fall flat when it comes to advanced ETL from or into spreadsheets. They just weren't designed for it and while they are perfectly adept at formatting a report as an Excel spreadsheet, they aren't very good at updating an existing spreadsheet or extracting data in some very specific way (extract only values highlighted in red, for example). So we end up writing a LOT of .NET console applications to do this bit. (Again - not interested in debating the approach. It is what it is. I know - I don't like it either.)

.NET is, in my opinion, a fantastic framework and flexible enough to handle almost any programming task, so we could theoretically handle all of the reporting in .NET. But - trying to handle all of the reporting in .NET takes too long. We have to write all the boilerplate stuff ourselves. I like to leverage the power, simplicity and robustness of the actual reporting tools we already have.

So, we end up writing two applications for a single task - for example, a SAS job to load the data from multiple data sources, do the transformations and store the result in a permanent or temporary location, and a second .NET job to take the results and load them into the spreadsheet. (I know.)

The Point
I've been seeing and hearing a great deal about F# over the past couple of years and have dabbled in it a bit myself. I learned OCaml in college and I love functional programming. When called for, I'd love to do all the programming for a particular report on a single platform (if not in a single language). The question, though, is whether the F# language and the .NET framework are fully ready for enterprise-level reporting - and I'm talking reports that must be run accurately and efficiently. Microsoft is certainly selling it hard, but I want to know if anyone with experience in other reporting technologies has actually tried it in a production environment. Specifically:

  • How does it compare with other reporting technologies and can it be easily integrated into a corporate environment?
  • How did you address security?
  • Done right, what kind of memory-profile does F# require (we're talking millions of records)?
  • Does it process tabular data well?
  • Is it efficient?
  • How easy is it to maintain (especially if the code grows)?
  • What kind of third-party add-ons, plug-ins, etc. are required to get something working (or can it do most everything out of the box)?
  • How much work (programming hours, etc) is required compared to other reporting systems (for similar results)?

If you have no experience with F#, or if you use F# exclusively, then I'm not particularly interested in your opinion - I'd like to hear from those who have actually bridged the gap and can relate, from experience, the opportunities and pitfalls of using F# as a reporting engine for big data (millions of records, output to a variety of formats).

I've seen a few questions that already cover some of this ground:

  • Statistical functionalities of F# (or .NET libraries)
  • Your experiences with Matlab/F#/R for data analysis and modeling algorithms

But they are a few years old. Several versions later, is F# up to the task? Or am I a dog barking up the wrong tree?

EDIT

Just for clarity, I am particularly interested in F#'s new information-rich programming features. Prior to F# 3.0, it was merely an interesting technology, but its recently added database type providers and query expressions make it look like a viable alternative to other report-authoring technologies. Microsoft is certainly suggesting it is.
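To illustrate the kind of query-expression code I mean: the F# 3.0 query syntax works over plain in-memory sequences as well as database type providers, so here is a minimal, self-contained sketch with made-up sales data (with a SqlDataConnection type provider, the same syntax would run against live tables):

```fsharp
// Hypothetical record type standing in for a database row.
type Sale = { Region : string; Amount : decimal }

let sales =
    [ { Region = "East"; Amount = 120m }
      { Region = "West"; Amount = 80m }
      { Region = "East"; Amount = 50m } ]

// Total per region, largest first - a typical reporting aggregation.
let totals =
    query {
        for s in sales do
        groupBy s.Region into g
        let total = g |> Seq.sumBy (fun s -> s.Amount)
        sortByDescending total
        select (g.Key, total)
    }
    |> Seq.toList
// totals = [("East", 170M); ("West", 80M)]
```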

An acceptable answer would contain a first-hand account (or a reference to a documented case study) of implementing an enterprise-level reporting engine built in F# and a comparison to another reporting technology of any performance gains or losses, etc. It doesn't have to be too detailed - just enough to convince an average (competent) manager that F# would be an appropriate/inappropriate technology for bulk/batch data processing. Has it been done? Who did it? What were the results? How complicated was the implementation (relative to similar technologies)? Does it perform well?


Why am I asking a subjective question?
Like most good Stack Overflow members, I frequently vote to close subjective questions. According to the FAQ, subjective questions should be avoided but are not banned entirely. The FAQ links to six guidelines for great subjective questions, which I have tried to follow. Please read those guidelines before voting to close this question.

asked Jan 31 '13 by JDB

2 Answers

How does it compare with other reporting technologies and can it be easily integrated into a corporate environment?

I don't know how F# compares with other reporting technologies but I have deployed it in more than one corporate environment and it is basically the same as C#, i.e. easy and robust.

How did you address security?

Same as C#.

Done right, what kind of memory-profile does F# require (we're talking millions of records)?

I've found one GC bug in .NET in 5 years of use and it was not specific to F#. I've had several problems with large objects (again, not F# specific) but, in general, the GC is robust and efficient and collects aggressively.

I've processed billions of records and found F# to be extremely fast and very reliable. Note that F# is used in Microsoft's Bing AdCenter (for ad placement) and Microsoft's Halo 3, both of which require terabyte datasets to be processed.

Does it process tabular data well?

Yes, and you get easy parallelism (see the Array.Parallel module), but its main strength relative to other tools is in manipulating structured data like trees and graphs.
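As a trivial sketch of the Array.Parallel module (hypothetical data; the library handles partitioning the work across cores):

```fsharp
// A row-wise transformation over a large array.
let rows = [| 1 .. 1000000 |]

// Array.Parallel.map splits the array into chunks and maps them on the
// thread pool; the result is identical to Array.map, only the
// evaluation is parallel.
let transformed = rows |> Array.Parallel.map (fun x -> x * 2)
```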

Is it efficient?

Yes.

Our current client, one of the world's largest insurance companies, saw a 10x performance improvement switching from C++ to F# (as well as a 10x reduction in code size).

A previous client saw a performance improvement moving a compiler from OCaml to F#. This is impressive because OCaml was specifically designed for writing compilers and is extremely fast.

A former client had us rewrite their trading platform and we saw 100x throughput and latency improvements even though we were moving from non-GC C++ to GC'd F#.

How easy is it to maintain (especially if the code grows)?

Easy to maintain. In ML, adding functions is a no-brainer, and the static type system gives you lots of feedback when you extend union types.
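A small sketch of what that feedback looks like in practice (hypothetical report formats, not from any real codebase):

```fsharp
// A discriminated union of output formats.
type ReportFormat =
    | Pdf
    | Excel
    // Adding a new case here, e.g. `| Csv`, makes the compiler warn
    // about every pattern match below that doesn't yet handle it -
    // which is exactly the to-do list for extending the program.

let extension fmt =
    match fmt with
    | Pdf   -> ".pdf"
    | Excel -> ".xlsx"
```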

Our current client put their first F# code live last April and its maintainer had no problems despite not having had any training in F# (or OCaml) at all.

What kind of third-party add-ons, plug-ins, etc. are required to get something working (or can it do most everything out of the box)?

We have never used any (but we sell two!). The only third party things I've considered are WPF controls which are, again, not F# specific.

How much work (programming hours, etc) is required compared to other reporting systems (for similar results)?

No idea, sorry. Looks like we've got some work with Dialogue and HP Extreme coming up so I'll find out soon enough...

How complicated was the implementation (relative to similar technologies)?

F# code is much simpler than older mainstream languages like C++, C# and Java.

I'd like to stress that F# really pays dividends when you use it to attack problems that are too complicated to solve using more traditional tools, rather than just rewriting old code in F#.

For example, our current client has been using a business rules engine that cost them around £1,000,000 to buy, but it doesn't solve their business problem (it struggles with big tables and with mathematics), so in one week I wrote them a demo of a bespoke business rules engine in around 1,000 lines of F# code. I could not have done that with any other tool.

answered Oct 02 '22 by J D


To answer your question: you're on the right track. I say this as someone who has built a number of reporting and big data systems. I built one of the Big Data Analytics platforms used at eBay in Scala and R. More recently I built the Hadoop / Hive F# Type Provider for MSRC. I can say that nothing comes close to the F#/.NET stack for this purpose: great performance, easy-to-use native interop, lots of libraries, a REPL, Type Providers, and WPF for charting. Since MSRC I have been building a fully featured F# IDE that can be embedded into Excel, where you can use a Type Provider to interact with the workbook, complete with IntelliSense. Email me if you'd like to see it.

Edit:

Sure; I replaced one customer's Infobright database with F#, using in-memory data and a from-scratch query engine. It reduced query time on tens of GBs of data from 30 minutes to hundreds of milliseconds. The whole thing took me 6 hours to build and was only a few hundred lines of code. The database was the backend to a web-based reporting service, which became immensely more responsive after the upgrade.

While at eBay I used to do my Big Data (bulk/batch) post-processing in R. The basic flat files were tens of GBs, so they were far too big for Excel. R did a huge amount of unnecessary memory allocation during the aggregation passes; 10 GB would become 40 GB and would slow to a crawl once it started hitting the pagefile. Depending on the data, it would take minutes, hours, or never finish. There are paid R libraries that fix this, but they are limiting in other ways. Doing the aggregations in F# brought this down to hundreds of milliseconds in constant space. These aggregations were tens of lines of code, about the same as in R, but much easier to understand and type-checked. Having an R job fail after an hour of processing because of a typo is infuriating.
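The shape of a constant-space aggregation like that can be sketched in a few lines of F# (hypothetical key/value records; seq is lazy, so only one record needs to be resident at a time regardless of input size):

```fsharp
// Single-pass fold: accumulate per-key totals without ever
// materialising the whole dataset in memory.
let sumByKey (records : seq<string * float>) =
    records
    |> Seq.fold (fun (acc : Map<string, float>) (key, value) ->
        let current = defaultArg (acc.TryFind key) 0.0
        acc.Add(key, current + value)) Map.empty

// In real use, `records` would stream from a flat file;
// here a small literal sequence stands in for it.
let demo = sumByKey (seq { yield ("a", 1.0); yield ("a", 2.0); yield ("b", 5.0) })
// demo = map [("a", 3.0); ("b", 5.0)]
```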

I used to use OLAP cubes (e.g. Microsoft Analysis Services), but these systems have been entirely eclipsed by Big Data clusters and Big Memory machines. Now it is easy to build your own Big Memory machine with F# and the new garbage collector in .NET 4.5.

Hope that helps.

answered Oct 02 '22 by moloneymb