My question is regarding BigData in .NET. BigData technologies are used to store and query huge amounts of data (Facebook, Google, Twitter, ...). Examples of such technologies are MapReduce, Hadoop, Dryad, etc.
Microsoft dropped their Dryad (DryadLinq) alternative in favor of Hadoop (Dryad and the article), so I'd like to prepare myself for it and everything that has to do with it.
What is available now?
Hadoop Connector
SQL Server 2012 RC (don't use in production :))
Microsoft Information on Big Data
What should I know more about releases and development?
Register on the TechPreview
Question 1: What should I know about Hadoop that isn't unique to the .NET platform (how to query, specific patterns, architecture, ...) and that will be useful in a .NET environment?
Question 2: Is there more information on Hadoop for the .NET platform than what I have already listed?
It's a vague question, so here's a vague answer :)
Hadoop on its own is a tool to run map-reduce jobs in a cluster. It's highly optimized for performance, and a good deal of this optimization comes from distributing the data in a way that makes it easy to consume without incurring I/O penalties.
For this you should read about HDFS and the internals that explain how this is done. In a nutshell, the input data is grouped into blocks stored on the nodes, so that processes run locally on the node that holds the data and read it sequentially (this is a property/limitation of HDFS).
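To make the "clumped together in nodes" idea a bit more concrete, here is a minimal Python sketch of cutting a file into fixed-size blocks and spreading them over nodes. It is not how HDFS is implemented: the 64 MiB block size is an assumed value (it is configurable in a real cluster), the node names are made up, and the round-robin placement is a hypothetical policy used only for illustration.

```python
from itertools import cycle

# Assumed block size for illustration; real clusters configure this.
BLOCK_SIZE = 64 * 1024 * 1024


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs that cover a file of file_size bytes."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def assign_blocks(blocks, nodes):
    """Place blocks on nodes round-robin (hypothetical policy, not HDFS's real one)."""
    placement = {node: [] for node in nodes}
    for block, node in zip(blocks, cycle(nodes)):
        placement[node].append(block)
    return placement


if __name__ == "__main__":
    # A 200 MiB file and three made-up node names.
    blocks = split_into_blocks(200 * 1024 * 1024)
    for node, held in assign_blocks(blocks, ["node-a", "node-b", "node-c"]).items():
        print(node, held)
```

Each node can then process the blocks it holds with a sequential read, instead of pulling the data over the network.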
This way you input your "BigData" and it gets split and processed in the most efficient way inside the cluster.
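As a rough illustration of what "split and processed" means for a map-reduce job, here is a small single-process word count sketch in Python. It does not use Hadoop at all: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are made up for this example, and the splits are plain lists of lines standing in for the pieces of input that a cluster would hand to different nodes.

```python
from collections import defaultdict


def map_phase(split):
    """Mapper: emit a (word, 1) pair for every word in one input split."""
    for line in split:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(splits):
    """Run map over every split, shuffle, then reduce each key."""
    pairs = [pair for split in splits for pair in map_phase(split)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Two splits, as if they lived on two different nodes.
    splits = [["big data big"], ["data in hadoop"]]
    print(run_job(splits))
```

In a real cluster the mappers would run in parallel on the nodes that hold each split, and only the grouped intermediate pairs would travel over the network to the reducers.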
Now, that's all there is to Hadoop itself. There are tools that work on top of it and give you higher-level abstractions over the data (map-reduce is among the simplest procedures).
Those include:
Specifics for .Net
For Hadoop on Azure (.Net), there's an introduction on msdn here with more info here, related to building Hadoop applications through their platform. It's only a CTP (Community Technology Preview) for now, but of course this will change.
Here's another good blog post about Hadoop and MapReduce with code.
Additionally, there's a company that frequently publishes information about Hadoop: Cloudera. Check the Cloudera page linked above regularly; it covers all the concepts behind Hadoop (it's pretty advanced though).
I'm pretty sure this isn't what you were looking for, but I have no idea what you want, so at least I hope you can check a few new projects that may help.
Also check Storm (https://github.com/nathanmarz/storm): it's not related to Hadoop, but it handles real-time scenarios, which Hadoop is not suited for.