How to implement MapReduce in C# using PLINQ?
Suppose you have 7-8 web services to collect data from, and each time a response arrives (asynchronously) you have to put that data into some tables of a database, in my case SQL Server 2008. For instance, the data you get from each web service is:
<employees>
    <employee>
        <name>Ramiz</name>
    </employee>
    <employee>
        <name>Aamir</name>
    </employee>
    <employee>
        <name>Zubair</name>
    </employee>
</employees>
Each time a response is received, this data goes into a table named Employee:
Employee
===
EmployeeID (PK)
EmployeeName
Once the data is in the table, it has to be returned as JSON to the client, an ASP.NET MVC 3 application that makes this call using client-side JavaScript (AJAX).
Suppose WebServiceEmployee1 has returned its data and the other 6 are still in the queue (still trying to get theirs). Its result set should be written to the table right away, without waiting for the other 6, and the inserted employees should be returned to the client as JSON. The connection should stay open and do the same as each of the other services completes.
Please note that my toolbelt is ASP.NET MVC 3 (Razor), SQL Server 2008 R2, and jQuery.
Thanks.
MapReduce Algorithm
During a MapReduce job, Hadoop sends Map and Reduce tasks to the appropriate servers in the cluster. The framework manages all the details of data passing, such as issuing tasks, verifying task completion, and copying data around the cluster between the nodes.
A map operation applies a function to each value in a sequence. A reduce operation combines the elements of a sequence into one value. You can use the C++ Standard Library std::transform and std::accumulate functions to perform map and reduce operations.
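In C#, the closest counterparts are LINQ's Select (map) and Aggregate (reduce). The following is only a minimal sketch to illustrate the two operations; the class name, method names, and sample values are hypothetical and not taken from any of the sources above.

```csharp
using System;
using System.Linq;

public static class MapReduceBasics
{
    // Map: apply a function to each value in the sequence.
    public static int[] Square(int[] values)
    {
        return values.Select(v => v * v).ToArray();
    }

    // Reduce: combine the elements of the sequence into one value.
    public static int Sum(int[] values)
    {
        return values.Aggregate(0, (total, v) => total + v);
    }

    public static void Main()
    {
        int[] input = { 1, 2, 3, 4 };          // hypothetical sample data
        int[] squares = Square(input);
        Console.WriteLine(string.Join(", ", squares.Select(s => s.ToString())));
        Console.WriteLine(Sum(squares));
    }
}
```

Calling AsParallel() on the input before Select would turn the map step into a PLINQ query; whether that pays off depends on how expensive the per-element work is.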
The MapReduce framework in Hadoop has native support for running Java applications. It also supports running non-Java applications in Ruby, Python, C++ and a few other programming languages, via two frameworks, namely the Streaming framework and the Pipes framework.
MapReduce is a processing technique and a programming model for distributed computing; Hadoop's implementation of it is based on Java. The MapReduce algorithm contains two important tasks, namely Map and Reduce. Map takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs). Reduce then takes the output of Map and combines those tuples into a smaller set of tuples.
Without a clear understanding of the processing you want to perform, I suggest reading through the following MSDN documentation: http://bit.ly/Ir7Nvk. It describes map/reduce with PLINQ, with examples like:
    public IDMultisetItemList PotentialFriendsPLinq(SubscriberID id,
                                                    int maxCandidates)
    {
        var candidates =
            subscribers[id].Friends.AsParallel()
            // Map: expand each friend into that friend's own friends.
            .SelectMany(friend => subscribers[friend].Friends)
            // Skip the subscriber and anyone who is already a friend.
            .Where(foaf => foaf != id &&
                   !(subscribers[id].Friends.Contains(foaf)))
            // Reduce: group identical IDs and count the occurrences of each.
            .GroupBy(foaf => foaf)
            .Select(foafGroup => new IDMultisetItem(foafGroup.Key,
                                                    foafGroup.Count()));
        return Multiset.MostNumerous(candidates, maxCandidates);
    }
The "map" phase is Friends.AsParallel, SelectMany, and Where; the "reduce" phase is GroupBy and Select.
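The MSDN example relies on types that are not shown here (SubscriberID, IDMultisetItem, Multiset). To try the same map/reduce shape on its own, here is a rough, self-contained sketch. It assumes, purely for illustration, that subscribers are plain int IDs stored in a Dictionary of friend sets, and it replaces Multiset.MostNumerous with an ordinary sort-and-take; none of these stand-ins come from the original article.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FriendSuggestions
{
    // Hypothetical stand-in for the article's subscriber store:
    // subscriber ID -> IDs of that subscriber's friends.
    // Returns (candidate ID, number of mutual friends) pairs.
    public static List<KeyValuePair<int, int>> PotentialFriends(
        IDictionary<int, HashSet<int>> subscribers, int id, int maxCandidates)
    {
        HashSet<int> myFriends = subscribers[id];

        return myFriends.AsParallel()
            // Map: expand each friend into that friend's own friends.
            .SelectMany(friend => subscribers[friend])
            // Skip the subscriber and anyone who is already a friend.
            .Where(foaf => foaf != id && !myFriends.Contains(foaf))
            // Reduce: group identical IDs and count the occurrences of each.
            .GroupBy(foaf => foaf)
            .Select(g => new KeyValuePair<int, int>(g.Key, g.Count()))
            // Assumed replacement for Multiset.MostNumerous:
            // highest counts first, ties broken by lower ID.
            .OrderByDescending(pair => pair.Value)
            .ThenBy(pair => pair.Key)
            .Take(maxCandidates)
            .ToList();
    }

    public static void Main()
    {
        // Hypothetical sample graph.
        var subscribers = new Dictionary<int, HashSet<int>>
        {
            { 1, new HashSet<int> { 2, 3 } },
            { 2, new HashSet<int> { 1, 4, 5 } },
            { 3, new HashSet<int> { 1, 4 } },
            { 4, new HashSet<int> { 2, 3 } },
            { 5, new HashSet<int> { 2 } }
        };

        foreach (var pair in PotentialFriends(subscribers, 1, 2))
            Console.WriteLine(pair.Key + " shares " + pair.Value + " friend(s)");
    }
}
```

The dictionary and sets are only read while the query runs, which is what makes it safe to share them across the PLINQ worker threads; if other code could modify them at the same time, a concurrent collection or a snapshot would be needed.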