How to mark a TPL dataflow cycle to complete?

Tags:

c#

.net

task-parallel-library

tpl-dataflow

Given the following setup in TPL Dataflow:

var directory = new DirectoryInfo(@"C:\dev\kortforsyningen_dsm\tiles");

var dirBroadcast=new BroadcastBlock<DirectoryInfo>(dir=>dir);

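var dirfinder = new TransformManyBlock<DirectoryInfo, DirectoryInfo>((dir) =>
{
    // Enumerate the subdirectories of the posted directory, not of the root.
    return dir.GetDirectories();
});
var tileFilder = new TransformManyBlock<DirectoryInfo, FileInfo>((dir) =>
{
    // Enumerate the files of the posted directory, not of the root.
    return dir.GetFiles();
});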
dirBroadcast.LinkTo(dirfinder);
dirBroadcast.LinkTo(tileFilder);
dirfinder.LinkTo(dirBroadcast);

var block = new XYZTileCombinerBlock<FileInfo>(3, (file) =>
{
    var coordinate = file.FullName.Split('\\').Reverse().Take(3).Reverse().Select(s => int.Parse(Path.GetFileNameWithoutExtension(s))).ToArray();
    return XYZTileCombinerBlock<CloudBlockBlob>.TileXYToQuadKey(coordinate[0], coordinate[1], coordinate[2]);
},
(quad) =>
    XYZTileCombinerBlock<FileInfo>.QuadKeyToTileXY(quad,
        (z, x, y) => new FileInfo(Path.Combine(directory.FullName,string.Format("{0}/{1}/{2}.png", z, x, y)))),
    () => new TransformBlock<string, string>((s) =>
    {
        Trace.TraceInformation("Combining {0}", s);
        return s;
    }));

tileFilder.LinkTo(block);


using (new TraceTimer("Time"))
{
    dirBroadcast.Post(directory);

    block.LinkTo(new ActionBlock<FileInfo>((s) =>
    {
        Trace.TraceInformation("Done combining : {0}", s.Name);

    }));
    block.Complete();
    block.Completion.Wait();

}

I am wondering how I can mark this to complete, given the cycle. A directory is posted to the dirBroadcast broadcaster, which posts to dirfinder, which might post new directories back to the broadcaster. So I can't simply mark it as complete, because that would block any directories still being added by dirfinder. Should I redesign it to keep track of the number of directories, or is there anything for this in TPL?


asked Sep 30 '14 21:09

Poul K. Sørensen

1 Answer

If the purpose of your code is to traverse the directory structure using some sort of parallelism then I would suggest not using TPL Dataflow and use Microsoft's Reactive Framework instead. I think it becomes much simpler.

Here's how I would do it.

First define a recursive function to build the list of directories:

Func<DirectoryInfo, IObservable<DirectoryInfo>> recurse = null;
recurse = di =>
    Observable
        .Return(di)
        .Concat(di.GetDirectories()
            .ToObservable()
            .SelectMany(di2 => recurse(di2)))
        .ObserveOn(Scheduler.Default);

This recurses through the directories and observes the results on the default Rx scheduler, so they are delivered off the calling thread instead of blocking it.

So by calling recurse with an input DirectoryInfo I get an observable list of the input directory and all of its descendants.

Now I can build a fairly straightforward query to get the results I want:

var query =
    from di in recurse(new DirectoryInfo(@"C:\dev\kortforsyningen_dsm\tiles"))
    from fi in di.GetFiles().ToObservable()
    let zxy =
        fi
            .FullName
            .Split('\\')
            .Reverse()
            .Take(3)
            .Reverse()
            .Select(s => int.Parse(Path.GetFileNameWithoutExtension(s)))
            .ToArray()
    let suffix = String.Format("{0}/{1}/{2}.png", zxy[0], zxy[1], zxy[2])
    select new FileInfo(Path.Combine(di.FullName, suffix));

Now I can action the query like this:

query
    .Subscribe(s =>
    {
        Trace.TraceInformation("Done combining : {0}", s.Name);
    });

Now I may have missed a little bit in your custom code, but if this is an approach you want to take, I'm sure you can fix any logical issues quite easily.

This code automatically handles completion when it runs out of child directories and files.
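
One thing to watch for: because the query observes on Scheduler.Default, Subscribe can return before all results have been delivered, so a console program may exit early. A minimal sketch of blocking until the completion signal arrives, assuming the System.Reactive package is referenced; RxCompletionSketch and RunToCompletion are hypothetical names, not part of Rx:

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;

public static class RxCompletionSketch
{
    // Hypothetical helper: blocks the caller until the observable signals
    // OnCompleted and returns everything it produced. If the observable
    // signals OnError instead, Wait() rethrows that exception.
    public static IList<T> RunToCompletion<T>(IObservable<T> query)
    {
        // ToList() buffers every element and emits one list on completion;
        // Wait() blocks until that single list has been produced.
        return query.ToList().Wait();
    }
}
```

With the query above this would be called as var results = RxCompletionSketch.RunToCompletion(query); in async code, await query.ToList() should give the same list without blocking a thread.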

To add Rx to your project, look for "Rx-Main" in NuGet (newer releases are published as "System.Reactive").
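
If you would rather stay with TPL Dataflow, the counting idea from the question can work: count the directories that have been posted but not yet fully processed, and call Complete() when that count reaches zero. This is only a minimal sketch under simplifying assumptions: a single ActionBlock posts back to itself in place of the broadcast/link cycle, a plain ActionBlock<FileInfo> stands in for XYZTileCombinerBlock, and CycleCompletionSketch and Walk are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks.Dataflow;

public static class CycleCompletionSketch
{
    // Hypothetical helper: walks root with a self-posting block and
    // returns the full name of every file found.
    public static List<string> Walk(DirectoryInfo root)
    {
        var found = new List<string>();

        // Downstream stage (stand-in for the tile combiner). The default
        // MaxDegreeOfParallelism is 1, so the plain List is safe here.
        var fileBlock = new ActionBlock<FileInfo>(f => found.Add(f.FullName));

        // Directories posted but not yet fully processed; the root counts as 1.
        int pending = 1;

        ActionBlock<DirectoryInfo> dirBlock = null;
        dirBlock = new ActionBlock<DirectoryInfo>(dir =>
        {
            foreach (var file in dir.GetFiles())
                fileBlock.Post(file);

            foreach (var sub in dir.GetDirectories())
            {
                // Count the child before posting it, so the counter cannot
                // reach zero while work is still on its way.
                Interlocked.Increment(ref pending);
                dirBlock.Post(sub);
            }

            // This directory is finished. If it was the last one in flight,
            // nothing can be posted any more, so both stages can complete.
            if (Interlocked.Decrement(ref pending) == 0)
            {
                dirBlock.Complete();
                fileBlock.Complete();
            }
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

        dirBlock.Post(root);
        fileBlock.Completion.Wait();
        return found;
    }
}
```

The order matters: each directory posts its files and children before it decrements the counter, so when the counter hits zero every file has already been handed to the downstream block.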


answered Sep 22 '22 10:09

Enigmativity
