Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do you filter a list so that no member is a substring of another member

Tags:

c#

linq

distinct

I have a list containing string items, but some of the string contains similar text, I am trying to get the distinct list.

My list contains this:

-Customers\\Order1
-Customers\\Order1\\Product1
-Customers\\Order2\\Product1
-Customers\\Order2\\Product1\\Price

From this list I need to get:

-Customers\\Order1\\Product1
-Customers\\Order2\\Product1\\Price

Basically I want to omit a string if it is in another string in the list?

like image 252
NHead Avatar asked Feb 12 '15 19:02

NHead


4 Answers

You can do that with a bit of LINQ and a foreach loop like:

List<string> outputList = new List<string>();
foreach (var str in originalList)
{
    if (!outputList.Contains(str)
        && !originalList.Any(r => r!= str && r.Contains(str)))
    {
        outputList.Add(str);
    }
}

Considering your originalList is defined as:

List<string> originalList = new List<string>
{
    "Customers\\Order1",
    "Customers\\Order1\\Product1",
    "Customers\\Order2\\Product1",
    "Customers\\Order2\\Product1\\Price",
};

You will get the outputList as:

Customers\\Order1\\Product1
Customers\\Order2\\Product1\\Price
like image 74
Habib Avatar answered Sep 21 '22 15:09

Habib


If these values are truly paths and you want to handle subdirectories, you need to make sure you are also handling the case where a name is a substring of another name, but they are different paths. I.E. Customer\\Order1 and Customer\\Order10.

public static class Extensions
{
    public static IEnumerable<string> DistinctBySubString(this IEnumerable<string> strings)
    {
        var results = new List<string>();
        foreach (var s in strings)
        {
            bool add = true;
            for(int i=results.Count-1; i>=0; i--)
            {
                if (IsSubDirectoryOf(results[i],s))
                {
                    results.RemoveAt(i);
                }
                else if (IsSubDirectoryOf(s,results[i]))
                {
                    add = false;
                }

            }
            if (add)
                results.Add(s);
        }
        return results;
    }

    private static bool IsSubDirectoryOf(string dir1, string dir2)
    {
        DirectoryInfo di1 = new DirectoryInfo(dir1);
        DirectoryInfo di2 = new DirectoryInfo(dir2);
        bool isParent = false;
        while (di2.Parent != null)
        {
            if (di2.Parent.FullName == di1.FullName)
            {
                isParent = true;
                break;
            }
            else di2 = di2.Parent;
        }
        return isParent;
    }
}

Using it like this:

List<string> strings = new List<string>()
{
    "Customers\\Order1",
    "Customers\\Order10",
    "Customers\\Order1\\Product1",
    "Customers\\Order2\\Product1",
    "Customers\\Order2\\Product1\\Price"
};



foreach (var result in strings.DistinctBySubString())
{
    Console.WriteLine(result);
}

The directory matching is based on the code from this answer: Given full path, check if path is subdirectory of some other path, or otherwise

like image 27
John Koerner Avatar answered Sep 24 '22 15:09

John Koerner


The problem on the last routine is, you should abort the search in the second list, when there is a match. otherwise it will be still valid on other items.

EDIT: new routine:

class Program
{
    private static IEnumerable<string> SelectUnique(IEnumerable<string> list)
    {
        // iterate the list..
        foreach (var item1 in list)
            // you don't want to match the same item.
            if (!list.Where(item2 => item1 != item2)
                // search for items where it start with the current item. (notice the ! before the list.Where)
                .Any(item2 => item2.StartsWith(item1)))
                    yield return item1;
    }


    static void Main(string[] args)
    {

        List<string> list = new List<string>();
        list.Add("Customers\\Order1\\Product1");
        list.Add("Customers\\Order2\\Product1");
        list.Add("Customers\\Order2\\Product1\\Price");
        list.Add("Customers\\Order1");
        list.Add("Customers\\Order3\\Price");



        var results = SelectUnique(list);

        foreach (var item in results)
            Console.WriteLine(item);

        Console.ReadKey();

    }
}
like image 39
Jeroen van Langen Avatar answered Sep 24 '22 15:09

Jeroen van Langen


I think this is best done as a LINQ query.

var input = new List<string>()
{
    "Customers\\Order1",
    "Customers\\Order1\\Product1",
    "Customers\\Order2\\Product1",
    "Customers\\Order2\\Product1\\Price",
};

var query =
    from x in input
    where !input.Any(y => y != x && y.Contains(x))
    select x;

var result = query.ToList();

From which I get:

result


Just in case the actual requirement is to search by subpath and not by substring, then this works:

var input = new List<string>()
{
    "Customers\\Order1",
    "Customers\\Order1\\Product10",
    "Customers\\Order1\\Product1",
    "Customers\\Order2\\Product1",
    "Customers\\Order2\\Product1\\Price",
};

var paths = input.ToDictionary(x => x, x => x.Split('\\'));

var query =
    from x in input
    where !input
        .Any(y => y.Length > x.Length
            && paths[x]
                .Zip(paths[y], (p1, p2) => new { p1, p2 })
                .All(p => p.p1 == p.p2))
    select x;

var result = query.ToList();

I get this result:

result2

like image 45
Enigmativity Avatar answered Sep 21 '22 15:09

Enigmativity