I have a list containing string items, but some of the string contains similar text, I am trying to get the distinct list.
My list contains this:
-Customers\\Order1
-Customers\\Order1\\Product1
-Customers\\Order2\\Product1
-Customers\\Order2\\Product1\\Price
From this list I need to get:
-Customers\\Order1\\Product1
-Customers\\Order2\\Product1\\Price
Basically I want to omit a string if it is in another string in the list?
You can do that with a bit of LINQ and a foreach
loop like:
List<string> outputList = new List<string>();
foreach (var str in originalList)
{
if (!outputList.Contains(str)
&& !originalList.Any(r => r!= str && r.Contains(str)))
{
outputList.Add(str);
}
}
Considering your originalList
is defined as:
List<string> originalList = new List<string>
{
"Customers\\Order1",
"Customers\\Order1\\Product1",
"Customers\\Order2\\Product1",
"Customers\\Order2\\Product1\\Price",
};
You will get the outputList
as:
Customers\\Order1\\Product1
Customers\\Order2\\Product1\\Price
If these values are truly paths and you want to handle subdirectories, you need to make sure you are also handling the case where a name is a substring of another name, but they are different paths. I.E. Customer\\Order1
and Customer\\Order10
.
public static class Extensions
{
public static IEnumerable<string> DistinctBySubString(this IEnumerable<string> strings)
{
var results = new List<string>();
foreach (var s in strings)
{
bool add = true;
for(int i=results.Count-1; i>=0; i--)
{
if (IsSubDirectoryOf(results[i],s))
{
results.RemoveAt(i);
}
else if (IsSubDirectoryOf(s,results[i]))
{
add = false;
}
}
if (add)
results.Add(s);
}
return results;
}
private static bool IsSubDirectoryOf(string dir1, string dir2)
{
DirectoryInfo di1 = new DirectoryInfo(dir1);
DirectoryInfo di2 = new DirectoryInfo(dir2);
bool isParent = false;
while (di2.Parent != null)
{
if (di2.Parent.FullName == di1.FullName)
{
isParent = true;
break;
}
else di2 = di2.Parent;
}
return isParent;
}
}
Using it like this:
List<string> strings = new List<string>()
{
"Customers\\Order1",
"Customers\\Order10",
"Customers\\Order1\\Product1",
"Customers\\Order2\\Product1",
"Customers\\Order2\\Product1\\Price"
};
foreach (var result in strings.DistinctBySubString())
{
Console.WriteLine(result);
}
The directory matching is based on the code from this answer: Given full path, check if path is subdirectory of some other path, or otherwise
The problem on the last routine is, you should abort the search in the second list, when there is a match. otherwise it will be still valid on other items.
EDIT: new routine:
class Program
{
private static IEnumerable<string> SelectUnique(IEnumerable<string> list)
{
// iterate the list..
foreach (var item1 in list)
// you don't want to match the same item.
if (!list.Where(item2 => item1 != item2)
// search for items where it start with the current item. (notice the ! before the list.Where)
.Any(item2 => item2.StartsWith(item1)))
yield return item1;
}
static void Main(string[] args)
{
List<string> list = new List<string>();
list.Add("Customers\\Order1\\Product1");
list.Add("Customers\\Order2\\Product1");
list.Add("Customers\\Order2\\Product1\\Price");
list.Add("Customers\\Order1");
list.Add("Customers\\Order3\\Price");
var results = SelectUnique(list);
foreach (var item in results)
Console.WriteLine(item);
Console.ReadKey();
}
}
I think this is best done as a LINQ query.
var input = new List<string>()
{
"Customers\\Order1",
"Customers\\Order1\\Product1",
"Customers\\Order2\\Product1",
"Customers\\Order2\\Product1\\Price",
};
var query =
from x in input
where !input.Any(y => y != x && y.Contains(x))
select x;
var result = query.ToList();
From which I get:
Just in case the actual requirement is to search by subpath and not by substring, then this works:
var input = new List<string>()
{
"Customers\\Order1",
"Customers\\Order1\\Product10",
"Customers\\Order1\\Product1",
"Customers\\Order2\\Product1",
"Customers\\Order2\\Product1\\Price",
};
var paths = input.ToDictionary(x => x, x => x.Split('\\'));
var query =
from x in input
where !input
.Any(y => y.Length > x.Length
&& paths[x]
.Zip(paths[y], (p1, p2) => new { p1, p2 })
.All(p => p.p1 == p.p2))
select x;
var result = query.ToList();
I get this result:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With