Splitting a string into length-based lists of words in C#

Tags: string, c#, list
I have a string of words separated by spaces. How can I split the string into lists of words based on word length?

Example

input:

" aa aaa aaaa bb bbb bbbb cc ccc cccc cccc bbb bb aa "

output:

List 1 = { aa, bb, cc}
List 2 = { aaa, bbb, ccc}
List 3 = { aaaa, bbbb, cccc}

asked Jul 07 '12 20:07 by FSm


2 Answers

You can use Where to find elements that match a predicate (in this case, having the correct length):

string[] words = input.Split();

List<string> twos = words.Where(s => s.Length == 2).ToList();
List<string> threes = words.Where(s => s.Length == 3).ToList();
List<string> fours = words.Where(s => s.Length == 4).ToList();

Alternatively you could use GroupBy to find all the groups at once:

var groups = words.GroupBy(s => s.Length);
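
As a rough sketch of how those groups might be consumed: each IGrouping exposes its Key and can be enumerated for its words. The class and method names below are illustrative only, and the sample input is assumed to have single spaces and no duplicates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupByExample
{
    // Hypothetical helper: formats each length group as "length: word, word, ...".
    public static IEnumerable<string> DescribeGroups(string input)
    {
        return input.Split()                       // split on whitespace
                    .GroupBy(s => s.Length)        // one IGrouping<int, string> per length
                    .Select(g => g.Key + ": " + string.Join(", ", g));
    }

    public static void Main()
    {
        foreach (string line in DescribeGroups("aa aaa aaaa bb bbb bbbb cc ccc cccc"))
        {
            Console.WriteLine(line);
        }
    }
}
```

GroupBy is evaluated lazily, so the grouping work happens when the result is enumerated, not when GroupBy is called.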

You can also use ToLookup so that you can easily index to find all the words of a specific length:

var lookup = words.ToLookup(s => s.Length);
foreach (var word in lookup[3])
{
    Console.WriteLine(word);
}

Result:

aaa
bbb
ccc

See it working online: ideone


In your update it looks like you want to remove the empty strings and duplicated words. You can do the former by using StringSplitOptions.RemoveEmptyEntries and the latter by using Distinct.

var words = input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                 .Distinct();
var lookup = words.ToLookup(s => s.Length);

Output:

aa, bb, cc
aaa, bbb, ccc
aaaa, bbbb, cccc
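
For reference, a complete program along these lines might look as follows. This is a minimal sketch: the class name and the BuildLookup helper are made up for illustration, and the groups are sorted by key before printing so the output does not rely on the lookup's enumeration order.

```csharp
using System;
using System.Linq;

public static class DistinctLookupExample
{
    // Hypothetical helper: drops empty entries and duplicates, then indexes by length.
    public static ILookup<int, string> BuildLookup(string input)
    {
        return input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                    .Distinct()
                    .ToLookup(s => s.Length);
    }

    public static void Main()
    {
        string input = " aa aaa aaaa bb bbb bbbb cc ccc cccc cccc bbb bb aa ";
        ILookup<int, string> lookup = BuildLookup(input);

        // Print one line per word length, shortest first.
        foreach (var group in lookup.OrderBy(g => g.Key))
        {
            Console.WriteLine(string.Join(", ", group));
        }
    }
}
```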

See it working online: ideone

answered Sep 26 '22 01:09 by Mark Byers


Edit: I'm glad my original answer helped the OP solve their problem. However, after pondering the problem a bit, I've adapted it (and I strongly advise against my former solution, which I have left at the end of the post).

A simple approach

string input = " aa aaa aaaa bb bbb bbbb cc ccc cccc cccc bbb bb aa ";
var words = input.Trim().Split().Distinct();
var lookup = words.ToLookup(word => word.Length);

Explanation

First, we trim the input to avoid empty elements caused by the leading and trailing spaces. Then, we split the string into an array. If more than one space can occur between words, you need to use StringSplitOptions.RemoveEmptyEntries, as in Mark's answer.
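
To make the difference concrete, here is a small sketch comparing the two ways of splitting on an input with a doubled space. The class and method names are illustrative and not part of either answer.

```csharp
using System;

public static class SplitComparison
{
    // Trim + Split: removes the outer spaces only, so inner runs of spaces
    // still produce empty strings.
    public static string[] TrimAndSplit(string input)
    {
        return input.Trim().Split();
    }

    // Split with RemoveEmptyEntries: no empty strings at all, so Trim is not needed.
    public static string[] SplitRemovingEmpties(string input)
    {
        return input.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string input = " aa  bbb ";   // note the two spaces between the words

        Console.WriteLine(TrimAndSplit(input).Length);          // 3: "aa", "", "bbb"
        Console.WriteLine(SplitRemovingEmpties(input).Length);  // 2: "aa", "bbb"
    }
}
```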

After calling Distinct to include each word only once, we convert words from an IEnumerable<string> to an ILookup<int, string> (the interface implemented by Lookup<int, string>), where each key (int) is a word length and the values (string) are the words of that length.

Hang on, how is that even possible? Don't we have multiple words for each key? Sure, but that's exactly what the Lookup class is there for:

Lookup<TKey, TElement> represents a collection of keys each mapped to one or more values. A Lookup<TKey, TElement> resembles a Dictionary<TKey, TValue>. The difference is that a Dictionary maps keys to single values, whereas a Lookup maps keys to collections of values.

You can create an instance of a Lookup by calling ToLookup on an object that implements IEnumerable<T>.


Note
There is no public constructor to create a new instance of a Lookup. Additionally, Lookup objects are immutable, that is, you cannot add or remove elements or keys from a Lookup after it has been created.

word => word.Length is the keySelector lambda: it specifies that we want to index (or group, if you will) the Lookup by the length of the words.
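
One practical consequence of using a Lookup rather than a Dictionary is worth a quick sketch: asking a lookup for a key it does not contain yields an empty sequence instead of throwing. The class and helper names below are illustrative only.

```csharp
using System;
using System.Linq;

public static class LookupBehaviour
{
    // Hypothetical helper: number of words stored under the given length.
    public static int CountWordsOfLength(ILookup<int, string> lookup, int length)
    {
        // For a missing key the indexer returns an empty sequence, so this is 0.
        return lookup[length].Count();
    }

    public static void Main()
    {
        ILookup<int, string> lookup = new[] { "aa", "bbb", "cc" }.ToLookup(word => word.Length);

        Console.WriteLine(lookup.Count);                   // number of distinct keys
        Console.WriteLine(lookup.Contains(2));             // true: there are two-letter words
        Console.WriteLine(CountWordsOfLength(lookup, 2));  // words of length 2
        Console.WriteLine(CountWordsOfLength(lookup, 5));  // 0, no KeyNotFoundException
    }
}
```

With a Dictionary<int, List<string>> the last call would need TryGetValue or a ContainsKey check first.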

Usage

Write all the words to the console

(similar to the question's originally requested output)

foreach (var grouping in lookup)
{
    Console.WriteLine("{0}: {1}", grouping.Key, string.Join(", ", grouping));
}

Output

2: aa, bb, cc
3: aaa, bbb, ccc
4: aaaa, bbbb, cccc

Put all words of a certain length in a List

List<String> list3 = lookup[3].ToList();

Order by key

(note that these return IOrderedEnumerable<IGrouping<int, string>>, so access by key is no longer possible on the result)

var orderedAscending = lookup.OrderBy(grouping => grouping.Key);
var orderedDescending = lookup.OrderByDescending(grouping => grouping.Key);
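
If both ordered iteration and access by key are needed, one option is to order only the keys and keep reading the words from the original lookup. This is a sketch under that assumption; the class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderedLookupExample
{
    // Hypothetical helper: the word lengths from longest to shortest.
    public static List<int> KeysDescending(ILookup<int, string> lookup)
    {
        return lookup.OrderByDescending(g => g.Key)
                     .Select(g => g.Key)
                     .ToList();
    }

    public static void Main()
    {
        string input = " aa aaa aaaa bb bbb bbbb cc ccc cccc cccc bbb bb aa ";
        ILookup<int, string> lookup = input.Trim().Split().Distinct().ToLookup(w => w.Length);

        // Iterate in key order, but fetch the words through the original lookup,
        // which still supports the indexer.
        foreach (int length in KeysDescending(lookup))
        {
            Console.WriteLine("{0}: {1}", length, string.Join(", ", lookup[length]));
        }
    }
}
```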

Original answer - please don't do this (bad performance, code clutter):

string input = " aa aaa aaaa bb bbb bbbb cc ccc cccc cccc bbb bb aa ";
Dictionary<int, string[]> results = new Dictionary<int, string[]>();
var grouped = input.Trim().Split().Distinct().GroupBy(s => s.Length)
    .OrderBy(g => g.Key); // or: OrderByDescending(g => g.Key);
foreach (var grouping in grouped)
{
    results.Add(grouping.Key, grouping.ToArray());
}

answered Sep 22 '22 01:09 by Adam