Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I efficiently remove elements by index from a very large list?

I have a very large list of integers (about 2 billion elements) and a list with indices (couple thousand elements) at which I need to remove elements from the first list. My current approach is to loop over all indices in the second list, passing each to the RemoveAt() method of the first list:

indices.Sort(); indices.Reverse(); for (i = 0; i < indices.Count; i++) {     largeList.RemoveAt(indices[i]); } 

However, it takes about 2 minutes to finish. I really need to perform this operation much faster. Is there any way of optimizing this?

I have a Intel i9X CPU with 10 cores, so maybe some way of parallel processing?

like image 350
N K Avatar asked Aug 19 '20 21:08

N K


People also ask

Which method is used to remove list elements using index?

You can use the pop() method to remove specific elements of a list. pop() method takes the index value as a parameter and removes the element at the specified index. Therefore, a[2] contains 3 and pop() removes and returns the same as output.

Which method is used to removes all the elements from the list?

Clear Method is used remove the all the elements from the List.

How do you remove an element from a list in Python effectively?

Pop. The pop method removes the list element at a certain index and returns the value of the element once it's removed. This is useful if you want to remove an element from a list, but still want to use that element's value for other purposes.


2 Answers

Each time RemoveAt() is called it has to move every element that comes after the specified index up by one element. If there are thousands of elements to remove in a very large list, that will result in many, many (needless) moves.

My thinking was that if you can calculate the start index and length of each move, you can process the list in a single pass with no overlapping moves. This is done by comparing adjacent elements in the list of indices to remove. While this does mean building a third List<> of move operations to perform, my hope is having the most efficient, minimal move strategy planned out in advance will pay off in the end (or perhaps there's a way to do that without allocating any objects that presently isn't occurring to me).

You can see my implementation, RemoveUsingMoveStrategy(), in the benchmark code below. Running the launcher program below in test mode, we can see it — as well as the other answers — produces the correct result when given indices 0, 1, 5, 10, 11, 15, 15 (repeated), 18, and 19 to remove from a 20-element List<int>...

PS> dotnet run --configuration release --framework netcoreapp3.1 -- test [snip] RemoveUsingMoveStrategy():         Elements: { 2, 3, 4, 6, 7, 8, 9, 12, 13, 14, 16, 17 }            Count: 12         Sequence: Equal to control list         Duration: 3.95 milliseconds         Returned: Input list [snip]

Benchmark notes

  • It was my intention to benchmark against a billion-element List<int>, but RemoveUsingRemoveAt() — inspired by the code in the question — is so inefficient that it was taking too long, so I only went up to 10 million elements.

  • I still hope to run a billion-element benchmark that excludes RemoveUsingRemoveAt(). For that reason, I introduced a not-quite-as-naive implementation called RemoveUsingListCopy() to serve as the baseline for comparison across all list sizes. As the name implies, it doesn't modify the input list but creates a new one with the removals applied.

  • Since not all implementations require the list of indices to remove to be unique and/or sorted, I figured the fairest way to benchmark would be to include that overhead in the benchmark time where the method requires it.

  • The only other changes I made to code from other answers was to make use of the benchmark's lists (DataList and RemovalList) and to change from var to explicit types for reader clarity.

  • RemovalListLocation indicates from where in DataList indices were removed.

    • For Beginning, Middle, and End it is a contiguous RemovalListSize-sized block removed from that location.
    • For Random it is RemovalListSize random, valid, unsorted, not-guaranteed-to-be-unique indices generated from a constant seed.

    To keep the results short(er), I opted to only benchmark the Middle — thinking it'd be a good middle-of-the-road compromise — and Random values.

Benchmark results

  • RemoveUsingRemoveAt() is terrible. Don't do that.

  • For the smaller list, RemoveUsingListCopy() is consistently the fastest, at the expense of doubling memory usage.

  • For the larger list, Vernou's answer is as least 5x as fast as the other answers, at the expense of relying on implementation details.

    This just goes to show that you shouldn't necessarily always favor List<> over an array — unless you need its additional capabilities — because it isn't always superior. It insulates the underlying array, preventing you (without reflection) from using more performant access methods like Array.Copy() and unsafe code on it.

  • Theodor Zoulias's answer does well with the smaller list, but I think having to "touch" all 10 million indices — each with a call to HashSet<>.Contains() and increment of an index variable — really hurt it with the larger list.

  • The remaining implementations (including mine) have roughly the same performance: not the best, but still pretty good.

Benchmark data

Running the launcher program defined later in this answer in benchmark mode, I get these results from BenchmarkDotNet...

PS> dotnet run --configuration release --framework netcoreapp3.1 -- benchmark [snip] // * Summary *  BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.450 (2004/?/20H1) Intel Core i7 CPU 860 2.80GHz (Nehalem), 1 CPU, 8 logical and 4 physical cores .NET Core SDK=3.1.401   [Host]    : .NET Core 3.1.7 (CoreCLR 4.700.20.36602, CoreFX 4.700.20.37001), X64 RyuJIT   MediumRun : .NET Framework 4.8 (4.8.4200.0), X64 RyuJIT  Job=MediumRun  InvocationCount=1  IterationCount=15   LaunchCount=2  UnrollFactor=1  WarmupCount=10    |                        Method |       Runtime | DataListSize | RemovalListSize | RemovalListLocation |           Mean |         Error |        StdDev |         Median | Ratio | RatioSD | |------------------------------ |-------------- |------------- |---------------- |-------------------- |---------------:|--------------:|--------------:|---------------:|------:|--------:| |           RemoveUsingListCopy |      .NET 4.8 |        10000 |            2000 |              Random |       379.9 μs |      15.93 μs |      23.36 μs |       380.4 μs |  1.00 |    0.00 | |           RemoveUsingRemoveAt |      .NET 4.8 |        10000 |            2000 |              Random |     1,463.7 μs |      93.79 μs |     137.47 μs |     1,434.7 μs |  3.87 |    0.41 | |       RemoveUsingMoveStrategy |      .NET 4.8 |        10000 |            2000 |              Random |       635.9 μs |      19.77 μs |      29.60 μs |       624.0 μs |  1.67 |    0.14 | | Answer63496225_TheodorZoulias |      .NET 4.8 |        10000 |            2000 |              Random |       372.1 μs |      11.52 μs |      16.15 μs |       373.8 μs |  0.99 |    0.07 | |       Answer63496768_CoolBots |      .NET 4.8 |        10000 |            2000 |              Random |       594.5 μs |      25.13 μs |      36.03 μs |       593.1 μs |  1.57 |    0.12 | |        Answer63495657_NetMage |      .NET 4.8 |        10000 |            2000 |              Random |       618.8 μs |      17.53 μs |      23.99 μs |       622.4 μs |  1.65 |    0.13 | |         Answer63496256_Vernou |      .NET 4.8 |        10000 |            2000 |              Random |       645.6 μs |      27.28 μs |      39.99 μs |       632.2 μs |  1.71 |    0.16 | |                               |               |              |                 |                     |                |               |               |                |       |         | |           RemoveUsingListCopy | .NET Core 3.1 |        10000 |            2000 |              Random |       391.5 μs |      10.39 μs |      15.55 μs |       391.6 μs |  1.00 |    0.00 | |           RemoveUsingRemoveAt | .NET Core 3.1 |        10000 |            2000 |              Random |     1,402.2 μs |      44.20 μs |      64.80 μs |     1,407.6 μs |  3.59 |    0.21 | |       RemoveUsingMoveStrategy | .NET Core 3.1 |        10000 |            2000 |              Random |       557.9 μs |      19.73 μs |      27.00 μs |       557.2 μs |  1.43 |    0.10 | | Answer63496225_TheodorZoulias | .NET Core 3.1 |        10000 |            2000 |              Random |       424.3 μs |      20.90 μs |      29.30 μs |       424.2 μs |  1.09 |    0.09 | |       Answer63496768_CoolBots | .NET Core 3.1 |        10000 |            2000 |              Random |       535.0 μs |      19.37 μs |      27.16 μs |       537.1 μs |  1.37 |    0.08 | |        Answer63495657_NetMage | .NET Core 3.1 |        10000 |            2000 |              Random |       557.7 μs |      18.73 μs |      25.63 μs |       550.0 μs |  1.43 |    0.09 | |         Answer63496256_Vernou | .NET Core 3.1 |        10000 |            2000 |              Random |       554.2 μs |      13.82 μs |      18.45 μs |       554.0 μs |  1.42 |    0.07 | |                               |               |              |                 |                     |                |               |               |                |       |         | |           RemoveUsingListCopy |      .NET 4.8 |        10000 |            2000 |              Middle |       221.6 μs |       7.25 μs |      10.63 μs |       222.5 μs |  1.00 |    0.00 | |           RemoveUsingRemoveAt |      .NET 4.8 |        10000 |            2000 |              Middle |     1,195.3 μs |      20.01 μs |      28.69 μs |     1,187.7 μs |  5.42 |    0.30 | |       RemoveUsingMoveStrategy |      .NET 4.8 |        10000 |            2000 |              Middle |       405.0 μs |      13.65 μs |      19.14 μs |       410.7 μs |  1.83 |    0.10 | | Answer63496225_TheodorZoulias |      .NET 4.8 |        10000 |            2000 |              Middle |       206.3 μs |       8.62 μs |      12.09 μs |       204.9 μs |  0.94 |    0.08 | |       Answer63496768_CoolBots |      .NET 4.8 |        10000 |            2000 |              Middle |       427.5 μs |      15.56 μs |      22.81 μs |       435.4 μs |  1.93 |    0.13 | |        Answer63495657_NetMage |      .NET 4.8 |        10000 |            2000 |              Middle |       405.4 μs |      13.80 μs |      19.35 μs |       403.8 μs |  1.84 |    0.11 | |         Answer63496256_Vernou |      .NET 4.8 |        10000 |            2000 |              Middle |       413.9 μs |      15.26 μs |      20.89 μs |       419.8 μs |  1.87 |    0.12 | |                               |               |              |                 |                     |                |               |               |                |       |         | |           RemoveUsingListCopy | .NET Core 3.1 |        10000 |            2000 |              Middle |       235.2 μs |      10.73 μs |      15.73 μs |       236.2 μs |  1.00 |    0.00 | |           RemoveUsingRemoveAt | .NET Core 3.1 |        10000 |            2000 |              Middle |     1,345.6 μs |      32.07 μs |      43.90 μs |     1,352.7 μs |  5.77 |    0.41 | |       RemoveUsingMoveStrategy | .NET Core 3.1 |        10000 |            2000 |              Middle |       324.0 μs |       4.92 μs |       7.05 μs |       326.6 μs |  1.39 |    0.09 | | Answer63496225_TheodorZoulias | .NET Core 3.1 |        10000 |            2000 |              Middle |       262.9 μs |       6.18 μs |       9.06 μs |       265.4 μs |  1.12 |    0.08 | |       Answer63496768_CoolBots | .NET Core 3.1 |        10000 |            2000 |              Middle |       333.6 μs |      10.14 μs |      13.87 μs |       340.9 μs |  1.43 |    0.11 | |        Answer63495657_NetMage | .NET Core 3.1 |        10000 |            2000 |              Middle |       313.5 μs |       9.05 μs |      12.69 μs |       310.5 μs |  1.34 |    0.11 | |         Answer63496256_Vernou | .NET Core 3.1 |        10000 |            2000 |              Middle |       332.3 μs |       6.70 μs |       8.95 μs |       331.9 μs |  1.43 |    0.09 | |                               |               |              |                 |                     |                |               |               |                |       |         | |           RemoveUsingListCopy |      .NET 4.8 |     10000000 |            2000 |              Random |   253,977.1 μs |   2,721.70 μs |   3,989.43 μs |   253,809.0 μs |  1.00 |    0.00 | |           RemoveUsingRemoveAt |      .NET 4.8 |     10000000 |            2000 |              Random | 5,191,083.4 μs |  13,200.66 μs |  18,931.99 μs | 5,187,162.3 μs | 20.43 |    0.34 | |       RemoveUsingMoveStrategy |      .NET 4.8 |     10000000 |            2000 |              Random |    65,365.4 μs |     422.41 μs |     592.16 μs |    65,307.3 μs |  0.26 |    0.00 | | Answer63496225_TheodorZoulias |      .NET 4.8 |     10000000 |            2000 |              Random |   240,584.4 μs |   3,687.89 μs |   5,048.03 μs |   244,336.1 μs |  0.95 |    0.02 | |       Answer63496768_CoolBots |      .NET 4.8 |     10000000 |            2000 |              Random |    54,168.4 μs |   1,001.37 μs |   1,436.14 μs |    53,390.3 μs |  0.21 |    0.01 | |        Answer63495657_NetMage |      .NET 4.8 |     10000000 |            2000 |              Random |    72,501.4 μs |     452.46 μs |     634.29 μs |    72,161.2 μs |  0.29 |    0.00 | |         Answer63496256_Vernou |      .NET 4.8 |     10000000 |            2000 |              Random |     5,814.0 μs |      89.71 μs |     128.67 μs |     5,825.3 μs |  0.02 |    0.00 | |                               |               |              |                 |                     |                |               |               |                |       |         | |           RemoveUsingListCopy | .NET Core 3.1 |     10000000 |            2000 |              Random |   239,784.0 μs |   2,721.35 μs |   3,902.88 μs |   241,125.5 μs |  1.00 |    0.00 | |           RemoveUsingRemoveAt | .NET Core 3.1 |     10000000 |            2000 |              Random | 5,538,337.5 μs | 353,505.30 μs | 495,565.06 μs | 5,208,226.1 μs | 23.12 |    2.15 | |       RemoveUsingMoveStrategy | .NET Core 3.1 |     10000000 |            2000 |              Random |    33,071.8 μs |     103.80 μs |     138.57 μs |    33,030.5 μs |  0.14 |    0.00 | | Answer63496225_TheodorZoulias | .NET Core 3.1 |     10000000 |            2000 |              Random |   240,825.5 μs |     851.49 μs |   1,248.11 μs |   240,520.9 μs |  1.00 |    0.02 | |       Answer63496768_CoolBots | .NET Core 3.1 |     10000000 |            2000 |              Random |    26,265.0 μs |      90.76 μs |     124.23 μs |    26,253.0 μs |  0.11 |    0.00 | |        Answer63495657_NetMage | .NET Core 3.1 |     10000000 |            2000 |              Random |    48,670.6 μs |     581.51 μs |     833.99 μs |    48,303.0 μs |  0.20 |    0.00 | |         Answer63496256_Vernou | .NET Core 3.1 |     10000000 |            2000 |              Random |     5,905.5 μs |      96.27 μs |     131.78 μs |     5,915.1 μs |  0.02 |    0.00 | |                               |               |              |                 |                     |                |               |               |                |       |         | |           RemoveUsingListCopy |      .NET 4.8 |     10000000 |            2000 |              Middle |   153,776.2 μs |   2,454.90 μs |   3,674.38 μs |   152,872.0 μs |  1.00 |    0.00 | |           RemoveUsingRemoveAt |      .NET 4.8 |     10000000 |            2000 |              Middle | 5,245,952.0 μs |  13,845.58 μs |  20,294.67 μs | 5,252,922.4 μs | 34.10 |    0.81 | |       RemoveUsingMoveStrategy |      .NET 4.8 |     10000000 |            2000 |              Middle |    33,233.6 μs |     110.33 μs |     158.24 μs |    33,217.3 μs |  0.22 |    0.01 | | Answer63496225_TheodorZoulias |      .NET 4.8 |     10000000 |            2000 |              Middle |   128,949.8 μs |     560.72 μs |     804.17 μs |   128,724.9 μs |  0.84 |    0.02 | |       Answer63496768_CoolBots |      .NET 4.8 |     10000000 |            2000 |              Middle |    48,965.1 μs |      70.75 μs |      94.45 μs |    48,957.3 μs |  0.32 |    0.01 | |        Answer63495657_NetMage |      .NET 4.8 |     10000000 |            2000 |              Middle |    32,641.5 μs |      66.85 μs |      91.51 μs |    32,610.0 μs |  0.21 |    0.01 | |         Answer63496256_Vernou |      .NET 4.8 |     10000000 |            2000 |              Middle |     2,982.2 μs |      29.47 μs |      41.31 μs |     2,961.9 μs |  0.02 |    0.00 | |                               |               |              |                 |                     |                |               |               |                |       |         | |           RemoveUsingListCopy | .NET Core 3.1 |     10000000 |            2000 |              Middle |   144,208.7 μs |   2,035.88 μs |   2,984.16 μs |   142,693.2 μs |  1.00 |    0.00 | |           RemoveUsingRemoveAt | .NET Core 3.1 |     10000000 |            2000 |              Middle | 5,235,957.7 μs |  13,674.19 μs |  20,043.46 μs | 5,241,536.1 μs | 36.32 |    0.78 | |       RemoveUsingMoveStrategy | .NET Core 3.1 |     10000000 |            2000 |              Middle |    16,547.3 μs |      72.72 μs |     101.95 μs |    16,520.7 μs |  0.11 |    0.00 | | Answer63496225_TheodorZoulias | .NET Core 3.1 |     10000000 |            2000 |              Middle |   137,218.2 μs |     716.45 μs |     980.69 μs |   137,027.0 μs |  0.95 |    0.02 | |       Answer63496768_CoolBots | .NET Core 3.1 |     10000000 |            2000 |              Middle |    23,728.5 μs |      79.84 μs |     111.93 μs |    23,689.9 μs |  0.16 |    0.00 | |        Answer63495657_NetMage | .NET Core 3.1 |     10000000 |            2000 |              Middle |    17,298.3 μs |     216.46 μs |     310.44 μs |    17,165.5 μs |  0.12 |    0.00 | |         Answer63496256_Vernou | .NET Core 3.1 |     10000000 |            2000 |              Middle |     2,999.7 μs |      85.78 μs |     123.03 μs |     2,957.1 μs |  0.02 |    0.00 | [snip] 

Benchmark code

To see the various implementations, scroll a third of the way down and look for the methods decorated with [Benchmark()].

This requires the BenchmarkDotNet package.

using System; using System.Collections.Generic; using System.Linq; using System.Reflection; using BenchmarkDotNet.Attributes;  namespace SO63495264 {     public class Benchmarks     {         public enum RemovalLocation         {             Random,             Beginning,             Middle,             End         }          [Params(10_000, 10_000_000)]         public int DataListSize         {             get; set;         }          [Params(2_000)]         public int RemovalListSize         {             get; set;         }          //[ParamsAllValues()]         [Params(RemovalLocation.Random, RemovalLocation.Middle)]         public RemovalLocation RemovalListLocation         {             get; set;         }          public List<int> DataList         {             get;             set;         }          public IReadOnlyList<int> RemovalList         {             get;             set;         }          [GlobalSetup()]         public void GlobalSetup()         {             IEnumerable<int> removalIndices;              if (RemovalListLocation == RemovalLocation.Random)             {                 // Pass a constant seed so the same indices get generated for every run                 Random random = new Random(12345);                  removalIndices = Enumerable.Range(0, RemovalListSize)                     .Select(i => random.Next(DataListSize));             }             else             {                 int removalSegmentOffset = RemovalListLocation switch {                     RemovalLocation.Beginning => 0,                     RemovalLocation.Middle => (DataListSize - RemovalListSize) / 2,                     RemovalLocation.End => DataListSize - RemovalListSize,                     _ => throw new Exception($"Unexpected {nameof(RemovalLocation)} enumeration value {RemovalListLocation}.")                 };                  removalIndices = Enumerable.Range(removalSegmentOffset, RemovalListSize);             }              // For efficiency, create a single List<int> to be reused by all iterations for a given set of parameters             DataList = new List<int>(DataListSize);             RemovalList = removalIndices.ToList().AsReadOnly();         }          [IterationSetup()]         public void IterationSetup()         {             // Each iteration could modify DataList, so repopulate its elements for the             // next iteration.  DataList is either new or has been Clear()ed by IterationCleanup().             for (int i = 0; i < DataListSize; i++)                 DataList.Add(i);         }          [IterationCleanup()]         public void IterationCleanup()         {             DataList.Clear();         }          [GlobalCleanup()]         public void GlobalCleanup()         {             // Force collection of the List<> for the current set of parameters             int generation = GC.GetGeneration(DataList);              DataList = null;             GC.Collect(generation, GCCollectionMode.Forced, true);         }          [Benchmark(Baseline = true)]         public List<int> RemoveUsingListCopy()         {             HashSet<int> removalSet = RemovalList.ToHashSet();             List<int> newList = new List<int>(DataList.Count - removalSet.Count);              for (int index = 0; index < DataList.Count; index++)                 if (!removalSet.Contains(index))                     newList.Add(DataList[index]);              return newList;         }          // Based on Stack Overflow question 63495264         // https://stackoverflow.com/q/63495264/150605         [Benchmark()]         public List<int> RemoveUsingRemoveAt()         {             // The collection of indices to remove must be in decreasing order             // with no duplicates; all are assumed to be in-range for DataList             foreach (int index in RemovalList.Distinct().OrderByDescending(i => i))                 DataList.RemoveAt(index);              return DataList;         }          // From Stack Overflow answer 63498191         // https://stackoverflow.com/a/63498191/150605         [Benchmark()]         public List<int> RemoveUsingMoveStrategy()         {             // The collection of indices to remove must be in increasing order             // with no duplicates; all are assumed to be in-range for DataList             int[] coreIndices = RemovalList.Distinct().OrderBy(i => i).ToArray();             List<(int startIndex, int count, int distance)> moveOperations                 = new List<(int, int, int)>();              int candidateIndex;             for (int i = 0; i < coreIndices.Length; i = candidateIndex)             {                 int currentIndex = coreIndices[i];                 int elementCount = 1;                  // Merge removal of consecutive indices into one operation                 while ((candidateIndex = i + elementCount) < coreIndices.Length                     && coreIndices[candidateIndex] == currentIndex + elementCount                 )                 {                     elementCount++;                 }                  int nextIndex = candidateIndex < coreIndices.Length                     ? coreIndices[candidateIndex]                     : DataList.Count;                 int moveCount = nextIndex - currentIndex - elementCount;                  // Removing one or more elements from the end of the                 // list will result in a no-op, 0-length move; omit it                 if (moveCount > 0)                 {                     moveOperations.Add(                         (                             startIndex: currentIndex + elementCount,                             count: moveCount,                             distance: i + elementCount                         )                     );                 }             }              // Perform the element moves             foreach ((int startIndex, int count, int distance) in moveOperations)             {                 for (int offset = 0; offset < count; offset++)                 {                     int sourceIndex = startIndex + offset;                     int targetIndex = sourceIndex - distance;                      DataList[targetIndex] = DataList[sourceIndex];                 }             }              // "Trim" the number of removed elements from the end of the list             DataList.RemoveRange(DataList.Count - coreIndices.Length, coreIndices.Length);              return DataList;         }          // Adapted from Stack Overflow answer 63496225 revision 4         // https://stackoverflow.com/revisions/63496225/4         [Benchmark()]         public List<int> Answer63496225_TheodorZoulias()         {             HashSet<int> indicesSet = new HashSet<int>(RemovalList);             int index = 0;              DataList.RemoveAll(_ => indicesSet.Contains(index++));              return DataList;         }           // Adapted from Stack Overflow answer 63496768 revision 2         // https://stackoverflow.com/revisions/63496768/2         [Benchmark()]         public List<int> Answer63496768_CoolBots()         {             List<int> sortedIndicies = RemovalList.Distinct().OrderBy(i => i).ToList();              int sourceStartIndex = 0;             int destStartIndex = 0;             int spanLength = 0;              int skipCount = 0;              // Copy items up to last index to be skipped             foreach (int skipIndex in sortedIndicies)             {                 spanLength = skipIndex - sourceStartIndex;                 destStartIndex = sourceStartIndex - skipCount;                  for (int i = sourceStartIndex; i < sourceStartIndex + spanLength; i++)                 {                     DataList[destStartIndex] = DataList[i];                     destStartIndex++;                 }                  sourceStartIndex = skipIndex + 1;                 skipCount++;             }              // Copy remaining items (between last index to be skipped and end of list)             spanLength = DataList.Count - sourceStartIndex;             destStartIndex = sourceStartIndex - skipCount;              for (int i = sourceStartIndex; i < sourceStartIndex + spanLength; i++)             {                 DataList[destStartIndex] = DataList[i];                 destStartIndex++;             }              DataList.RemoveRange(destStartIndex, sortedIndicies.Count);              return DataList;         }          // Adapted from Stack Overflow answer 63495657 revision 6         // https://stackoverflow.com/revisions/63495657/6         [Benchmark()]         public List<int> Answer63495657_NetMage()         {             List<int> removeAtList = RemovalList.Distinct().OrderBy(i => i).ToList();              int srcCount = DataList.Count;             int ralCount = removeAtList.Count;             int removeAtIndice = 1;             int freeIndex = removeAtList[0];             int current = freeIndex + 1;             while (current < srcCount)             {                 while (removeAtIndice < ralCount && current == removeAtList[removeAtIndice])                 {                     ++current;                     ++removeAtIndice;                 }                  if (current < srcCount)                     DataList[freeIndex++] = DataList[current++];             }              DataList.RemoveRange(freeIndex, srcCount - freeIndex);              return DataList;         }          // Adapted from Stack Overflow answer 63496256 revision 3         // https://stackoverflow.com/revisions/63496256/3         [Benchmark()]         public List<int> Answer63496256_Vernou()         {             List<int> indices = RemovalList.Distinct().OrderBy(i => i).ToList();              //Get the internal array             int[] largeArray = (int[]) typeof(List<int>)                 .GetField("_items", BindingFlags.Instance | BindingFlags.NonPublic)                 .GetValue(DataList);              int current = 0;             int copyFrom = 0;              for (int i = 0; i < indices.Count; i++)             {                 int copyTo = indices[i];                 if (copyTo < copyFrom)                 {                     //In case the indice is duplicate,                     //The item is already passed                     continue;                 }                 int copyLength = copyTo - copyFrom;                 Array.Copy(largeArray, copyFrom, largeArray, current, copyLength);                 current += copyLength;                 copyFrom = copyTo + 1;             }             //Resize the internal array             DataList.RemoveRange(DataList.Count - indices.Count, indices.Count);              return DataList;         }     } } 

Launcher code

RunBenchmark() defines the type of benchmark job to run and for which runtime(s).

using System; using System.Collections.Generic; using System.Diagnostics; using System.Linq; using System.Reflection; using BenchmarkDotNet.Attributes; using BenchmarkDotNet.Configs; using BenchmarkDotNet.Environments; using BenchmarkDotNet.Jobs;  namespace SO63495264 {     class Program     {         static void Main(string[] args)         {             if (args?.Length == 0)             {                 string assemblyFilePath = Assembly.GetExecutingAssembly().Location;                 string assemblyFileName = System.IO.Path.GetFileName(assemblyFilePath);                  Console.WriteLine($"{assemblyFileName} {{ benchmark | test }}");             }             else if (string.Equals(args[0], "benchmark", StringComparison.OrdinalIgnoreCase))                 RunBenchmark();             else if (string.Equals(args[0], "test", StringComparison.OrdinalIgnoreCase))                 RunTest();             else                 Console.WriteLine($"Unexpected parameter \"{args[0]}\"");         }          static void RunBenchmark()         {             Job baseJob = Job.MediumRun;             IConfig config = DefaultConfig.Instance;              foreach (Runtime runtime in new Runtime[] { CoreRuntime.Core31, ClrRuntime.Net48 })             {                 config = config.AddJob(                     baseJob.WithRuntime(runtime)                 );             }              BenchmarkDotNet.Running.BenchmarkRunner.Run<Benchmarks>(config);         }          static void RunTest()         {             const int ListSize = 20;             const int MaxDisplayElements = 20;              IEnumerable<int> data = Enumerable.Range(0, ListSize);             IReadOnlyList<int> indices = new List<int>(                 new int[] {                     0,                               1, // First two indices                     ListSize / 4,                     ListSize / 2,     ListSize / 2 + 1, // Consecutive indices                     ListSize / 4 * 3, ListSize / 4 * 3, // Duplicate indices                     ListSize - 2,     ListSize - 1      // Last two indices                 }             ).AsReadOnly();              // Discover and invoke the benchmark methods the same way BenchmarkDotNet would             Benchmarks benchmarks = new Benchmarks() {                 RemovalList = indices             };             IEnumerable<MethodInfo> benchmarkMethods = benchmarks.GetType()                 .GetMethods(BindingFlags.Instance | BindingFlags.Public)                 .Where(                     method => method.CustomAttributes.Any(                         attribute => attribute.AttributeType == typeof(BenchmarkAttribute)                     )                 );              // Call a known-good method to get the correct results for comparison             benchmarks.DataList = data.ToList();             List<int> controlList = benchmarks.RemoveUsingListCopy();              foreach (MethodInfo benchmarkMethod in benchmarkMethods)             {                 List<int> inputList = data.ToList();                  benchmarks.DataList = inputList;                 Stopwatch watch = Stopwatch.StartNew();                 List<int> outputList = (List<int>) benchmarkMethod.Invoke(benchmarks, Array.Empty<object>());                 watch.Stop();                  Console.WriteLine($"{benchmarkMethod.Name}():");                 Console.WriteLine($"\tElements: {{ {string.Join(", ", outputList.Take(MaxDisplayElements))}{(outputList.Count > MaxDisplayElements ? ", ..." : "") } }}");                 Console.WriteLine($"\t   Count: {outputList.Count:N0}");                 Console.WriteLine($"\tSequence: {(outputList.SequenceEqual(controlList) ? "E" : "*Not* e")}qual to control list");                 Console.WriteLine($"\tDuration: {watch.Elapsed.TotalMilliseconds:N2} milliseconds");                 Console.WriteLine($"\tReturned: {(object.ReferenceEquals(outputList, inputList) ? "Input list" : "New list")}");             }         }     } } 

In order to benchmark on .NET Framework (4.8) I had to add the following properties to my .NET Core .csproj project file:

<TargetFrameworks>netcoreapp3.1;net48</TargetFrameworks> <LangVersion>8.0</LangVersion> 

41 characters to spare!

like image 83
Lance U. Matthews Avatar answered Sep 21 '22 11:09

Lance U. Matthews


Since the order of the source list is significant, you can move each item down the list, skipping indices to be removed, and then remove the end of the list.

UPDATE: Took the .Net Core source code for RemoveAll and modified it for a indice list instead of a predicate.

UPDATE 2: Optimized to not repeat tests if possible.

UPDATE 3: Simplified as optimization having extra code proved slower in benchmarks.

Having src as the large list, and removeAtList as the indices to remove in some random order, you can do:

removeAtList.Sort();  var srcCount = src.Count; var ralCount = removeAtList.Count; var removeAtIndice = 1; var freeIndex = removeAtList[0]; var current = freeIndex+1; while (current < srcCount) {     while (removeAtIndice < ralCount && current == removeAtList[removeAtIndice]) {         ++current;         ++removeAtIndice;     }      if (current < srcCount)         src[freeIndex++] = src[current++]; }  src.RemoveRange(freeIndex, srcCount-freeIndex); 

For a one billion element list of random integers, and a 1000 - 3000 element list of random indices, I get 1.1 ms per remove with this algorithm. Using RemoveAt, I get over 232.77 ms per remove, so this is about 200 times faster.

like image 41
NetMage Avatar answered Sep 20 '22 11:09

NetMage