Replace and remove whitespace in strings - performant and sustainable


There are many ways to remove spaces or other characters from a string, but they differ considerably in performance and efficiency.

Benchmark

As part of my Sustainable Code Repository, I have created various code snippets that simulate everyday coding situations. For this comparison I used regex, string operations and the relatively new Span API. I deliberately left out vectorization, e.g. with the help of Vector128, which is likely to be the most performant solution. Vector128 is highly optimized, but not an everyday solution.

Code

The following code takes an input string and removes whitespace from it in several different ways.

// Made by Benjamin Abt - https://github.com/BenjaminAbt

using System;
using System.Buffers;
using System.Text;
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Columns;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<Benchmark>();

[MemoryDiagnoser]
[SimpleJob(RuntimeMoniker.Net80)]
[SimpleJob(RuntimeMoniker.Net90, baseline: true)]
[HideColumns(Column.Job)]
public class Benchmark
{
    // Escape sequences are not processed in this literal, so the \u0001 sequences
    // stay as literal text; line breaks and indentation are part of the input.
    public const string Input = @"""
        Hello\u0001World Hello\u0001World Hello\u0001World Hello\u0001World
        Hello\u0001World Hello\u0001World Hello\u0001World Hello\u0001World
        Hello\u0001World Hello\u0001World Hello\u0001World Hello\u0001World
        Hello\u0001World Hello\u0001World Hello\u0001World Hello\u0001World
        Hello\u0001World Hello\u0001World Hello\u0001World Hello\u0001World
        Hello\u0001World Hello\u0001World Hello\u0001World Hello\u0001World
        Hello\u0001World Hello\u0001World Hello\u0001World Hello\u0001World
        Hello\u0001World Hello\u0001World Hello\u0001World Hello\u0001World
        """;

    // Removes all whitespace (\s+) with a source-generated regex.
    [Benchmark]
    public string Regex()
    {
        return RegexSample.WhiteSpaceRegex().Replace(Input, "");
    }

    // Removes only the space character ' '.
    [Benchmark]
    public string String()
    {
        string data = Input;

        return data.Replace(" ", "");
    }

    // Removes only the space character ' ', copying into a stack buffer.
    [Benchmark]
    public string Span()
    {
        ReadOnlySpan<char> inputSpan = Input.AsSpan();
        Span<char> resultSpan = stackalloc char[Input.Length];
        int resultIndex = 0;

        foreach (char c in inputSpan)
        {
            if (c is not ' ')
            {
                resultSpan[resultIndex++] = c;
            }
        }

        return new string(resultSpan.Slice(0, resultIndex));
    }

    // Removes only the space character ' '.
    [Benchmark]
    public string StringBuilder()
    {
        StringBuilder stringBuilder = new(Input);
        stringBuilder.Replace(" ", "");

        return stringBuilder.ToString();
    }

    // A null separator makes Split use whitespace characters as delimiters.
    [Benchmark]
    public string JoinSplit()
    {
        return string.Join("", Input.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));
    }

    // Same idea as JoinSplit, but keeps empty entries and concatenates them.
    [Benchmark]
    public string ConcatSplit()
    {
        return string.Concat(Input.Split(null));
    }

    // Removes all whitespace, using a pooled array as the temporary buffer.
    [Benchmark]
    public string SpanArrayPool()
    {
        char[] pooledArray = ArrayPool<char>.Shared.Rent(Input.Length);
        try
        {
            Span<char> destination = pooledArray.AsSpan(0, Input.Length);

            int pos = 0;

            foreach (char c in Input)
            {
                if (!char.IsWhiteSpace(c))
                {
                    destination[pos++] = c;
                }
            }

            return Input.Length == pos ? Input : new string(destination[..pos]);
        }
        finally
        {
            ArrayPool<char>.Shared.Return(pooledArray);
        }
    }

    // Removes all whitespace, using a stack buffer as the temporary buffer.
    [Benchmark]
    public string SpanStackPool()
    {
        // stackalloc never allocates on the heap, but it is only safe for small
        // buffers; a large Input can overflow the stack.
        Span<char> destination = stackalloc char[Input.Length];

        int pos = 0;

        foreach (char c in Input)
        {
            if (!char.IsWhiteSpace(c))
            {
                destination[pos++] = c;
            }
        }

        return Input.Length == pos ? Input : new string(destination[..pos]);
    }
}

public static partial class RegexSample
{
    [GeneratedRegex(@"\s+")]
    public static partial Regex WhiteSpaceRegex();
}
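
The two buffer strategies above (stack and ArrayPool) are often combined, so that small inputs use the stack and larger ones fall back to a rented array. The following is a minimal sketch of that idea, not part of the benchmark; the helper name RemoveWhiteSpace and the limit of 256 chars are assumptions chosen for illustration.

```csharp
using System;
using System.Buffers;

// Hypothetical helper (not part of the benchmark): removes all whitespace.
static string RemoveWhiteSpace(string input)
{
    // Assumed threshold: up to 256 chars (512 bytes) are placed on the stack.
    const int StackLimit = 256;

    char[]? rented = null;
    Span<char> buffer = input.Length <= StackLimit
        ? stackalloc char[StackLimit]
        : (rented = ArrayPool<char>.Shared.Rent(input.Length));

    try
    {
        int pos = 0;

        foreach (char c in input)
        {
            if (!char.IsWhiteSpace(c))
            {
                buffer[pos++] = c;
            }
        }

        // Nothing removed: return the original instance and skip the copy.
        return pos == input.Length ? input : new string(buffer[..pos]);
    }
    finally
    {
        // Only return the array if one was actually rented.
        if (rented is not null)
        {
            ArrayPool<char>.Shared.Return(rented);
        }
    }
}

Console.WriteLine(RemoveWhiteSpace("Hello World \t again")); // prints "HelloWorldagain"
```

The temporary buffer never causes a heap allocation for small inputs, and large inputs cannot overflow the stack. The returned string is still allocated on the heap in both cases.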

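Results

I then measured this code with BenchmarkDotNet. Note that the Ratio column compares .NET 8.0 against the .NET 9.0 baseline for the same method; it does not compare the methods with each other.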

BenchmarkDotNet v0.14.0, Windows 10 (10.0.19045.5131/22H2/2022Update)
AMD Ryzen 9 9950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK 9.0.100
  [Host]   : .NET 9.0.0 (9.0.24.52809), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  .NET 8.0 : .NET 8.0.11 (8.0.1124.51707), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
  .NET 9.0 : .NET 9.0.0 (9.0.24.52809), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI


| Method        | Runtime  | Mean       | Error    | StdDev   | Ratio | RatioSD | Gen0   | Gen1   | Allocated | Alloc Ratio |
|-------------- |--------- |-----------:|---------:|---------:|------:|--------:|-------:|-------:|----------:|------------:|
| Regex         | .NET 8.0 |   656.2 ns |  7.17 ns |  6.36 ns |  1.06 |    0.01 | 0.0629 |      - |   1.03 KB |        1.00 |
| Regex         | .NET 9.0 |   617.5 ns |  3.67 ns |  3.44 ns |  1.00 |    0.01 | 0.0629 |      - |   1.03 KB |        1.00 |
|               |          |            |          |          |       |         |        |        |           |             |
| String        | .NET 8.0 | 1,074.5 ns |  9.94 ns |  9.29 ns |  1.21 |    0.01 | 0.0629 |      - |   1.05 KB |        1.00 |
| String        | .NET 9.0 |   887.9 ns |  2.59 ns |  2.29 ns |  1.00 |    0.00 | 0.0639 |      - |   1.05 KB |        1.00 |
|               |          |            |          |          |       |         |        |        |           |             |
| Span          | .NET 8.0 |   288.8 ns |  5.67 ns |  6.31 ns |  1.53 |    0.04 | 0.0639 |      - |   1.05 KB |        1.00 |
| Span          | .NET 9.0 |   188.4 ns |  2.61 ns |  2.45 ns |  1.00 |    0.02 | 0.0639 |      - |   1.05 KB |        1.00 |
|               |          |            |          |          |       |         |        |        |           |             |
| StringBuilder | .NET 8.0 | 1,042.8 ns |  8.32 ns |  7.78 ns |  1.20 |    0.02 | 0.1411 |      - |   2.33 KB |        1.00 |
| StringBuilder | .NET 9.0 |   871.6 ns | 13.50 ns | 12.63 ns |  1.00 |    0.02 | 0.1421 |      - |   2.33 KB |        1.00 |
|               |          |            |          |          |       |         |        |        |           |             |
| JoinSplit     | .NET 8.0 |   688.6 ns |  4.47 ns |  4.18 ns |  1.06 |    0.01 | 0.2422 | 0.0010 |   3.97 KB |        1.00 |
| JoinSplit     | .NET 9.0 |   650.9 ns |  7.26 ns |  6.79 ns |  1.00 |    0.01 | 0.2422 | 0.0010 |   3.97 KB |        1.00 |
|               |          |            |          |          |       |         |        |        |           |             |
| ConcatSplit   | .NET 8.0 |   680.3 ns | 10.91 ns | 10.20 ns |  1.07 |    0.02 | 0.2251 | 0.0010 |   3.68 KB |        1.00 |
| ConcatSplit   | .NET 9.0 |   634.2 ns | 11.93 ns | 11.15 ns |  1.00 |    0.02 | 0.2251 | 0.0010 |   3.68 KB |        1.00 |
|               |          |            |          |          |       |         |        |        |           |             |
| SpanArrayPool | .NET 8.0 |   312.7 ns |  3.35 ns |  3.13 ns |  0.98 |    0.01 | 0.0629 |      - |   1.03 KB |        1.00 |
| SpanArrayPool | .NET 9.0 |   318.8 ns |  3.48 ns |  3.26 ns |  1.00 |    0.01 | 0.0629 |      - |   1.03 KB |        1.00 |
|               |          |            |          |          |       |         |        |        |           |             |
| SpanStackPool | .NET 8.0 |   370.0 ns |  2.87 ns |  2.69 ns |  1.27 |    0.02 | 0.0629 |      - |   1.03 KB |        1.00 |
| SpanStackPool | .NET 9.0 |   291.1 ns |  5.08 ns |  4.75 ns |  1.00 |    0.02 | 0.0629 |      - |   1.03 KB |        1.00 |

The differences are large. StringBuilder, which performs well in many other situations, is one of the slowest variants here (871.6 ns on .NET 9.0) and allocates more than twice as much memory as string.Replace (2.33 KB vs. 1.05 KB); only the two Split variants allocate more (3.68 KB and 3.97 KB). Regex, on the other hand, is not as bad as you might think (617.5 ns). But as you would expect, the Span variants are clearly ahead at roughly 190 to 320 ns on .NET 9.0, so it’s good that the Span API is easy to understand and use.
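
One caveat when comparing the rows: the benchmarked methods do not all remove the same characters. String, Span and StringBuilder remove only the space character ' ', while Regex, the Split variants and the two char.IsWhiteSpace variants also remove tabs and line breaks. The small sketch below illustrates the difference on a made-up sample string; LINQ is used only for brevity here, not as a performance recommendation.

```csharp
using System;
using System.Linq;

// Made-up sample: contains a space, a tab and a Windows line break.
string sample = "a b\tc\r\nd";

// Removes only ' ' (like the String, Span and StringBuilder benchmarks).
string spacesOnly = sample.Replace(" ", "");

// Removes everything char.IsWhiteSpace reports (like the SpanArrayPool benchmark).
string allWhiteSpace = string.Concat(sample.Where(c => !char.IsWhiteSpace(c)));

Console.WriteLine(spacesOnly.Length); // 7: tab, CR and LF are still there
Console.WriteLine(allWhiteSpace);     // abcd
```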

However, if you need the very best performance, you won’t be able to avoid vectorization (for example with AVX2); see meziantou’s Replace characters in a string using Vectorization implementation.
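
A smaller step in that direction, without writing any vector code by hand, is to let the runtime search for the character and copy the text in between in whole chunks. The sketch below does this with IndexOf and CopyTo on spans; as far as I know, IndexOf for char is vectorized inside recent .NET runtimes, but treat that as an assumption. The helper name RemoveChar is hypothetical, and this variant is not part of the benchmark above, so no numbers are claimed for it.

```csharp
using System;
using System.Buffers;

// Hypothetical helper: removes every occurrence of a single character.
static string RemoveChar(string input, char toRemove)
{
    char[] rented = ArrayPool<char>.Shared.Rent(input.Length);
    try
    {
        ReadOnlySpan<char> remaining = input;
        Span<char> destination = rented;
        int pos = 0;

        while (true)
        {
            // Find the next character to drop.
            int index = remaining.IndexOf(toRemove);
            if (index < 0)
            {
                // No more matches: copy the rest in one go.
                remaining.CopyTo(destination[pos..]);
                pos += remaining.Length;
                break;
            }

            // Copy the chunk before the match, then skip the match itself.
            remaining[..index].CopyTo(destination[pos..]);
            pos += index;
            remaining = remaining[(index + 1)..];
        }

        return pos == input.Length ? input : new string(destination[..pos]);
    }
    finally
    {
        ArrayPool<char>.Shared.Return(rented);
    }
}

Console.WriteLine(RemoveChar("Hello World again", ' ')); // prints "HelloWorldagain"
```

This only handles one specific character, which matches the String, Span and StringBuilder benchmarks rather than the char.IsWhiteSpace variants.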

Sustainable Code

You can find this and many more examples on my GitHub under Sustainable Code.

