Lab 05 | NPRG035

`string` vs. `StringBuilder`

The string type is strictly immutable: an existing string instance cannot be modified, you can only create a new instance with different content. Consider the following function, which joins a list of numbers into a single string using a given delimiter:

string ConcatenateNumbers(List<int> numbers, char delimiter) {
    string result = "";
    foreach (int number in numbers) {
        result += number;    // allocates a new string
        result += delimiter; // allocates another new string
    }
    if (result.Length != 0) // Possibly remove trailing delimiter
        result = result.Remove(result.Length - 1); // Remove also returns a new string
    return result;
}

Calling this function as ConcatenateNumbers([1, 2, 3, 4], '-') returns the string "1-2-3-4". Because string is immutable, every += creates a new string (two per loop iteration), and since string is a reference type, each new string means a heap allocation. Consider which objects are created on the heap during this method call (see the answers at the end of this page).

Appending a single character to a string is therefore a linear-time operation: all existing characters must be copied into the new instance, so building a string of n characters one character at a time takes O(n²) time in total. The solution to this problem is the StringBuilder class. We can imagine its implementation as similar to List<char>, i.e., an array whose capacity is at least the number of stored items (Length). Appending a character usually means only writing it into the next free position of the array and incrementing the count. When the array runs out of space, a new array of double the size is allocated and the original contents are copied into it. On average (amortized), appending a character is therefore a constant-time operation. A more efficient implementation of the original method follows; consider which objects it allocates on the heap (see the answers at the end of this page):

using System.Text; // StringBuilder lives in System.Text

string ConcatenateNumbers(List<int> numbers, char delimiter) {
    var result = new StringBuilder();
    foreach (int number in numbers) {
        result.Append(number);
        result.Append(delimiter);
    }
    if (result.Length != 0) // Possibly remove trailing delimiter
        result.Length--;
    return result.ToString();
}

Note that the real implementation of StringBuilder is much more sophisticated and based on analysis of common usage patterns across the history of the C# language!
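The simplified List<char>-like model described above can be sketched as follows. This is a minimal, hypothetical model (the names buffer, Append and Build are made up for illustration), not how System.Text.StringBuilder is actually implemented. It starts with a capacity of 4 to keep the numbers small and counts how many arrays are allocated and how many characters the doubling strategy copies.

```csharp
using System;

// Hypothetical, simplified model of a growable text buffer (roughly a List<char>).
// It is NOT how System.Text.StringBuilder is actually implemented.
char[] buffer = new char[4];   // small initial capacity to keep the numbers readable
int length = 0;                // number of characters actually stored
int arraysAllocated = 1;       // counts the initial array too
long charsCopied = 0;          // total work spent copying during growth

void Append(char c)
{
    if (length == buffer.Length)
    {
        // Out of space: allocate an array twice as large and copy the old contents.
        char[] bigger = new char[buffer.Length * 2];
        Array.Copy(buffer, bigger, length);
        charsCopied += length;
        arraysAllocated++;
        buffer = bigger;
    }
    buffer[length++] = c;      // common case: one write and one increment
}

// Creates the final string from the used part of the buffer.
string Build() => new string(buffer, 0, length);

foreach (char ch in "1-2-3-4-")
    Append(ch);
length--;                      // drop the trailing delimiter, like result.Length--

Console.WriteLine(Build());          // 1-2-3-4
Console.WriteLine(arraysAllocated);  // 2
Console.WriteLine(charsCopied);      // 4
```

With doubling, the total number of characters copied during growth stays below twice the number of appended characters, which is why a single append is amortized O(1).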

We can think of StringBuilder as a mutable text buffer, and you must keep this in mind when programming: if you pass a StringBuilder instance to a method, expect that the method may modify its text. With string, you have a guarantee that this cannot happen.
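A minimal sketch of this difference (AddSuffix and AddSuffixToString are hypothetical helpers): the method that receives a StringBuilder changes the caller's text, while += on a string parameter only makes that parameter refer to a new string and leaves the caller's string untouched.

```csharp
using System;
using System.Text;

// Hypothetical helpers that both try to append "!" to their argument.
void AddSuffix(StringBuilder text) => text.Append('!');  // modifies the caller's buffer
void AddSuffixToString(string text) { text += "!"; }      // only rebinds the local parameter

var builder = new StringBuilder("hello");
string plain = "hello";

AddSuffix(builder);
AddSuffixToString(plain);

Console.WriteLine(builder.ToString()); // hello!
Console.WriteLine(plain);              // hello
```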

Single-character `string` vs. `char`

To represent a space in a program we have two options:

as char: ' '
as string: " "

Although both look similar in code (in Python these notations are even equivalent), there is a crucial difference.

The char type is a value type. A char variable occupies 2 B (one UTF-16 code unit) and directly stores the value 0x20, which represents a space.

The string type is a reference type. The variable itself typically occupies 8 B (platform-dependent) and stores a reference to the heap, where the instance resides. The instance typically has (8 + 8) B of overhead (platform-dependent), 2 B per character (here a single character with the value 0x20), and additional data the string needs for its internal implementation, such as its length.
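Some of these numbers can be checked from C# itself. This is a minimal sketch; it relies on IntPtr.Size, which reports the pointer size of the current process and therefore also the size of a reference.

```csharp
using System;

char spaceChar = ' ';
string spaceString = " ";

Console.WriteLine(sizeof(char));                 // 2: a char is stored inline as one UTF-16 code unit
Console.WriteLine((int)spaceChar == 0x20);       // True
Console.WriteLine(IntPtr.Size);                  // size of a reference: 8 in a 64-bit process, 4 in a 32-bit one
Console.WriteLine(spaceString.Length);           // 1: the length is stored inside the heap object
Console.WriteLine(spaceString[0] == spaceChar);  // True
```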

The difference between stringBuilder.Append(' ') and stringBuilder.Append(" ") is not only in memory but also in time. Consider why the implementation of Append(char) is necessarily simpler than that of Append(string).
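One way to see the difference is a hypothetical simplified buffer (the names are made up, growth is omitted, and this is not the real StringBuilder code). Append(char) is a single write, whereas Append(string) must first handle a possible null reference, read the length stored in the heap object, check capacity for the whole string, and copy its characters in a loop.

```csharp
using System;

// Hypothetical simplified buffer with a fixed capacity; growth is omitted on purpose
// so that only the work done by each Append variant remains visible.
char[] buffer = new char[16];
int length = 0;

// Append(char): one capacity check, one write, one increment.
void AppendChar(char c)
{
    if (length == buffer.Length)
        throw new InvalidOperationException("Buffer full (growth is omitted in this sketch).");
    buffer[length++] = c;
}

// Append(string): an extra null check, the length must be read from the heap object,
// and the characters are copied in a loop instead of a single write.
void AppendString(string s)
{
    if (s == null)
        return;
    if (length + s.Length > buffer.Length)
        throw new InvalidOperationException("Buffer full (growth is omitted in this sketch).");
    for (int i = 0; i < s.Length; i++)
        buffer[length + i] = s[i];
    length += s.Length;
}

AppendChar(' ');
AppendString(" ");
Console.WriteLine(length); // 2
```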

System.Linq

The System.Linq namespace contains functionality that is very useful to an experienced C# programmer. However, we do not yet know some C# concepts needed to understand all aspects (and implementation consequences) of the types and methods in this namespace. Unfortunately, because it is an essential part of C#, many resources (including LLMs) use these methods, often inappropriately and sometimes completely incorrectly. For our own good, let's forbid using this namespace in this semester's assignments.

As an example, consider the type Queue<int>, which represents a FIFO queue of integers. It efficiently supports enqueuing (method Enqueue), dequeuing (method Dequeue), and getting the number of elements (Count). However, Queue provides no way to access the second, third, or N-th element; only the first one is accessible (via Peek). System.Linq adds this capability through the extension method ElementAt(int index). So to print the contents of the queue, we might write:

using System.Linq; // ElementAt(int) is an extension method from System.Linq

void PrintQueue(Queue<int> queue) {
    for (int i = 0; i < queue.Count; ++i)
        Console.WriteLine(queue.ElementAt(i));
}

Surprisingly, the asymptotic complexity of this method is quadratic with respect to the queue size. This is because ElementAt has no other way to find the N-th element than:

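using System;
using System.Collections.Generic;

// Simplified sketch (not the actual library code): for a collection without an indexer,
// ElementAt can only walk from the start until it reaches the requested index.
static class QueueElementAtSketch {
    public static int ElementAt(this Queue<int> queue, int selectedIndex) {
        int currentIndex = 0;
        foreach (int element in queue) {
            if (currentIndex++ == selectedIndex)
                return element;
        }
        throw new ArgumentOutOfRangeException(nameof(selectedIndex));
    }
}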

The collection (the queue in this case) is therefore traversed from the start on every ElementAt(i) call just to reach its i-th element, so printing a queue of n elements visits 1 + 2 + ... + n = n(n + 1)/2 elements in total.
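To see the difference concretely, the following sketch counts how many elements are visited. ElementAtCounting is a hypothetical re-implementation of the ElementAt idea shown above, instrumented with a counter (it is not the real Enumerable.ElementAt), and PrintQueueLinear is the straightforward fix: iterate the queue once with foreach, which Queue<T> supports without System.Linq.

```csharp
using System;
using System.Collections.Generic;

long visited = 0; // total number of elements visited by the enumerator

// Hypothetical re-implementation of the ElementAt idea, instrumented with a counter.
int ElementAtCounting(Queue<int> queue, int selectedIndex)
{
    int currentIndex = 0;
    foreach (int element in queue)
    {
        visited++;
        if (currentIndex++ == selectedIndex)
            return element;
    }
    throw new ArgumentOutOfRangeException(nameof(selectedIndex));
}

// Quadratic: every call restarts the traversal from the front of the queue.
void PrintQueueQuadratic(Queue<int> queue)
{
    for (int i = 0; i < queue.Count; ++i)
        Console.WriteLine(ElementAtCounting(queue, i));
}

// Linear: a single traversal with foreach, no System.Linq needed.
void PrintQueueLinear(Queue<int> queue)
{
    foreach (int element in queue)
    {
        visited++;
        Console.WriteLine(element);
    }
}

var numbers = new Queue<int>(new[] { 10, 20, 30, 40 });

visited = 0;
PrintQueueQuadratic(numbers);
long quadraticSteps = visited;   // 1 + 2 + 3 + 4 = 10

visited = 0;
PrintQueueLinear(numbers);
long linearSteps = visited;      // 4

Console.WriteLine($"quadratic: {quadraticSteps}, linear: {linearSteps}");
```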

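The same caution applies to the System.Linq methods in general: they work on any IEnumerable<T> and therefore often have to enumerate the collection from the start. Some of them detect special cases (for example, ElementAt uses the indexer directly when the source implements IList<T>), but you cannot rely on such optimizations being present.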

C# programming conventions

A programming convention is a set of rules for writing code in a language that helps readers (other programmers) find their way around our code. First, note that the way code is written in C# differs from other languages (e.g., C++). Conventions may also differ between teams and projects. In this section, we present what C# programmers generally expect (the standard .NET libraries themselves follow the same convention).

The natural language used for identifiers should be consistent across a project. Today, even small teams write code in English, and larger (international) teams have no other practical option.

The following code demonstrates how to name various identifiers (types, fields, properties, methods, …):

public class ClassPascalCase {
    public int PublicFieldPascalCase;
    public int PublicPropertyPascalCase { get; set; }

    public static int PublicStaticFieldPascalCase;
    public static int PublicStaticPropertyPascalCase { get; set; }

    public const int PublicConstantPascalCase = 42;

    public int GetVerbPublicMethodPascalCase(int camelCaseArgumentOne) {
        int localVariableCamelCase = 5;
        return 1 + localVariableCamelCase + camelCaseArgumentOne + _privateField + s_privateStaticField;
    }

    private int GetVerbPrivateMethodPascalCase(int camelCaseArgumentTwo) {
        return 2;
    }

    private int PrivatePropertyPascalCase { get; set; }
    private const int PrivateConstantPascalCase = 42;

    private int _privateField; // The underscore prefix is a fairly recent convention and is not universally adopted.
    private static int s_privateStaticField; // On the other hand, it makes it immediately clear which data a method accesses.
}

public struct StructPascalCase {
    public int PublicField;
}

public interface IPascalCase { // We use I as a prefix for interfaces.
    public void StartVerbPascalCase(); // Method name should include a verb so that it's clear that some action will be performed when invoking the method.
}

public class SpecificThingWentWrongException : Exception { // We use Exception as a suffix for exceptions.
}

public record class RecordPascalCase(int PascalCaseProperty, long PascalCaseOtherProperty);

public class ClassPrimaryConstructor(int ctorArgumentCamelCase, int _capturedCtorArgument) {
    // ctorArgumentCamelCase is used only in an initializer of this non-record class,
    // so it does NOT get captured (copied) into a hidden private field.
    public int PublicReadOnlyProperty { get; } = ctorArgumentCamelCase;

    public int CalcPublicValue() {
        // _capturedCtorArgument is used outside of initializers, so the compiler captures it
        // into a hidden private field; the name therefore follows the private-field convention.
        return _capturedCtorArgument;
    }
}

public class SomeClassTests {
    public void MethodTestPascalCase_PascalCase_PascalCase() { // Using underscore in test methods is alright to further differentiate between various method use scenarios.

    }
}

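Answer: objects created on the heap by the string version

string0{ "" }
string1{ "1" }
string2{ "1-" }
string3{ "1-2" }
string4{ "1-2-" }
string5{ "1-2-3" }
string6{ "1-2-3-" }
string7{ "1-2-3-4" }
string8{ "1-2-3-4-" }
string9{ "1-2-3-4" }

A total of 10 strings. (This simplified count ignores the temporary strings that may be created when each number is converted to text.)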

Answer: objects created on the heap by the StringBuilder version (the illustration assumes an initial capacity of 4; the real default capacity of StringBuilder is 16)

StringBuilder0{ array0[' ', ' ', ' ', ' '], Length = 0 }
StringBuilder0{ array0['1', ' ', ' ', ' '], Length = 1 }
StringBuilder0{ array0['1', '-', ' ', ' '], Length = 2 }
StringBuilder0{ array0['1', '-', '2', ' '], Length = 3 }
StringBuilder0{ array0['1', '-', '2', '-'], Length = 4 }
StringBuilder0{ array1['1', '-', '2', '-', '3', ' ', ' ', ' '], Length = 5 }
StringBuilder0{ array1['1', '-', '2', '-', '3', '-', ' ', ' '], Length = 6 }
StringBuilder0{ array1['1', '-', '2', '-', '3', '-', '4', ' '], Length = 7 }
StringBuilder0{ array1['1', '-', '2', '-', '3', '-', '4', '-'], Length = 8 }
StringBuilder0{ array1['1', '-', '2', '-', '3', '-', '4', ' '], Length = 7 }
string0{ "1-2-3-4" }

1x `StringBuilder`, 2x `char[]`, 1x `string`
