[NSWI004] A04 grading and notes

Thu Dec 10 07:22:04 CET 2020

Hello,

I will be uploading comments for A04 assignment to your grading 
repository shortly.

Apart from few omissions, the solutions were in a good shape. Our 
comments mostly focused on these and on pointing out places for 
potential race conditions.

As usual - if you have any comments/questions, do not hesitate to 
contact us (this ML for general questions and teachers-nswi004 for 
complaints/comments/questions about your grading).

Follows a short overview of common issues that might be interesting to 
more teams even if not directly relevant to their code. They are in no 
particular order.

Ensure that you check fully the cause of the exception. At the moment, 
you were only interested in interrupts (exc. code 0) from the timer (7). 
Hence, there should be two conditions in the handler when the scheduler 
is supposed to be called.

Do not forget to protect shared data structures but do not disable 
interrupts when it is really not needed. For example, a typical 
thread_create:

thread = kmalloc(...)
thread->stack = kmalloc(...)
...
thread->context = ...
...
scheduler_add_ready_thread(thread)

does not need disabled interrupts at all as individual calls (kmalloc 
and scheduler-related) are already protected. It is completely fine if 
the thread is interrupted (and rescheduled) between the calls to kmalloc.

On the other hand, a typical thread_join shall be protected around the 
following fragment

if (thread->joined_by != NULL) { return EBUSY; }
thread->joined_by = thread_get_current()

to ensure that the thread is not rescheduled between these two lines and 
joined_by changed meanwhile. Even if use of thread_join from multiple 
places is not a typical use case.

It is also generally better to enter critical section for the shortest 
time possible. For example, the following prologue of kfree() does not 
need to be protected at all as it does not access any shared data 
structures:

if (ptr == NULL) {
    return;
}
block_header_t* header = ptr - sizeof(block_header_t);
assert(header->magic == HEADER_MAGIC);

Use the right kind of protection of the critical section. Heap can 
easily use mutex as a higher-level primitive that is supposed to work 
even in multiprocessor environment. On the other hand, it is fine to use 
disabling interrupts for a scheduler ready queue. First of all, we do 
not have to go through the recursive reasoning that scheduler is using 
mutex that is using scheduler in its implementation. And next because 
scheduler typically maintains some kind of per-CPU queue and that can be 
easily (and correctly) protected by disabled interrupts only.

I have to stress once again the need and advantages of using the right 
abstraction layers. As a typical example, following three approaches 
were seen when reacting to the timer interrupts: call 
scheduler_handle_timer_interrupt(), scheduler_schedule_next() and 
thread_yield().

In the end, all of them did the same - find the next ready thread and 
switched to it. However, the first one states that it is up to the 
scheduler to decide what to do on the interrupt (i.e. we clearly mark 
the responsibility to belong to the scheduler without caring about the 
implementation). In the second approach, we still mark the scheduler as 
the responsible entity but the abstraction layer is leaking a little bit 
as we reveal what the scheduler is doing. In the last approach, we leak 
the implementation detail that preemption and (voluntary) yield share 
the same implementation.

As another example, if the mutex_lock() contained the following

if (mutex->locked) {
    list_append(...);
    current_thread->state = state_blocked_mutex;
    thread_suspend();
}

then the thread structure members are used without proper encapsulation 
(e.g. without concurrent-access protection) and we also leak abstraction 
detail about thread states.

On an unrelated note: the state space of thread now contains multiple 
states that are treated the same in multiple places - the thread is 
blocked, the reason is irrelevant to most parts of the code. If we would 
need this information for debugging purposes, it would be probably 
better to add extra suspend call where we could pass the object we are 
blocked on and the type of that object (in the following example we 
would just pass the printk directive for debugging prints).

   ...
   thread_block_on_object(mutex, "%pM");
   ...

I hope it makes sense :-).

Cheers,
- VH