[NSWI004] a04 no-interrupt best practices

Mon Nov 30 16:24:48 CET 2020

Hello Martin,

I believe Jakub was asking specifically about disabling interrupts as a precautionary measure. As I have pointed out, in real systems that is not a good strategy because interrupts generally should not be disabled for long, and also because on multiprocessor systems, locally disabling interrupts does not protect against activities of other threads. Your points seem to be different, let me comment below.

> I'm a little bit confused from your answer. If I understood it
> correctly Jakub is asking what would be the best practice. Of course
> in our assignment we should know our code perfectly. But in real life
> that's not the case. And I thought by having teams you're trying to
> get at least a little bit closer to reality and teach us best
> practices which are useful when we program something bigger. Because
> when we approach our assignment with attitude "the code isn't that
> big", we can scratch most of the principles of writing good code
> which would only pay off in the long term.

Obviously, a reasonable choice of good practices depends on the context. In this context, I believe good practice with respect to disabling interrupts is knowing why and reasonably restricting the amount of execution done with interrupts disabled - in an assignment of the size we expect you to do, this is reasonably feasible. Larger systems might require a more careful approach, such as defining and documenting a strategy to handle interrupts, but in your assignments that might be an overkill.

Note that disabling interrupts is reasonable only in code that needs that to build higher level scheduling and synchronization functionality. Once you have threads and mutexes, you should not need to disable interrupts arbitrarily, regular synchronization tools should be used instead. That means that even in a larger system, code that directly disables and enables interrupts is likely to be restricted to a functionally small subset of the system.

> We are encouraged to use defensive mechanisms in our code (at least I
> thought so from the feedback) and check things which aren't strictly
> necessary to check but help to avoid making a mistake or at least
> ease the debugging process. We are taught that we should write code
> as though someone else will continue development. And if I go really
> carefully through my code and reorder a function in a way that the
> lock (interrupt disable) isn't necessary someone else may come to my
> program do a small tweak and break it because he didn't know this is
> possibly a critical section because it wasn't locked.

Defensive coding practices, such as checking return values for errors using assertions, are not the same as disabling interrupts as a precaution. Assertions and similar checks help point out places where the developer assumptions might have been wrong. Disabling interrupts masks such places instead. Let me use a simple example:

Case A:
data_t *data_ptr = (data_t *) kmalloc (sizeof (data_t));
panic_if_null (data_t);

Case B:
int interrupts = push_and_disable_interrupts ();
data_t *data_ptr = (data_t *) kmalloc (sizeof (data_t));
pop_interrupts (interrupts);

In A, I assume the allocation is unlikely to return NULL that I could recover from, yet I still insert a check. If my assumption turns out to be wrong, my code will display a proper notification, which I can address. In B, I assume the allocation is not thread safe. If my assumption is wrong, my code keeps wastefully disabling interrupts and I will never be notified.

You suggest you might have code that is ordered in a very special way that influences whether interrupts need to be disabled. I think that is very unlikely, but if you have a specific example, do post it here, we can discuss it.

> With this in mind, is it still true that we should disable interrupts
> as little as possible? Is it better to leave a comment "This doesn't
> need a lock"? Is it better to put the lock there with comment "This
> lock isn't necessary but better safe than sorry"? There may be cases
> where analysis whether a section does need a lock may be complicated
> and susceptible to errors when edited/refactored/changed. Should I
> put a lock there? Maybe comment "Shouldn't be necessary but just in
> case".

Please do not mix locks and disabling interrupts. These are very different concepts. A lock is associated with specific data structures that it protects and it is quite clear when it needs to be locked. Disabling interrupts is a non specific action that impacts the entire system, which is why it should be used carefully.

> For example sem_destroy(sem_t* sem) which looks like this.
[...]
> And let's say I decide that most of the function need to be critical
> section from top to bottom and then one function doesn't need from
> top to bottom but only part of it. However the code is much more
> readable when the pattern goes on and I put that one function in
> critical section top to bottom as well.  Because then someone else
> might do stupid mistake when he presumes that all functions are
> critical section top to bottom but didn't notice this one weird
> exception.

If you are worried that people will overlook blocks with disabled interrupts, you may adopt some coding practices that make such blocks more visible, for example declare a macro whose body executes with disabled interrupts and use that with proper indentation. But there are obviously limits to what you can do, which may be all the more reason to only manipulate interrupts when strictly necessary.

> Therefore I don't feel like instructions "use it sparingly" and "use
> it for minimum duration possible" are complete. I'm trying to avoid
> the same problem we had with a02 when we got minus points for good
> code practices we didn't know about and which weren't talked about in
> this course. Therefore I'm trying to ask in advance here what are the
> best practices with locking because I have no idea what's right. I'd
> appreciate if you could go through our code and say if we use these
> right or wrong because in our team we don't know. Or better provide
> some guidelines so we do it right and don't loose point
> unnecessarily.
> 
> And of course I'm not asking only for this case. This happens in
> higher languages. How different is the case here and in a
> higher-level language? Are there some rules which apply to
> synchronization practices in general?

I'm afraid you are asking for explanations that can make for an entire course. If you want to know more about locking and synchronization practices in a real system kernel, you might want to start by reading the Linux kernel documentation:

https://www.kernel.org/doc/html/latest/kernel-hacking/locking.html
https://www.kernel.org/doc/html/latest/locking/index.html
https://www.kernel.org/doc/html/latest/core-api/genericirq.html

Still, for disabling interrupts, I think a compact advice boils down to what I wrote earlier - use it sparingly and for limited duration only, preferably restrict interrupt manipulation to few parts of the code that really need it (interrupt handling, perhaps implementation of certain synchronization primitives). For locking, which you also mention, the some advice would be for example - develop a systematic way of associating locks with the structures they protect (even a simple naming convention can help), avoid locking multiple unrelated structures together and locking the same structures at different abstraction levels in your code.

Petr