[NSWI004] Strange bug in syscall exit implementation.

Thu Feb 13 12:28:18 CET 2020

Hello.

On 13. 02. 20 at 11:22, Tomáš Drozdík wrote:
> Hi,
> 
> I've encountered a strange bug while working on assignment 6. After
> circa 20 seconds of work, which I cannot step through, the kernel
> starts a presumably infinite loop of "Kernel panic: Exception..." prints.
> 
> I've managed to hit a breakpoint in the user-space main function,
> thus entering msim interactive mode as was suggested in the task
> description. Then I've tried to implement a syscall as handling of
> general exception with cause 8, which caused a correct switch back to
> the kernel. Then I've just implemented a syscall handler for exit which
> calls `thread_finish` with the argument of the `exit` syscall. This has
> to clear the address space of the given thread, which it does according
> to the debugger, and when it needs to call `kfree` on the as_t structure
> I'm no longer able to step through it and it hangs, followed by the
> infinite loop described above.

You destroy as_t inside thread_finish. That means that if the thread in 
question is using a userspace stack, you destroy the AS that manages the 
memory of that stack. Since you invalidate the TLB, any other operation on 
that stack (i.e. the rest of thread_finish) causes a TLB refill, and that 
refill can no longer succeed because the mapping is gone.

As with destroying threads, address spaces are easier to destroy in 
thread_join or a similar function, which prevents this kind of situation.
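
A minimal sketch of that ordering, written as plain C that runs outside the 
kernel. All names with the `_sketch` suffix, the structure layouts and the 
`as_destroyed_count` counter are hypothetical stand-ins for illustration, not 
the assignment's actual API; `malloc`/`free` stand in for the kernel 
allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
typedef struct as {
    size_t size;
} as_t;

typedef struct thread {
    as_t *as;        /* address space backing the thread's userspace stack */
    bool finished;
    int exit_code;
} thread_t;

/* Counts destroyed address spaces so the ordering can be observed. */
static int as_destroyed_count = 0;

static as_t *as_create_sketch(size_t size) {
    as_t *as = malloc(sizeof(as_t));
    assert(as != NULL);
    as->size = size;
    return as;
}

static void as_destroy_sketch(as_t *as) {
    /* A real kernel would call kfree and invalidate the TLB here. */
    free(as);
    as_destroyed_count++;
}

/*
 * Runs on the finishing thread's own (userspace) stack, so it must not
 * unmap that stack. It only records the exit state.
 */
static void thread_finish_sketch(thread_t *thread, int exit_code) {
    assert(thread != NULL);
    thread->exit_code = exit_code;
    thread->finished = true;
    /* A real kernel would now switch to another thread and never return. */
}

/*
 * Runs on the joining thread's stack, so the finished thread's address
 * space can be released safely here.
 * Returns 0 on success, -1 if the thread has not finished yet
 * (a real kernel would block instead).
 */
static int thread_join_sketch(thread_t *thread, int *exit_code) {
    assert(thread != NULL && exit_code != NULL);
    if (!thread->finished) {
        return -1;
    }
    *exit_code = thread->exit_code;
    if (thread->as != NULL) {
        as_destroy_sketch(thread->as);
        thread->as = NULL;
    }
    return 0;
}
```

The point of the split is only that the address space outlives every 
instruction that still executes on the finishing thread's stack.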

As a side note: we use a simplified implementation where the kernel can use 
the userspace stack for exception handling. In real-world kernels, there are 
often two stacks - userspace and kernel - so that the state of the kernel 
stack never leaks into userspace. There, the stacks are switched as the very 
first thing in the exception handler. Then the approach of destroying the AS 
soon(er) could work.

> Since the program enters an infinite loop of kernel panics, which
> should not happen at all, the code must have been modified during
> execution. Then the function `handle_exception_general` must have been
> called repeatedly with an invalid `cause`, resulting in that kind of
> behavior. I think that there may be a problem with disabling
> interrupts, which I don't do when handling syscalls since we might
> hit another exception (e.g. a TLB exception) which we do not want to
> ignore.

No. The infinite loop can be caused by repeatedly trying to access a stack 
that is no longer mapped: the exception handler itself runs on that stack, 
so handling one fault triggers the next.

From a quick look at your code, I think the interrupts are not the 
culprit. I also do not think that you are overwriting your exception 
routines (MSIM has commands to dump memory, so you can check for yourself 
that the code is not modified).

Hope this helps,
- VH

> 
> Code is in my repo:
> https://gitlab.mff.cuni.cz/teaching/nswi004/winter-2019-20/team-i_cant_c/tree/as6
>    (branch as6)
> Commit: b86c92e187e41e0ef342b1c82b39055e68d8a896   (BUG version)
> Commit: 45be7eeecc2fa9f80bd26a55c6fc52ce6d793b6b   (handling of
> syscall on kernel side with manual break point in main)
> 
> Any suggestion would be appreciated.
> Thanks in advance.
> 
> Best regards
> Tomáš Drozdík