[NSWI004] Strange bug in syscall exit implementation.
Vojtech Horky
horky at d3s.mff.cuni.cz
Thu Feb 13 12:28:18 CET 2020
Hello.
On 13. 02. 20 at 11:22, Tomáš Drozdík wrote:
> Hi,
>
> I've encountered a strange bug while working on assignment 6. After
> roughly 20 seconds of execution, which I cannot step through, the kernel
> starts a presumably infinite loop of "Kernel panic: Exception..." prints.
>
> I've managed to hit a breakpoint in the user-space main function, thus
> entering MSIM interactive mode as suggested in the task description.
> Then I tried to implement a syscall as handling of a general exception
> with cause 8, which correctly switched back to the kernel. Then I
> implemented a syscall handler for exit, which calls `thread_finish`
> with the argument of the `exit` syscall. This has to clear the address
> space of the given thread, which it does according to the debugger, but
> when it needs to call `kfree` on the as_t structure, I'm no longer able
> to step through it, and it hangs, followed by the infinite loop
> described above.
You destroy the as_t inside thread_finish. That means that if the thread in
question is running on its userspace stack, you destroy the AS that manages
the memory of that stack. Since you also invalidate the TLB, any further use
of that stack (i.e. the rest of thread_finish) causes a TLB refill that can
no longer be satisfied.
As with destroying threads, an AS is easier to destroy in thread_join or a
similar function, which prevents this kind of situation.
As a side note: we use a simplified implementation where the kernel can use
the userspace stack for exception handling. Real-world kernels often have
two stacks, a userspace one and a kernel one, so that the state of the
kernel stack never leaks into userspace. There, the stacks are switched as
the very first thing in the exception handler. With that design, destroying
the AS sooner could work.
> Since the program enters an infinite loop of kernel panics, which
> should not happen at all, the code must have been modified during
> execution. Then the function `handle_exception_general` must have been
> called repeatedly with an invalid `cause`, resulting in that kind of
> behavior. I think there may be a problem with disabling interrupts,
> which I don't do when handling syscalls, since we might hit another
> exception (e.g. a TLB exception) which we do not want to ignore.
No. The infinite loop can be caused by repeatedly trying to access a stack
that is no longer mapped.
From a quick look at your code, I don't think the interrupts are the
culprit. I also do not think that you are overwriting your exception
routines (MSIM has commands to dump memory, so you can check for yourself
that the code is not modified).
Hope this helps,
- VH
>
> Code is in my repo:
> https://gitlab.mff.cuni.cz/teaching/nswi004/winter-2019-20/team-i_cant_c/tree/as6
> (branch as6)
> Commit: b86c92e187e41e0ef342b1c82b39055e68d8a896 (BUG version)
> Commit: 45be7eeecc2fa9f80bd26a55c6fc52ce6d793b6b (handling of
> syscall on kernel side with manual break point in main)
>
> Any suggestion would be appreciated.
> Thanks in advance.
>
> Best regards
> Tomáš Drozdík
> _______________________________________________
> NSWI004 mailing list
> NSWI004 at d3s.mff.cuni.cz
> https://d3s.mff.cuni.cz/mailman/listinfo/nswi004
>