Lectures | NSWI143

This page provides more detailed information about individual lectures as well as links to various resources. Students are encouraged to read selected textbook chapters and review slides before the lecture.

Lecture videos

Lecture videos from the academic year 2020/2021 are available in a shared folder on the university SharePoint. Use <your-login>@cuni.cz on the log-in screen where <your-login> is the login name you use to log into the Study Information System (SIS).

Please note that the videos discuss the design of a simple MIPS processor, whereas the lectures and other resources from the academic year 2023/2024 focus on the design of a simple RISC-V processor.

Finally, the videos are meant for your personal use only. Specifically, you are not permitted to redistribute (download and reupload) them in any way.

Digital circuit designs

Digital circuit designs are available for some of the lectures. These are for use with the LogiSim Evolution simulator, version 3.3.0 or later (tested with version 3.8.0). See the lectures below for links to specific designs.

Lecture 1 (Feb 20, 2024)

Agenda
Introduction
- What’s below your program
- Abstraction as a tool
Hennessy & Patterson, Computer Organization and Design
- Chapter 1, Computer Abstractions and Technology

Lecture 2 (Feb 27, 2024)

Computer performance
- Execution (response) time
- Throughput
- Classic performance equation
- Execution time, clocks per instruction (CPI), clock rate
- Amdahl’s law
Hennessy & Patterson, Computer Organization and Design
- Chapter 1, Computer Abstractions and Technology
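
The classic performance equation and Amdahl's law from this lecture can be tried out numerically. The following is a minimal sketch; the function names and the sample numbers (instruction count, CPI, clock rate, affected fraction) are illustrative assumptions, not values from the lecture.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """Classic performance equation: time = instructions * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def amdahl_speedup(affected_fraction, improvement_factor):
    """Overall speedup when a fraction of the execution time is improved."""
    return 1.0 / ((1.0 - affected_fraction) + affected_fraction / improvement_factor)


# Hypothetical program: 2 billion instructions, CPI of 1.5, 2 GHz clock.
baseline = cpu_time(2_000_000_000, 1.5, 2_000_000_000)
print(f"execution time: {baseline:.2f} s")

# Hypothetical improvement: 80 % of the execution time made 4 times faster.
print(f"overall speedup: {amdahl_speedup(0.8, 4):.2f}x")
```

With these numbers the program takes 1.50 s and the overall speedup is 2.50x, well below the factor of 4 applied to the improved part.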

Lecture 3 (Mar 5, 2024)

Digital circuits
- Combinational circuits
- Logical functions and basic gates
- Fundamental operation: 1-bit addition
- Simple arithmetic: n-bit addition, subtraction
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Appendix B, sections B.1 to B.3, B.5
Gates and logic families in Circuit Simulator
- basic gates made of manually operated switches
- observe the differences in current at the ground terminal
- RTL Inverter, NAND Gate
- TTL Inverter, NAND Gate
- NMOS Inverter, NAND Gate
- CMOS Inverter, NAND Gate
LogiSim: Simple adder
LogiSim: Simple adder/subtractor
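
The step from 1-bit addition to n-bit addition and subtraction can also be modelled in software. The sketch below is not a transcription of the LogiSim designs; it assumes a ripple-carry organization and two's complement subtraction, and the function names and the default 8-bit width are illustrative.

```python
def full_adder(a, b, carry_in):
    """1-bit full adder expressed with XOR, AND and OR operations."""
    partial = a ^ b
    total = partial ^ carry_in
    carry_out = (a & b) | (partial & carry_in)
    return total, carry_out


def add_sub(x, y, subtract=False, width=8):
    """n-bit ripple-carry adder/subtractor.

    Subtraction computes x + ~y + 1: every bit of y is XOR-ed with the
    subtract signal, and the same signal is used as the initial carry.
    Returns the width-bit result and the final carry out.
    """
    sub = 1 if subtract else 0
    carry = sub
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = ((y >> i) & 1) ^ sub
        bit, carry = full_adder(a, b, carry)
        result |= bit << i
    return result, carry


print(add_sub(5, 3))                 # addition
print(add_sub(5, 3, subtract=True))  # subtraction
```

The only difference between the adder and the adder/subtractor is the row of XOR operations on the second operand and the carry fed into the lowest bit.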

Lecture 4 (Mar 12, 2024)

Digital circuits
- Sequential circuits, clock
- Memory elements: flip-flops, registers
- Sequential multiplication and division
Processor implementation (if time permits)
- RISC-V ISA overview
- Single-cycle data path implementation
  - fetching instructions
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Appendix B, sections B.7, B.8
- Chapter 4, sections 4.1, 4.2, and 4.3
LogiSim: Simple flip-flop
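
Sequential multiplication and division can be illustrated with a software model in which each loop iteration stands for one clock cycle. This is a minimal sketch assuming unsigned operands, shift-and-add multiplication, and restoring division; the lecture may present different variants, and the function names are illustrative.

```python
def sequential_multiply(multiplicand, multiplier, width=32):
    """Unsigned shift-and-add multiplication, one iteration per clock cycle."""
    product = 0
    for _ in range(width):
        if multiplier & 1:        # lowest multiplier bit decides whether to add
            product += multiplicand
        multiplicand <<= 1        # shift the multiplicand left
        multiplier >>= 1          # shift the multiplier right
    return product


def sequential_divide(dividend, divisor, width=32):
    """Unsigned restoring division; assumes a non-zero divisor."""
    quotient = 0
    remainder = 0
    for i in reversed(range(width)):
        # Bring down the next dividend bit, most significant bit first.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:  # subtraction succeeds: quotient bit is 1
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


print(sequential_multiply(13, 11))
print(sequential_divide(100, 7))
```

In hardware, the variables of each loop correspond to registers that are updated on every clock edge, which is why these operations belong to the part on sequential circuits.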

Lecture 5 (Mar 19, 2024)

Processor implementation
- Single-cycle data path implementation
  - register-register, register-immediate instructions
  - load/store instructions
  - conditional branch instruction
- Higher-level blocks for data path construction
  - 32-bit ALU built from 32 1-bit ALUs
  - 32-register file built from 4 8-register files
  - Simple circuits: multiplexers, decoders, sign/zero extension, zero detection
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 4, section 4.3
Hennessy & Patterson, Computer Organization and Design: RISC-V Edition (2nd ed.)
- Chapter 4, section 4.3
LogiSim: Single-cycle RISC-V data path implementation (version 1)
- Also contains libraries of generic and RISC-V-specific components.
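
The simple circuits listed among the higher-level blocks (sign/zero extension, zero detection) have direct software counterparts. The sketch below assumes a 32-bit data path and uses the 12-bit immediate of RISC-V I-type instructions as the example input; the function names are illustrative and do not correspond to component names in the LogiSim library.

```python
def sign_extend(value, bits, width=32):
    """Replicate the top bit of a `bits`-wide value across a `width`-bit word."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        # Fill all bit positions above the original field with ones.
        value |= ((1 << width) - 1) & ~((1 << bits) - 1)
    return value


def zero_extend(value, bits):
    """Keep the low `bits` bits; the upper bits of the word are zero."""
    return value & ((1 << bits) - 1)


def is_zero(word, width=32):
    """Zero detection: 1 if all bits of the word are 0, otherwise 0."""
    return int((word & ((1 << width) - 1)) == 0)


print(hex(sign_extend(0x800, 12)))   # negative 12-bit immediate
print(hex(sign_extend(0x7FF, 12)))   # positive 12-bit immediate
print(is_zero(0), is_zero(4))
```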

Lecture 6 (Mar 26, 2024)

Processor implementation
- Single-cycle data path controller implementation
- Overview of multi-cycle data path implementation (without controller)
Hennessy & Patterson, Computer Organization and Design: RISC-V Edition (2nd ed.)
- Chapter 4, section 4.4 (single-cycle datapath)
- Chapter 4, section 4.5 (multi-cycle datapath)
- Appendix C, sections C.1 and C.2 (combinational controller)
LogiSim: Single-cycle RISC-V data path implementation (version 2)
- Updated to use a 6-function ALU (with different names).
- The Instruction Memory (ROM) contains the following Bubble Sort program.
Bubble Sort
- Sorts 16 integers starting at address 0x80.
- Hand-written assembly version (bubble_sort-riscv.S) with an objdump-produced disassembly (with and without instruction aliases) and a memory dump that can be loaded into the Instruction Memory in LogiSim (bubble_sort-riscv.raw).
- Memory dump with sample data that can be loaded into the Data Memory in LogiSim (bubble_sort-data.raw).
- C version (c_bubble_sort.c) with compiler-produced assembly (c_bubble_sort-O2-riscv.s) and objdump-produced disassembly (c_bubble_sort-O2-riscv.objdump). Compiled using GCC at optimization level -O2. Provided for illustration only: the current implementation of the processor does not support all instructions used in the GCC-generated code, but adding support for the few missing instructions should be relatively straightforward.
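
To check the contents of the Data Memory after a simulation run, a small reference model of the sort can be handy. The sketch below is not derived from bubble_sort-riscv.S or c_bubble_sort.c; it only mirrors the stated behaviour (16 integers starting at address 0x80) and assumes 32-bit words in a byte-addressed memory. The sample data is hypothetical.

```python
BASE_ADDRESS = 0x80   # start of the array, as stated above
COUNT = 16            # number of integers, as stated above
WORD_SIZE = 4         # assumption: 32-bit words, byte-addressed memory


def bubble_sort_in_memory(memory, base=BASE_ADDRESS, count=COUNT):
    """Sort `count` words in place by swapping adjacent out-of-order pairs."""
    for unsorted_end in range(count - 1, 0, -1):
        swapped = False
        for i in range(unsorted_end):
            lo = base + i * WORD_SIZE
            hi = lo + WORD_SIZE
            if memory[lo] > memory[hi]:
                memory[lo], memory[hi] = memory[hi], memory[lo]
                swapped = True
        if not swapped:   # no swaps in a full pass: the array is sorted
            break


# Hypothetical sample data; the real contents come from bubble_sort-data.raw.
values = [9, 3, 14, 0, 7, 12, 1, 15, 5, 10, 2, 13, 6, 11, 4, 8]
memory = {BASE_ADDRESS + i * WORD_SIZE: v for i, v in enumerate(values)}

bubble_sort_in_memory(memory)
sorted_values = [memory[BASE_ADDRESS + i * WORD_SIZE] for i in range(COUNT)]
print(sorted_values)
```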

Lecture 7 (Apr 2, 2024)

Processor implementation (updated on Apr 2, 2024)
- Multi-cycle data path implementation
- Microcode controller for the multi-cycle data path
- Wired controller for the multi-cycle data path
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Appendix D, sections D.3 and D.4 (controller)
Hennessy & Patterson, Computer Organization and Design: RISC-V Edition (2nd ed.)
- Chapter 4, section 4.5 (multi-cycle data path)
- Appendix C, sections C.3, C.4 and C.5 (controller)
LogiSim: Multi-cycle RISC-V data path implementation
- First 128 bytes of RAM are shadowed by ROM which contains the Bubble Sort program

Lecture 8 (Apr 9, 2024)

Improving performance through pipelining
- Implementing pipelined data path and control
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 4, sections 4.5 and 4.6
Hennessy & Patterson, Computer Organization and Design: RISC-V Edition (2nd ed.)
- Chapter 4, sections 4.6 and 4.7

Lecture 9 (Apr 16, 2024)

Improving performance through pipelining
- Pipeline hazards (structural, data and control)
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 4, sections 4.6 to 4.8
LogiSim: Pipelined RISC-V data path implementation
- Note that the Bubble Sort program needs to be modified for each variant of the pipeline, depending on the kind of hazards handled by a particular variant.
- Each variant has its own instruction memory containing the correct version of the Bubble Sort program
Bubble Sort for the pipelined data path
- Different variants of the Bubble Sort program for different pipeline variants
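
One reason the program differs between pipeline variants is that a pipeline that does not resolve data hazards in hardware needs independent instructions or nops between a producer and its consumer. The sketch below pads a program with nops under a simplified model: it assumes that a consumer must be at least `min_distance` instructions behind its producer (3 is a common figure for a five-stage pipeline without forwarding, but the value for a particular variant is an assumption here), and it ignores control hazards. The tuple format and the sample instructions are hypothetical.

```python
NOP = ("nop", None, ())


def insert_nops(program, min_distance=3):
    """Pad a program so that no instruction reads a register written fewer
    than `min_distance` instructions earlier.

    `program` is a list of (text, dest_register, source_registers) tuples;
    use None as the destination for instructions that write no register.
    """
    output = []
    for text, dest, sources in program:
        # Find the nearest producer among the most recent instructions.
        for distance in range(1, min_distance):
            if len(output) < distance:
                break
            prev_dest = output[-distance][1]
            if prev_dest is not None and prev_dest in sources:
                output.extend([NOP] * (min_distance - distance))
                break
        output.append((text, dest, sources))
    return output


# Hypothetical fragment of a compare-and-branch sequence.
program = [
    ("lw   t0, 0(a0)", "t0", ("a0",)),
    ("lw   t1, 4(a0)", "t1", ("a0",)),
    ("slt  t2, t1, t0", "t2", ("t1", "t0")),
    ("beq  t2, zero, skip", None, ("t2", "zero")),
]

for text, _, _ in insert_nops(program):
    print(text)
```

A variant with forwarding would need fewer or no nops for the same fragment, which can be explored by lowering `min_distance`.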

Lecture 10 (Apr 23, 2024)

Issues in instruction pipelining
- Branch prediction, exceptions
- Static multiple issue (in-order super-scalar) pipeline
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 4, sections 4.8 and 4.9

Lecture 11 (Apr 30, 2024)

Super-scalar pipelines
- Dynamic multiple issue (out-of-order super-scalar) pipeline
- Speculative execution, exception handling
Memory technology and memory hierarchy
- Introduction, temporal and spatial locality
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 4, sections 4.10, 4.11, 4.14, and 4.15
- Chapter 5, section 5.1

Lecture 12 (May 7, 2024)

Memory technology and memory hierarchy
- Static and dynamic memory technology
- Memory hierarchy concepts
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 5, sections 5.1 and 5.2
LogiSim: Static RAM model
- The static RAM circuit (memory_static_8x8bit) shows a row decoder and the organization of an 8x8 memory cell matrix, down to S-R flip-flops made of NOR gates.
- The memory cells use a controlled buffer to avoid driving the bit lines when not enabled by a word line, which allows connecting the cell outputs to a shared bit line.
- The memory does not use a clock signal. To write data to the memory, toggle the Write input while the Enable input is high.
Circuit: Simple DRAM cell model
- Logical 0 and 1 correspond to 0 V and 1 V, respectively
- Bit information (logical 0 or 1) is stored as charge in a capacitor (C_s)
- To read the value, the bit line (represented by capacitor C_bl) is precharged to 0.5 Volts (value in the middle between logical 0 and 1)
- When reading the information stored in C_s, we are looking for an upwards or downwards swing in voltage (or alternatively, in current) resulting from charge equalization between C_s and C_bl.
- The voltage (current) swing is picked up and amplified by a sense amplifier (not shown in the circuit)
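
The size of the voltage swing follows from charge conservation: after the cell is connected to the bit line, the total charge stored in C_s and C_bl is shared by both capacitors. The sketch below computes the resulting bit line voltage; the capacitance values are hypothetical and are not taken from the circuit model.

```python
def bit_line_voltage(v_cell, c_cell, c_bit_line, v_precharge=0.5):
    """Voltage after the cell capacitor shares its charge with the bit line."""
    total_charge = c_cell * v_cell + c_bit_line * v_precharge
    return total_charge / (c_cell + c_bit_line)


C_S = 30e-15    # hypothetical cell capacitance (30 fF)
C_BL = 300e-15  # hypothetical bit line capacitance (300 fF)

for stored_bit in (0, 1):
    v = bit_line_voltage(float(stored_bit), C_S, C_BL)
    print(f"stored {stored_bit}: bit line at {v:.4f} V, swing {v - 0.5:+.4f} V")
```

Because the bit line capacitance is much larger than the cell capacitance, the swing is only a small fraction of the full logic level, which is why a sense amplifier is needed.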

Lecture 13 (May 14, 2024)

The lecture has been cancelled due to Rector’s Day. Please review the material for this and the final lecture (including the linked videos) so that we can discuss the most important parts of both during the final lecture.

Cache architectures
- Direct-mapped cache architecture
- Set-associative cache architecture
- Fully associative cache architecture
- Architectural parameters (ABC) and their impact on cache performance
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 5, sections 5.3, 5.4, and 5.8
LogiSim: Cache models
- Includes direct-mapped (64 KiB), 4-way set-associative (64 KiB), and fully associative (512 B) cache models for 32-bit addresses.
- All cache models use conceptually similar components to better demonstrate the commonality and the differences in their internal architecture.
- The cache models support either an update or a replacement of a cache line on write (think of replacement as setting the entire cache line to zero before storing the word being written).
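
The three cache models differ mainly in how a 32-bit address is split into tag, index, and offset fields. The sketch below derives the field widths from the cache sizes listed above. The cache line size is not stated here, so the 16-byte line is an assumption, and the function names are illustrative.

```python
LINE_BYTES = 16  # assumption: the actual line size of the models may differ


def address_fields(cache_bytes, ways, line_bytes=LINE_BYTES, address_bits=32):
    """Return (tag_bits, index_bits, offset_bits) for a cache configuration."""
    lines = cache_bytes // line_bytes
    sets = lines // ways
    offset_bits = line_bytes.bit_length() - 1   # log2 for powers of two
    index_bits = sets.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


def split_address(address, index_bits, offset_bits):
    """Split an address into (tag, index, offset)."""
    offset = address & ((1 << offset_bits) - 1)
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset


configurations = {
    "direct-mapped, 64 KiB": (64 * 1024, 1),
    "4-way set-associative, 64 KiB": (64 * 1024, 4),
    "fully associative, 512 B": (512, 512 // LINE_BYTES),  # one set, all ways
}

for name, (size, ways) in configurations.items():
    tag_bits, index_bits, offset_bits = address_fields(size, ways)
    print(f"{name}: tag {tag_bits}, index {index_bits}, offset {offset_bits}")
```

Under the 16-byte assumption, the direct-mapped model uses a 16-bit tag and a 12-bit index, the 4-way model trades two index bits for two more tag bits, and the fully associative model has no index at all.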
Lecture #13 videos
- Czech
- English

Lecture 14 (May 21, 2024)

Cache architectures
- 3C model of cache misses
- Write-through (WT) and write-back (WB) caches
- Handling hits and misses in WT and WB caches
- Multi-level cache organization
Cache coherence
- Cache coherence problem in multi-core and multi-processor systems
- Cache coherence protocols for WT and WB caches
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 5, sections 5.9, 5.10, 5.12 (part related to cache coherence), 5.14, 5.15, and 5.16
Lecture #14 videos
- Czech
- English