Lectures | NSWI143

This page provides more detailed information about individual lectures as well as links to various resources. Students are encouraged to read selected textbook chapters and review slides before the lecture.

Lecture videos

Lecture videos from the academic year 2020/2021 are available in a shared folder on the university SharePoint. Use <your-login>@cuni.cz on the log-in screen where <your-login> is the login name you use to log into the Study Information System (SIS).

Digital circuit designs

Digital circuit designs are available for some of the lectures. These are for use with the LogiSim Evolution simulator, version 3.3.0 or later (tested with version 3.4.2). See the lectures below for links to specific designs.

Lecture 1 (Feb 14, Feb 15)

Agenda
Introduction
- What’s below your program
- Abstraction as a tool
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 1, Computer Abstractions and Technology

Lecture 2 (Feb 21, Feb 22)

Computer performance
- Execution (response) time
- Throughput
- Classic performance equation
- Execution time, clocks per instruction (CPI), clock rate
- Amdahl’s law
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 1, Computer Abstractions and Technology
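
The classic performance equation and Amdahl's law from the agenda above can be tried out with a short sketch. The function names and the sample numbers are illustrative assumptions, not part of the course materials.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """Classic performance equation: time = instructions * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def amdahl_speedup(affected_fraction, improvement_factor):
    """Overall speedup when only a fraction of execution time is improved."""
    return 1.0 / ((1.0 - affected_fraction) + affected_fraction / improvement_factor)


if __name__ == "__main__":
    # Hypothetical program: 2 billion instructions, CPI 1.5, 3 GHz clock.
    print(f"execution time: {cpu_time(2_000_000_000, 1.5, 3_000_000_000):.3f} s")
    # Hypothetical enhancement: half of the execution time made twice as fast.
    print(f"overall speedup: {amdahl_speedup(0.5, 2.0):.3f}")
```

Note that even an arbitrarily large improvement factor cannot push the overall speedup past 1 / (1 - affected_fraction).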

Lecture 3 (Feb 28, Mar 1)

Overview: Instructions of a computer
Digital circuits
- Combinational circuits
- Logical functions and basic gates
- Fundamental operation: 1-bit addition
- Simple arithmetic: n-bit addition, subtraction
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Appendix B, sections B.1 to B.3, B.5
Gates and logic families in Circuit Simulator
- basic gates made of manually operated switches
- observe the differences in current at the ground terminal
- RTL Inverter, NAND Gate
- TTL Inverter, NAND Gate
- NMOS Inverter, NAND Gate
- CMOS Inverter, NAND Gate
LogiSim: Simple adder
LogiSim: Simple adder/subtractor
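
As a software counterpart to the adder circuits, the following sketch models a 1-bit full adder and chains it into an n-bit ripple-carry adder. Subtraction reuses the adder by inverting the second operand and setting the carry-in to 1 (two's complement). The function names and the default width of 8 bits are assumptions made for illustration. The linked LogiSim designs may be organized differently.

```python
def full_adder(a, b, carry_in):
    """1-bit full adder: returns (sum, carry_out) for single-bit inputs."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out


def ripple_add(x, y, n_bits=8, carry_in=0):
    """n-bit ripple-carry addition built from 1-bit full adders."""
    result = 0
    carry = carry_in
    for i in range(n_bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


def ripple_sub(x, y, n_bits=8):
    """x - y computed as x + (~y) + 1 on the same adder."""
    mask = (1 << n_bits) - 1
    return ripple_add(x, ~y & mask, n_bits, carry_in=1)


if __name__ == "__main__":
    print(ripple_add(5, 3))
    print(ripple_sub(5, 3))
```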

Lecture 4 (Mar 7, Mar 8)

Digital circuits
- Sequential circuits, clock
- Memory elements: flip-flops, registers
- Sequential multiplication and division
Processor implementation
- MIPS ISA overview
- Single-cycle data path implementation
  - fetching instructions
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Appendix B, sections B.7, B.8
- Chapter 4, sections 4.1, 4.2, and 4.3
LogiSim: Simple flip-flop

Lecture 5 (Mar 14, Mar 15)

Processor implementation
- Single-cycle data path implementation
  - register-register, register-immediate, load/store
- Higher-level blocks for data path construction
  - 32-bit ALU built from 32 1-bit ALUs
  - 32-register file built from 4 8-register files
  - Simple circuits: multiplexers, decoders, sign/zero extension, zero detection
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 4, section 4.3
LogiSim: Single-cycle MIPS data path implementation
- Also contains libraries of generic and MIPS-specific components.
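
The simple circuits listed for this lecture (sign/zero extension, zero detection) have a compact software description. The sketch below assumes a 16-bit immediate extended to 32 bits, as in MIPS I-type instructions. The function names are hypothetical.

```python
def zero_extend(value, from_bits=16):
    """Keep the low from_bits bits; the upper bits become zero."""
    return value & ((1 << from_bits) - 1)


def sign_extend(value, from_bits=16, to_bits=32):
    """Copy the sign bit of a from_bits-wide value into the upper bits."""
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    signed = (value ^ sign_bit) - sign_bit  # interpret as a signed integer
    return signed & ((1 << to_bits) - 1)    # back to a to_bits-wide bit pattern


def is_zero(value, n_bits=32):
    """Zero detection: true when all n_bits bits are 0 (a wide NOR in hardware)."""
    return (value & ((1 << n_bits) - 1)) == 0


if __name__ == "__main__":
    print(hex(sign_extend(0x8000)))
    print(hex(zero_extend(0x8000)))
```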

Lecture 6 (Mar 21, Mar 22)

Processor implementation
- Single-cycle data path controller implementation
- Overview of multi-cycle data path implementation (without controller)
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 4, section 4.4
- Appendix D, sections D.1 and D.2 (controller)
Hennessy & Patterson, Computer Organization and Design (3rd ed.)
- Chapter 5, section 5.5 (multicycle datapath)
LogiSim: Single-cycle MIPS data path implementation
- The Instruction Memory (ROM) contains the following Bubble Sort program.
Bubble Sort
- Sorts 16 integers starting at address 0x80.
- Hand-written assembly version (bubble_sort.S) with a memory dump from QtMIPS (bubble_sort.dump) and a memory dump that can be loaded into Instruction Memory in LogiSim (bubble_sort.raw).
- Memory dump with sample data that can be loaded into the Data Memory in LogiSim (bubble_sort-data.raw).
- C version (c_bubble_sort.c) with compiler-produced assembly (c_bubble_sort-O2.s) and objdump-produced disassembly (c_bubble_sort-O2.objdump). Compiled using GCC at optimization level -O2. For illustration purposes only – the current implementation of the processor is not able to execute the GCC-generated code, but adding support for the few missing instructions should be relatively straightforward.
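
For reference while reading the assembly, here is a generic bubble sort in Python. It is a textbook formulation and an assumption about the overall shape of the algorithm. It is not a transcription of bubble_sort.S or c_bubble_sort.c, and the sample data is made up.

```python
def bubble_sort(values):
    """Return a sorted copy; each pass moves the largest remaining value to the end."""
    data = list(values)
    for end in range(len(data) - 1, 0, -1):
        for i in range(end):
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data


if __name__ == "__main__":
    # 16 made-up integers, matching the array size used in the lecture example.
    sample = [5, 3, 4, 1, 15, 8, 9, 2, 10, 6, 11, 1, 6, 9, 12, 14]
    print(bubble_sort(sample))
```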
QtMIPS, a MIPS simulator developed at CTU
- Allows simulating different variants of MIPS processors, including cache
- Provides an integrated editor for editing and compiling MIPS assembly code
- To simulate the single-cycle data path, use the following settings:
  - Basic tab: no pipeline, no cache preset
  - Core tab: no delay slot

Lecture 7 (Mar 28, Mar 29)

Processor implementation
- Multi-cycle data path implementation
- Microcode controller for the multi-cycle data path
Hennessy & Patterson, Computer Organization and Design (3rd ed.)
- Chapter 5, section 5.5 (multicycle datapath)
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Appendix D, sections D.3 and D.4 (controller)
LogiSim: Multi-cycle MIPS data path implementation
- First 128 bytes of RAM are shadowed by ROM which contains the Bubble Sort program

Lecture 8 (Apr 4, Apr 5)

Processor implementation
- Wired controller for the multi-cycle data path
Improving performance through pipelining
- Implementing pipelined data path
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 4, sections 4.5 and 4.6

Lecture 9 (Apr 11, Apr 12)

Improving performance through pipelining
- Implementing pipelined data path and control
- Pipeline hazards (structural, data and control)
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 4, sections 4.6 to 4.8

Lecture 10 (Apr 18, Apr 19)

Issues in instruction pipelining
- Branch prediction, exceptions
- Static multiple issue (in-order super-scalar) pipeline
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 4, sections 4.8 and 4.9
LogiSim: Pipelined MIPS data path implementation
- Note that the Bubble Sort program needs to be modified for each variant of the pipeline, because the pipeline lacks the hazard detection unit (making it truly a Microprocessor without Interlocked Pipeline Stages).
- Each variant has its own instruction memory containing the correct version of the Bubble Sort program
Bubble Sort for the pipelined data path
- Different variants of the Bubble Sort program for different pipeline variants
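
Without a hazard detection unit, the program itself has to keep dependent instructions far enough apart. The sketch below shows one way to do this by inserting NOPs. It assumes a classic five-stage pipeline without forwarding, where the register file is written in the first half of a cycle and read in the second half, so a consumer must be at least 3 instructions behind its producer. The tuple format (mnemonic, destination, sources) and the function name are hypothetical. The LogiSim pipeline variants may require different distances.

```python
NOP = ("nop", None, ())


def insert_nops(program, min_distance=3):
    """Pad a program so that no instruction reads a register written by one
    of the (min_distance - 1) instructions immediately before it.

    Each instruction is a tuple (mnemonic, dest_reg or None, src_regs).
    """
    out = []
    for instr in program:
        _, _, srcs = instr
        while True:
            window = out[-(min_distance - 1):] if min_distance > 1 else []
            hazard = any(dest is not None and dest in srcs for _, dest, _ in window)
            if not hazard:
                break
            out.append(NOP)  # delay the consumer by one more slot
        out.append(instr)
    return out


if __name__ == "__main__":
    program = [
        ("lw", "$t0", ("$a0",)),
        ("lw", "$t1", ("$a0",)),
        ("slt", "$t2", ("$t1", "$t0")),
    ]
    for mnemonic, dest, srcs in insert_nops(program):
        print(mnemonic, dest, srcs)
```

Setting min_distance to a smaller value models a pipeline with more forwarding. A value of 1 disables padding entirely.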

Lecture 11 (Apr 26, Apr 27)

Super-scalar pipelines
- Dynamic multiple issue (out-of-order super-scalar) pipeline
- Speculative execution, exception handling
Memory technology and memory hierarchy
- Introduction, temporal and spatial locality
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 4, sections 4.10, 4.11, 4.14, and 4.15
- Chapter 5, section 5.1

Lecture 12 (May 2, May 3)

Memory technology and memory hierarchy
- Static and dynamic memory technology
- Memory hierarchy concepts
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 5, sections 5.1 and 5.2
LogiSim: Static RAM model
- The static RAM circuit (memory_static_8x8bit) shows a row decoder and the organization of an 8x8 memory cell matrix, down to S-R flip-flops made of NOR gates.
- The memory cells use a controlled buffer to avoid driving the bit lines when not enabled by a word line, which allows connecting the cell outputs to a shared bit line.
Circuit: DRAM cell model
- Logical 0 corresponds to 0 Volts, logical 1 corresponds to 1 Volt
- Bit information (logical 0 or 1) is stored as charge in a capacitor (C_s)
- To read the value, the bit line (represented by capacitor C_bl) is precharged to 0.5 Volts (value in the middle between logical 0 and 1)
- When reading the information stored in C_s, we are looking for an upwards or downwards swing in voltage (or alternatively, in current) resulting from charge equalization between C_s and C_bl.
- The voltage (current) swing is picked up and amplified by a sense amplifier (not shown in the circuit)
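
The size of the voltage swing follows from charge conservation when C_s is connected to the precharged bit line: the final voltage is (C_s * V_s + C_bl * V_pre) / (C_s + C_bl). The sketch below evaluates this for both stored values. The capacitances (30 fF for the cell, 300 fF for the bit line) are assumed for illustration and are not taken from the linked circuit.

```python
def bitline_voltage(v_cell, c_s, c_bl, v_precharge=0.5):
    """Voltage after charge equalization between the cell and the bit line."""
    return (c_s * v_cell + c_bl * v_precharge) / (c_s + c_bl)


def read_swing(v_cell, c_s, c_bl, v_precharge=0.5):
    """Signed deviation from the precharge level seen by the sense amplifier."""
    return bitline_voltage(v_cell, c_s, c_bl, v_precharge) - v_precharge


if __name__ == "__main__":
    C_S = 30e-15    # assumed storage capacitance (farads)
    C_BL = 300e-15  # assumed bit line capacitance (farads)
    for stored in (0.0, 1.0):
        print(f"stored {stored:.0f} V -> swing {read_swing(stored, C_S, C_BL) * 1000:+.1f} mV")
```

Because C_bl is typically much larger than C_s, the swing is only a small fraction of the full logic level, which is why a sense amplifier is needed.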

Lecture 13 (May 9, May 10)

The Czech-language lecture is cancelled due to Rector’s day. Please review the material for this and the final lecture (including the linked videos) so that we can discuss the most important parts of both during the final lecture.

Cache architectures
- Direct-mapped cache architecture
- Set-associative cache architecture
- Fully associative cache architecture
- Architectural parameters (ABC) and their impact on cache performance
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 5, sections 5.3, 5.4, and 5.8
LogiSim: Cache models
- Includes direct-mapped (64 KiB), 4-way associative (64 KiB) and fully-associative (512 B) cache models for 32-bit addresses.
- All cache models use conceptually similar components to better demonstrate the commonality and the differences in their internal architecture.
- The cache models support either an update or a replacement of a cache line on write (think of replacement as setting the entire cache line to zero before storing the word being written).
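
The three cache organizations differ mainly in how a 32-bit address is split into tag, index, and offset. The sketch below computes that split from the capacity, line size, and associativity. The 16-byte line size is an assumption made for illustration, since the line size of the LogiSim models is not stated here. All sizes must be powers of two.

```python
def split_address(addr, cache_bytes, line_bytes, ways):
    """Split an address into (tag, set index, byte offset within the line)."""
    num_sets = cache_bytes // (line_bytes * ways)
    offset_bits = line_bytes.bit_length() - 1  # log2 for powers of two
    index_bits = num_sets.bit_length() - 1
    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    LINE = 16  # assumed line size in bytes
    addr = 0x12345678
    configs = {
        "direct-mapped, 64 KiB": (64 * 1024, 1),
        "4-way, 64 KiB": (64 * 1024, 4),
        "fully associative, 512 B": (512, 512 // LINE),
    }
    for name, (size, ways) in configs.items():
        tag, index, offset = split_address(addr, size, LINE, ways)
        print(f"{name}: tag={tag:#x} index={index:#x} offset={offset:#x}")
```

A fully associative cache has a single set, so the index field is empty and the whole line address serves as the tag.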
Lecture #13 videos
- Czech
- English

Lecture 14 (May 16, May 17)

Cache architectures
- 3C model of cache misses
- Write-through (WT) and write-back (WB) caches
- Handling hits and misses in WT and WB caches
- Multi-level cache organization
Cache coherence
- Cache coherence problem in multi-core and multi-processor systems
- Cache coherence protocols for WT and WB caches
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 5, sections 5.9, 5.10, 5.12 (part related to cache coherence), 5.14, 5.15, and 5.16
Lecture #14 videos
- Czech
- English