Lectures | NSWI143

Lecture 1 (Feb 19, Feb 20)

Agenda
Introduction
- What’s below your program
- Abstraction as a tool
Computer performance
- Execution (response) time
- Throughput

Lecture 2 (Feb 26, Feb 27)

Computer performance
- Classic performance equation
- Execution time, clocks per instruction (CPI), clock rate
- Amdahl’s law
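The classic performance equation and Amdahl's law listed above can be sketched in a few lines. The function names and the sample numbers are illustrative assumptions, not values taken from the lecture.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """Classic performance equation: time = instructions * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def amdahl_speedup(enhanced_fraction, enhancement_speedup):
    """Overall speedup when only a fraction of the execution time is sped up."""
    return 1.0 / ((1.0 - enhanced_fraction) + enhanced_fraction / enhancement_speedup)


# Hypothetical program: 2 billion instructions, CPI of 1.5, 2 GHz clock.
print(cpu_time(2e9, 1.5, 2e9))    # 1.5 (seconds)

# Hypothetical enhancement: 75% of the run time becomes 3 times faster.
print(amdahl_speedup(0.75, 3.0))  # 2.0
```

Even with an infinitely fast enhancement, the remaining 25% of the run time would cap the overall speedup at 4.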
Instruction set architecture
- For review only (self-study)
- Hennessy & Patterson, Computer Organization and Design (5th ed.)
  - Chapter 2, Instructions: Language of the Computer
Digital circuits
- Combinational and sequential circuits
- Logical functions and basic gates
- Fundamental operations: 1-bit addition
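As a rough illustration of 1-bit addition built from basic gates, the sketch below models a full adder with XOR, AND, and OR operations. The function name `full_adder` is an assumption made for this example.

```python
def full_adder(a, b, carry_in):
    """1-bit full adder expressed with XOR, AND, and OR gates."""
    partial = a ^ b                              # XOR gate
    total = partial ^ carry_in                   # XOR gate
    carry_out = (a & b) | (partial & carry_in)   # two AND gates, one OR gate
    return total, carry_out


# Print the truth table: inputs a, b, carry_in and outputs sum, carry_out.
for a in (0, 1):
    for b in (0, 1):
        for carry_in in (0, 1):
            print(a, b, carry_in, *full_adder(a, b, carry_in))
```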

Lecture 3 (Mar 4, Mar 5)

Lecture cancelled (scheduling conflict with a research project meeting).

Lecture 4 (Mar 11, Mar 12)

Lecture cancelled (preventive measures to avoid the spread of coronavirus).

Lecture 5 held on Zoom (Mar 18, Mar 19)

Digital circuits
- The elements of a simple ALU: n-bit addition, subtraction
- Simple operations: sign extension
- Sequential circuits: flip-flops, registers
- Sequential multiplication and division
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Appendix B, sections B.1 to B.3, B.5, B.7, B.8
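The n-bit addition, subtraction, and sign extension topics above can be sketched as follows. This is a behavioural model under stated assumptions (an 8-bit default width, helper names chosen for this example), not the circuit used in the lecture.

```python
def full_adder(a, b, carry_in):
    """1-bit full adder: returns (sum, carry_out)."""
    partial = a ^ b
    return partial ^ carry_in, (a & b) | (partial & carry_in)


def add_sub(x, y, subtract=False, width=8):
    """Ripple-carry adder; subtraction inverts y and sets the initial carry to 1."""
    invert = 1 if subtract else 0
    carry = invert
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = ((y >> i) & 1) ^ invert      # XOR gate conditionally inverts each bit of y
        bit, carry = full_adder(a, b, carry)
        result |= bit << i
    return result, carry


def sign_extend(value, from_bits, to_bits):
    """Copy the sign bit of a from_bits-wide value into the added upper bits."""
    sign = (value >> (from_bits - 1)) & 1
    if sign:
        value |= ((1 << (to_bits - from_bits)) - 1) << from_bits
    return value & ((1 << to_bits) - 1)


print(add_sub(5, 3))                 # (8, 0)
print(add_sub(3, 5, subtract=True))  # (254, 0), i.e. 0xFE, which is -2 in 8 bits
print(hex(sign_extend(0xFE, 8, 16))) # 0xfffe
```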

Additional resources

Lecture videos

Lecture 6 held on Zoom (Mar 25, Mar 26)

Processor implementation
- Implementing higher-level blocks required for data path construction
  - 32-bit ALU built from 32 1-bit ALUs
  - 32-register file built from 4 8-register files
  - Simple circuits: multiplexers, decoders, sign/zero extension, zero detection
- Implementing a single-cycle data path
  - support for register-register, register-immediate, load/store, conditional branch, and absolute jump instructions
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 4, section 4.4

Additional resources

Single-cycle MIPS data path implementation
- For use with LogiSim Evolution version 3.3.0 or later
- The Instruction Memory (ROM) contains the following Bubble Sort program
Bubble Sort
- Sorts 16 integers starting at address 0
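For orientation, here is a high-level sketch of what such a program computes. It is not the MIPS code stored in the instruction memory: the `memory` list models data memory as word-indexed cells, and the sample data is hypothetical.

```python
def bubble_sort(memory, base=0, count=16):
    """Sort `count` words of `memory` in place, starting at word index `base`."""
    for end in range(count - 1, 0, -1):
        swapped = False
        for i in range(base, base + end):
            if memory[i] > memory[i + 1]:
                # Swap adjacent words that are out of order.
                memory[i], memory[i + 1] = memory[i + 1], memory[i]
                swapped = True
        if not swapped:
            break  # no swaps in this pass, so the array is already sorted
    return memory


# Hypothetical data memory holding 16 integers in descending order.
data_memory = list(range(16, 0, -1))
print(bubble_sort(data_memory))
```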
QtMIPS, a MIPS simulator developed at CTU
- Allows simulating different variants of MIPS processors, including cache
- Provides an integrated editor for editing and compiling MIPS assembly code
- To simulate single-cycle data path, use the following settings
  - Basic tab: no pipeline, no cache preset
  - Core tab: no delay slot
Lecture videos

Lecture 7 held on Zoom (Apr 1, Apr 2)

Processor implementation
- Implementing controller for the single-cycle data path
- Implementing multi-cycle data path
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Appendix D, sections D.1 and D.2 (controller)
Hennessy & Patterson, Computer Organization and Design (3rd ed.)
- Chapter 5, section 5.5 (multicycle datapath)

Additional resources

Lecture videos

Lecture 8 held on Zoom (Apr 8, Apr 9)

Processor implementation
- Implementing controller for the multi-cycle data path
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Appendix D, sections D.3 and D.4 (controller)
Hennessy & Patterson, Computer Organization and Design (3rd ed.)
- Chapter 5, section 5.5 (multicycle datapath)

Additional resources

Multi-cycle MIPS data path implementation
- First 128 bytes of RAM are shadowed by ROM which contains the following Bubble Sort program
- For use with LogiSim Evolution version 3.3.0 or later
Bubble Sort version 2
- Sorts 16 integers starting at address 0x80 (instead of address 0)
- This is because code and data need to be in the same memory and code needs to start at address 0.
Updated single-cycle MIPS data path implementation
- The stdlib.circ and mipslib.circ are shared by both data path implementations
- The Instruction Memory (ROM) contains Bubble Sort version 2
Lecture videos

Lecture 9 held on Zoom (Apr 15, Apr 16)

Improving performance through pipelining
- Implementing pipelined data path and control
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 4, sections 4.5 and 4.6

Additional resources

Lecture videos

Lecture 10 held on Zoom (Apr 22, Apr 23)

Issues in instruction pipelining
- Pipeline hazards, branch prediction, exceptions
- Static multiple issue (super-scalar) pipeline
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 4, sections 4.7 to 4.10

Additional resources

Pipelined MIPS data path implementation
- For use with LogiSim Evolution version 3.3.0 or later
- Note that the Bubble Sort program needs to be modified for each variant of the pipeline, because the pipeline lacks the hazard unit (which makes it truly a Microprocessor without Interlocked Pipeline Stages)
- Each variant has its own instruction memory containing the correct version of the Bubble Sort program
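To illustrate why the program differs per pipeline variant, the sketch below pads straight-line code with NOPs so that data hazards are resolved in software. It assumes a classic 5-stage pipeline with no forwarding and a register file that is written before it is read within a cycle, so a consumer must sit at least 3 slots after its producer. The instruction tuples `(text, destination, sources)` are a hypothetical representation, and branch hazards are not handled.

```python
NOP = ("nop", None, ())


def insert_nops(program, min_distance=3):
    """Pad with NOPs so every consumer is at least min_distance slots after its producer."""
    scheduled = []
    for instr in program:
        _, _, sources = instr
        needed = 0
        # Look back at the most recently scheduled instructions, nearest first.
        for back in range(1, min_distance):
            if back > len(scheduled):
                break
            _, dest, _ = scheduled[-back]
            if dest is not None and dest in sources:
                needed = max(needed, min_distance - back)
        scheduled.extend([NOP] * needed)
        scheduled.append(instr)
    return scheduled


program = [
    ("lw $t0, 0($a0)", "$t0", ("$a0",)),
    ("lw $t1, 4($a0)", "$t1", ("$a0",)),
    ("slt $t2, $t1, $t0", "$t2", ("$t1", "$t0")),
]
for text, _, _ in insert_nops(program):
    print(text)
```

With forwarding added, `min_distance` would shrink for most instruction pairs, which is one reason each pipeline variant needs its own version of the program.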
Bubble Sort version 3
- Different variants of the Bubble Sort program for different pipeline variants
- Also contains a C version compiled by GCC at different optimization levels into assembly and object files. The current implementation cannot execute the GCC-generated code, but adding support for the few missing instructions should be relatively straightforward, at least in the single-cycle datapath.
Updated single-cycle MIPS data path implementation
- Includes private instruction memory with code intended for the single-cycle datapath
- Updated stdlib.circ and mipslib.circ shared by all datapath implementations
Updated multi-cycle MIPS data path implementation
- Includes private instruction memory with code intended for the multi-cycle datapath (identical to single-cycle)
- Updated stdlib.circ and mipslib.circ shared by all datapath implementations
Lecture videos

Lecture 11 held on Zoom (Apr 29, Apr 30)

Super-scalar pipelines
- Static multiple issue (in-order super-scalar) pipeline
- Dynamic multiple issue (out-of-order super-scalar) pipeline
- Speculative execution, exception handling
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 4, sections 4.10, 4.11, 4.14, and 4.15

Additional resources

Lecture videos

Lecture 12 held on Zoom (May 6, May 7)

Memory technology and memory hierarchy
- Static and dynamic memory technology
- Memory hierarchy concepts
- Direct-mapped cache
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 5, sections 5.1, 5.2, and 5.3

Additional resources

DRAM cell example
- Logical 0 corresponds to 0 Volts, logical 1 corresponds to 1 Volt
- Bit information (logical 0 or 1) is stored as charge in a capacitor (C_s)
- To read the value, the bit line (represented by capacitor C_bl) is precharged to 0.5 Volts (value in the middle between logical 0 and 1)
- When reading the information stored in C_s, we are looking for an upwards or downwards swing in voltage (or alternatively, in current) resulting from charge equalization between C_s and C_bl.
- The voltage (current) swing is picked up and amplified by a sense amplifier (not shown in the circuit)
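The charge equalization described above can be worked through numerically. The capacitance values below (30 fF for C_s, 300 fF for C_bl) are assumptions picked for illustration; only the 0 V, 1 V, and 0.5 V levels come from the example.

```python
def bitline_voltage(c_s, v_s, c_bl, v_bl):
    """Voltage after C_s and C_bl share charge: total charge / total capacitance."""
    return (c_s * v_s + c_bl * v_bl) / (c_s + c_bl)


C_S = 30.0        # storage capacitor in fF (assumed value)
C_BL = 300.0      # bit line capacitance in fF (assumed value)
PRECHARGE = 0.5   # bit line precharge voltage in volts

for stored_bit, v_cell in ((0, 0.0), (1, 1.0)):
    v_final = bitline_voltage(C_S, v_cell, C_BL, PRECHARGE)
    swing = v_final - PRECHARGE
    print(f"stored {stored_bit}: bit line settles at {v_final:.4f} V, swing {swing:+.4f} V")
```

Because C_bl is much larger than C_s, the swing is only a few tens of millivolts in either direction, which is why a sense amplifier is needed.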
Static RAM and direct-mapped cache model
- Static RAM circuit (memory_static_8x8bit) shows the row decoder and the organization of an 8x8 memory cell matrix, down to S-R flip-flops made of NOR gates
- Direct-mapped cache circuit (cache_direct_mapped_64k) shows the organization of a 64 KiB direct-mapped cache (64 B cache lines) for a 32-bit address space.
- For use with LogiSim Evolution version 3.3.0 or later
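For a 64 KiB direct-mapped cache with 64 B lines and 32-bit addresses, an address splits into a 16-bit tag, a 10-bit index, and a 6-bit offset. The sketch below shows that split; the constant and function names are assumptions and need not match the signal names in the circuit.

```python
CACHE_SIZE = 64 * 1024                     # 64 KiB
LINE_SIZE = 64                             # 64 B cache lines
NUM_LINES = CACHE_SIZE // LINE_SIZE        # 1024 lines

OFFSET_BITS = LINE_SIZE.bit_length() - 1   # 6
INDEX_BITS = NUM_LINES.bit_length() - 1    # 10
TAG_BITS = 32 - INDEX_BITS - OFFSET_BITS   # 16


def split_address(address):
    """Split a 32-bit address into (tag, index, offset)."""
    offset = address & (LINE_SIZE - 1)
    index = (address >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


tag, index, offset = split_address(0x12345678)
print(hex(tag), index, offset)             # 0x1234 345 56

# An address exactly 64 KiB higher maps to the same line with a different tag.
print(split_address(0x12345678 + CACHE_SIZE)[:2] == (tag + 1, index))  # True
```

Two such addresses compete for the same cache line, which is the source of conflict misses in a direct-mapped cache.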
Lecture videos

Lecture 13 held on Zoom (May 13, May 14)

Cache architectures
- Set-associative cache architecture
- Fully associative cache architecture
- Cache-miss classification (3C model) and cache performance
- Architectural parameters (ABC) and their impact on cache performance
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 5, sections 5.4 and 5.8

Additional resources

Updated Static RAM and cache models (version 2)
- The memory cell in the static RAM circuit (memory_static_8x8bit) now uses a controlled buffer to avoid driving the bit lines when not enabled by a word line. This avoids electrical issues where multiple cells were driving the bit line with opposite values.
- Includes direct-mapped (64 KiB), 4-way associative (64 KiB) and fully-associative (512 B) cache models for 32-bit addresses. All cache models use conceptually similar components to better demonstrate the commonality and differences in their internal architecture.
- For use with LogiSim Evolution version 3.3.0 or later
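The three cache models differ mainly in how many sets they have and, as a result, in how the 32-bit address is divided. The sketch below computes that division from capacity, associativity, and block size. It assumes 64 B blocks for all three models, which this list does not state, and the function name is hypothetical.

```python
def cache_geometry(capacity, associativity, block_size, address_bits=32):
    """Return (sets, offset_bits, index_bits, tag_bits) for power-of-two parameters."""
    sets = capacity // (associativity * block_size)
    offset_bits = block_size.bit_length() - 1
    index_bits = sets.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    return sets, offset_bits, index_bits, tag_bits


BLOCK = 64  # assumed block size in bytes

print(cache_geometry(64 * 1024, 1, BLOCK))     # direct-mapped: (1024, 6, 10, 16)
print(cache_geometry(64 * 1024, 4, BLOCK))     # 4-way associative: (256, 6, 8, 18)
print(cache_geometry(512, 512 // BLOCK, BLOCK))  # fully associative: (1, 6, 0, 26)
```

Higher associativity moves bits from the index to the tag: the fully associative model has no index at all and must compare the tag against every line.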
Updated Static RAM and cache models (version 3)
- The models of static memory and cache architectures have been split into separate files.
- The cache models now support either update or replacement of a cache line.
- The cache models have been refactored to look similar, the only differences being the top-level architecture and the internals of the data storage components.
Lecture videos

Lecture 14 held on Zoom (May 20, May 21)

Cache coherence
- Write-through (WT) and write-back (WB) caches
- Handling hits and misses in WT and WB caches
- Cache coherence problem in multi-core and multi-processor systems
- Cache coherence protocols for WT and WB caches
Hennessy & Patterson, Computer Organization and Design (5th ed.)
- Chapter 5, sections 5.9, 5.10, 5.12 (part related to cache coherence), 5.14, 5.15, and 5.16
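As a rough sketch of one coherence scheme for write-through caches, the model below uses write-invalidate snooping: every write goes to memory over a shared bus, and all other caches drop their copy of that address. The class names, the per-address granularity, and the no-write-allocate policy are simplifying assumptions, not the protocol as presented in the lecture.

```python
class Bus:
    """Shared bus connecting main memory and all snooping caches."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def write(self, origin, address, value):
        self.memory[address] = value          # write-through: memory is always updated
        for cache in self.caches:
            if cache is not origin:
                cache.snoop_write(address)    # other caches invalidate their copy


class WriteThroughCache:
    def __init__(self, bus):
        self.lines = {}                       # address -> value; presence means Valid
        self.bus = bus
        bus.caches.append(self)

    def read(self, address):
        if address not in self.lines:         # read miss: fetch from memory
            self.lines[address] = self.bus.memory.get(address, 0)
        return self.lines[address]

    def write(self, address, value):
        if address in self.lines:             # write hit: update the local copy
            self.lines[address] = value
        self.bus.write(self, address, value)  # no-write-allocate on a miss

    def snoop_write(self, address):
        self.lines.pop(address, None)         # Valid -> Invalid


bus = Bus()
core0, core1 = WriteThroughCache(bus), WriteThroughCache(bus)
bus.memory[0x40] = 1
print(core0.read(0x40), core1.read(0x40))     # 1 1
core0.write(0x40, 2)                          # invalidates the copy held by core1
print(core1.read(0x40))                       # 2
```

Without the invalidation step, the second read by `core1` would return the stale value 1, which is the coherence problem in its simplest form.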