Lecture 1 (Feb 19, Feb 20)

Lecture 2 (Feb 26, Feb 27)

  • Computer performance
    • Classic performance equation
    • Execution time, clocks per instruction (CPI), clock rate
    • Amdahl’s law
  • Instruction set architecture
    • For review only (self-study)
    • Hennessy & Patterson, Computer Organization and Design (5th ed.)
      • Chapter 2, Instructions: Language of the Computer
  • Digital circuits
    • Combinational and sequential circuits
    • Logical functions and basic gates
    • Fundamental operations: 1-bit addition
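
As a rough illustration of the performance topics listed above, the sketch below evaluates the classic performance equation (execution time = instruction count × CPI / clock rate) and Amdahl's law. The function names and the sample numbers are illustrative assumptions, not values taken from the lecture.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """Classic performance equation: time = IC * CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only a fraction of the execution time is sped up."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Hypothetical workload: 1e9 instructions, CPI of 2, 2 GHz clock.
print("execution time [s]:", cpu_time(1e9, 2.0, 2e9))

# Hypothetical enhancement: 80 % of the execution time becomes 4x faster.
print("overall speedup:", amdahl_speedup(0.8, 4.0))

# Even an arbitrarily large local speedup is bounded by 1 / (1 - fraction).
print("upper bound:", amdahl_speedup(0.8, 1e12))
```

The last call shows the main consequence of Amdahl's law: the part that is not improved limits the overall speedup, no matter how fast the improved part becomes.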

Lecture 3 (Mar 4, Mar 5)

  • Lecture cancelled (scheduling conflict with a research project meeting).

Lecture 4 (Mar 11, Mar 12)

  • Lecture cancelled (preventive measures to avoid the spread of coronavirus).

Lecture 5 held on Zoom (Mar 18, Mar 19)

  • Digital circuits
    • The elements of a simple ALU: n-bit addition, subtraction
    • Simple operations: sign extension
    • Sequential circuits: flip-flops, registers
    • Sequential multiplication and division
  • Hennessy & Patterson, Computer Organization and Design (5th ed.)
    • Appendix B, sections B.1 to B.3, B.5, B.7, B.8
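
The ALU building blocks listed above can be sketched in software. The following is a minimal model, assuming a ripple-carry organization in which an n-bit adder is a chain of 1-bit full adders and subtraction is done by inverting the second operand and setting the initial carry to 1. The function names and the default width of 8 bits are illustrative choices, not taken from the lecture materials.

```python
def full_adder(a, b, carry_in):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out


def add_sub(x, y, n=8, subtract=False):
    """n-bit ripple-carry adder/subtractor built from 1-bit full adders.

    Subtraction computes x + ~y + 1: every bit of y is inverted and the
    initial carry-in is set to 1. The result is truncated to n bits.
    """
    carry = 1 if subtract else 0
    result = 0
    for i in range(n):
        a = (x >> i) & 1
        b = (y >> i) & 1
        if subtract:
            b ^= 1
        s, carry = full_adder(a, b, carry)
        result |= s << i
    return result


def sign_extend(value, from_bits, to_bits):
    """Replicate the sign bit of a from_bits-wide value into the upper bits."""
    sign = (value >> (from_bits - 1)) & 1
    if sign:
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


print(add_sub(5, 3))                      # 8-bit addition
print(hex(add_sub(3, 5, subtract=True)))  # 3 - 5 in 8-bit two's complement
print(hex(sign_extend(0xFE, 8, 16)))      # widen an 8-bit negative value
```

The same adder hardware serves both operations, which is why the model takes a single `subtract` flag instead of providing a separate subtractor.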

Lecture 6 held on Zoom (Mar 25, Mar 26)

  • Processor implementation
    • Implementing higher-level blocks required for data path construction
      • 32-bit ALU built from 32 1-bit ALUs
      • 32-register file built from four 8-register files
      • Simple circuits: multiplexers, decoders, sign/zero extension, zero detection
    • Implementing a single-cycle data path
      • Support for register-register, register-immediate, load/store, conditional branch, and absolute jump instructions
  • Hennessy & Patterson, Computer Organization and Design (5th ed.)
    • Chapter 4, section 4.4

Additional resources

  • Single-cycle MIPS data path implementation
    • For use with LogiSim Evolution version 3.3.0 or later
    • The Instruction Memory (ROM) contains the following Bubble Sort program
  • Bubble Sort
    • Sorts 16 integers starting at address 0
  • QtMIPS, a MIPS simulator developed at CTU
    • Allows simulating different variants of MIPS processors, including cache
    • Provides an integrated editor that allows editing and compiling MIPS assembly code
    • To simulate a single-cycle data path, use the following settings:
      • Basic tab: no pipeline, no cache preset
      • Core tab: no delay slot
  • Lecture videos

Lecture 7 held on Zoom (Apr 1, Apr 2)

  • Processor implementation
    • Implementing controller for the single-cycle data path
    • Implementing multi-cycle data path
  • Hennessy & Patterson, Computer Organization and Design (5th ed.)
    • Appendix D, sections D.1 and D.2 (controller)
  • Hennessy & Patterson, Computer Organization and Design (3rd ed.)
    • Chapter 5, section 5.5 (multicycle datapath)

Lecture 8 held on Zoom (Apr 8, Apr 9)

  • Processor implementation
    • Implementing controller for the multi-cycle data path
  • Hennessy & Patterson, Computer Organization and Design (5th ed.)
    • Appendix D, sections D.3 and D.4 (controller)
  • Hennessy & Patterson, Computer Organization and Design (3rd ed.)
    • Chapter 5, section 5.5 (multicycle datapath)

Lecture 9 held on Zoom (Apr 15, Apr 16)

Additional resources

Lecture 10 held on Zoom (Apr 22, Apr 23)

  • Issues in instruction pipelining
    • Pipeline hazards, branch prediction, exceptions
    • Static multiple issue (super-scalar) pipeline
  • Hennessy & Patterson, Computer Organization and Design (5th ed.)
    • Chapter 4, sections 4.7 to 4.10

Additional resources

  • Pipelined MIPS data path implementation
    • For use with LogiSim Evolution version 3.3.0 or later
    • Note that the Bubble Sort program needs to be modified for each variant of the pipeline, because the pipeline lacks the hazard unit (which makes it truly a Microprocessor without Interlocked Pipeline Stages)
    • Each variant has its own instruction memory containing the correct version of the Bubble Sort program
  • Bubble Sort version 3
    • Different variants of the Bubble Sort program for different pipeline variants
    • Also contains a C version compiled by GCC at different optimization levels into assembly and object files. The current implementation cannot execute the GCC-generated code, but adding support for the few missing instructions should be relatively straightforward, at least in the single-cycle datapath.
  • Updated single-cycle MIPS data path implementation
    • Includes private instruction memory with code intended for the single-cycle datapath
    • Updated stdlib.circ and mipslib.circ shared by all datapath implementations
  • Updated multi-cycle MIPS data path implementation
    • Includes private instruction memory with code intended for the multi-cycle datapath (identical to single-cycle)
    • Updated stdlib.circ and mipslib.circ shared by all datapath implementations
  • Lecture videos
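
The note above says that the Bubble Sort program has to be adapted to each pipeline variant because there is no hazard unit. One common way to make code safe for such a pipeline is to pad it with NOPs so that a consumer never follows its producer too closely. The sketch below illustrates that idea only. It is an assumption that the provided program variants were prepared this way, and the instruction tuples, the `insert_nops` name, and the `min_distance` parameter are hypothetical. Only read-after-write hazards through registers are considered, not branches or memory dependencies.

```python
NOP = ("nop", None, ())


def insert_nops(program, min_distance=3):
    """Pad a program with NOPs so that every consumer is at least
    min_distance instruction slots behind the producer of its operands.

    Each instruction is a tuple (mnemonic, dest_register, source_registers).
    The right min_distance depends on the pipeline variant (for example,
    whether forwarding is implemented), so it is left as a parameter.
    """
    out = []
    for instr in program:
        _, _, sources = instr
        needed = 0
        # Look back over already emitted instructions for producers
        # that are still too close to this consumer.
        for back in range(1, min_distance):
            if back > len(out):
                break
            _, dest, _ = out[-back]
            if dest is not None and dest in sources:
                needed = max(needed, min_distance - back)
        out.extend([NOP] * needed)
        out.append(instr)
    return out


# Hypothetical fragment: load a value, double it, store it back.
program = [
    ("lw", "$t0", ("$a0",)),
    ("add", "$t1", ("$t0", "$t0")),
    ("sw", None, ("$t1", "$a0")),
]
for mnemonic, dest, sources in insert_nops(program):
    print(mnemonic, dest or "", *sources)
```

With forwarding in place, a smaller `min_distance` would suffice, which is one reason a separate program version per pipeline variant can be needed.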

Lecture 11 held on Zoom (Apr 29, Apr 30)

  • Super-scalar pipelines
    • Static multiple issue (in-order super-scalar) pipeline
    • Dynamic multiple issue (out-of-order super-scalar) pipeline
    • Speculative execution, exception handling
  • Hennessy & Patterson, Computer Organization and Design (5th ed.)
    • Chapter 4, sections 4.10, 4.11, 4.14, and 4.15

Lecture 12 held on Zoom (May 6, May 7)

  • Memory technology and memory hierarchy
    • Static and dynamic memory technology
    • Memory hierarchy concepts
    • Direct-mapped cache
  • Hennessy & Patterson, Computer Organization and Design (5th ed.)
    • Chapter 5, sections 5.1, 5.2, and 5.3

Additional resources

  • DRAM cell example
    • Logical 0 corresponds to 0 Volts, logical 1 corresponds to 1 Volt
    • Bit information (logical 0 or 1) is stored as charge in a capacitor (Cs)
    • To read the value, the bit line (represented by capacitor Cbl) is precharged to 0.5 Volts (the midpoint between logical 0 and logical 1)
    • When reading the information stored in Cs, we are looking for an upwards or downwards swing in voltage (or alternatively, in current) resulting from charge equalization between Cs and Cbl.
    • The voltage (current) swing is picked up and amplified by a sense amplifier (not shown in the circuit)
  • Static RAM and direct-mapped cache model
    • Static RAM circuit (memory_static_8x8bit): shows the row decoder and the organization of an 8x8 memory cell matrix, down to S-R flip-flops made of NOR gates
    • Direct-mapped cache circuit (cache_direct_mapped_64k), shows organization of a 64 KiB direct-mapped cache (64 B cache lines) for 32-bit address space.
    • For use with LogiSim Evolution version 3.3.0 or later
  • Lecture videos
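
The read operation in the DRAM cell example above can be worked through numerically. The sketch below applies charge conservation between Cs and Cbl to compute the bit-line voltage after equalization. The voltage levels (0 V, 1 V, 0.5 V precharge) come from the example, while the capacitance values and the function name are assumptions chosen only for illustration.

```python
def bit_line_voltage(v_cell, c_s, c_bl, v_precharge=0.5):
    """Voltage on the bit line after charge equalization between Cs and Cbl.

    Charge is conserved: (Cs + Cbl) * V = Cs * v_cell + Cbl * v_precharge.
    """
    return (c_s * v_cell + c_bl * v_precharge) / (c_s + c_bl)


# Assumed capacitances in femtofarads: the bit line is 10x larger than the cell.
C_S, C_BL = 30.0, 300.0

for stored in (0.0, 1.0):
    v = bit_line_voltage(stored, C_S, C_BL)
    print(f"stored {stored:.0f} V -> bit line {v:.3f} V, swing {v - 0.5:+.3f} V")
```

Because the bit line is much larger than the storage capacitor, the swing around 0.5 V is small in both directions, which is why a sense amplifier is needed to recover a full logic level.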

Lecture 13 held on Zoom (May 13, May 14)

  • Cache architectures
    • Set-associative cache architecture
    • Fully associative cache architecture
    • Cache-miss classification (3C model) and cache performance
    • Architectural parameters (ABC) and their impact on cache performance
  • Hennessy & Patterson, Computer Organization and Design (5th ed.)
    • Chapter 5, sections 5.4 and 5.8

Additional resources

  • Updated Static RAM and cache models (version 2)
    • The memory cell in the static RAM circuit (memory_static_8x8bit) now uses a controlled buffer to avoid driving the bit lines when not enabled by a word line. This avoids electrical issues where multiple cells were driving the bit line with opposite values.
    • Includes direct-mapped (64 KiB), 4-way set-associative (64 KiB), and fully associative (512 B) cache models for 32-bit addresses. All cache models use conceptually similar components to better demonstrate the commonalities and differences in their internal architecture.
    • For use with LogiSim Evolution version 3.3.0 or later
  • Updated Static RAM and cache models (version 3)
    • The models of static memory and cache architectures have been split into separate files.
    • The cache models now support either update or replacement of a cache line.
    • The cache models have been refactored to look similar, the only differences being the top-level architecture and the internals of the data storage components.
  • Lecture videos
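
To compare the three cache organizations mentioned above, it helps to see how a 32-bit address splits into tag, set index, and byte offset for each geometry. The sketch below computes the field widths from the cache size, line size, and associativity. The 64 B line size is stated for the direct-mapped model in the Lecture 12 resources; assuming the same line size for the 4-way and fully associative models is a guess, and the function names are hypothetical.

```python
def address_fields(cache_bytes, line_bytes, ways, address_bits=32):
    """Return (tag_bits, index_bits, offset_bits) for a cache geometry.

    All sizes are assumed to be powers of two.
    """
    lines = cache_bytes // line_bytes
    sets = lines // ways
    offset_bits = line_bytes.bit_length() - 1  # log2(line_bytes)
    index_bits = sets.bit_length() - 1         # log2(sets)
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


def split_address(address, cache_bytes, line_bytes, ways, address_bits=32):
    """Split an address into (tag, set_index, byte_offset)."""
    _, index_bits, offset_bits = address_fields(
        cache_bytes, line_bytes, ways, address_bits
    )
    offset = address & ((1 << offset_bits) - 1)
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset


LINE = 64  # bytes; assumed to be the same for all three models
geometries = {
    "direct-mapped, 64 KiB": (64 * 1024, LINE, 1),
    "4-way set-associative, 64 KiB": (64 * 1024, LINE, 4),
    "fully associative, 512 B": (512, LINE, 512 // LINE),
}
for name, (size, line, ways) in geometries.items():
    tag, index, offset = address_fields(size, line, ways)
    print(f"{name}: tag={tag} index={index} offset={offset} bits")
```

Increasing associativity at a fixed capacity moves bits from the index to the tag: fewer sets are selected by the address, and more tag comparisons happen in parallel within each set. In the fully associative case the index disappears entirely.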

Lecture 14 held on Zoom (May 20, May 21)

  • Cache coherence
    • Write-through (WT) and write-back (WB) caches
    • Handling hits and misses in WT and WB caches
    • Cache coherence problem in multi-core and multi-processor systems
    • Cache coherence protocols for WT and WB caches
  • Hennessy & Patterson, Computer Organization and Design (5th ed.)
    • Chapter 5, sections 5.9, 5.10, 5.12 (part related to cache coherence), 5.14, 5.15, and 5.16