CSE 240C: Advanced Microarchitecture (Spring 2020)
Instructor:
Hadi Esmaeilzadeh
Email: hadi [AT] eng [DOT] ucsd [DOT] edu
Office: CSE 3228
TA:
Soroush Ghodrati: soghodra [AT] eng [DOT] ucsd [DOT] edu
Office hours: TBD
Schedule
Each entry below lists the topic number, date, and topic title, followed by the reading assignment(s), any optional or alternative readings, and the presenter.

0 | 03/30/2020 | Introduction
  Reading: Power Challenges May End the Multicore Era
  Alternative (Dennard scaling paper): Design of ion-implanted MOSFET's with very small physical dimensions
  Presenter: Hadi Esmaeilzadeh

0 | 04/01/2020 | Introduction
  Reading: Power Challenges May End the Multicore Era
  Alternative (Dennard scaling paper): Design of ion-implanted MOSFET's with very small physical dimensions
  Presenter: Hadi Esmaeilzadeh

1 | 04/06/2020 | Unifying Machine Learning Accelerators
  Reading: TABLA: A Unified Template-Based Framework for Accelerating Statistical Machine Learning
  Reading: Towards a Unified Architecture for in-RDBMS Analytics
  Optional: In-RDBMS Hardware Acceleration of Advanced Analytics
  Presenter: Divya Mahajan

2 | 04/08/2020 | Machine Learning Accelerators
  Reading: Scale-Out Acceleration for Machine Learning
  Reading: Parallelized Stochastic Gradient Descent
  Optional: An overview of gradient descent optimization algorithms
  Presenter: Hadi Esmaeilzadeh

3 | 04/13/2020 | Neural Network History
  Presenter: Hadi Esmaeilzadeh

3 | 04/15/2020 | No Lecture
  Presenter: Hadi Esmaeilzadeh

3 | 04/20/2020 | Neural Network History
  Presenter: Hadi Esmaeilzadeh

3 | 04/22/2020 | Deep Learning Accelerators
  Reading: DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning
  Reading: DaDianNao: A Machine-Learning Supercomputer
  Reading: Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks
  Optional: FPGAs vs. GPUs in Datacenters
  Alternative to DianNao: TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory
  Alternative to Eyeriss: Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices
  Presenter: Hadi Esmaeilzadeh

4 | 04/27/2020 | Scalable Deep Networks
  Reading: Project Adam: building an efficient and scalable deep learning training system
  Reading: Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture
  Presenter: Hadi Esmaeilzadeh

5 | 04/29/2020 | Distributed Training
  Reading: A Network-Centric Hardware/Algorithm Co-Design to Accelerate Distributed Training of Deep Neural Networks
  Presenter: Hadi Esmaeilzadeh

6 | 05/04/2020 | Variable Bitwidth DNN Acceleration
  Reading: Stripes: Bit-Serial Deep Neural Network Computing
  Reading: Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks
  Reading: HAQ: Hardware-Aware Automated Quantization with Mixed Precision
  Reading: DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients
  Presenter: Hadi Esmaeilzadeh

7 | 05/06/2020 | Mixed-Signal DNN Acceleration
  Reading: Mixed-Signal Charge-Domain Acceleration of Deep Neural Networks through Interleaved Bit-Partitioned Arithmetic
  Reading: Bit-Parallel Vector Composability for Neural Acceleration
  Presenter: Hadi Esmaeilzadeh

7 | 05/11/2020 | Approximate Computing
  Reading: Neural Acceleration for General-Purpose Approximate Programs
  Reading: General-Purpose Code Acceleration with Limited-Precision Analog Computation
  Presenter: Hadi Esmaeilzadeh

8 | 05/13/2020 | Tiled/Dataflow Architectures
  Reading: Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture
  Reading: Understanding Sources of Inefficiency in General-Purpose Chips
  Presenter: Hadi Esmaeilzadeh

8 | 05/18/2020 | Tiled/Dataflow Architectures
  Reading: Exploiting ILP, TLP, and DLP with the Polymorphous TRIPS Architecture
  Reading: Understanding Sources of Inefficiency in General-Purpose Chips
  Presenter: Hadi Esmaeilzadeh

9 | 05/20/2020 | Tiled/Dataflow Architectures
  Reading: The WaveScalar architecture
  Reading: The Raw microprocessor: a computational fabric for software circuits and general-purpose programs
  Presenter: Hadi Esmaeilzadeh