# CS184a: Computer Architecture (Structures and Organization)

Day 4: October 4, 2000. Memories, ALUs, and Virtualization

Caltech CS184a Fall2000 -- DeHon


#### Last Time

- Arithmetic: addition, subtraction
- Reuse:
  - pipelining
  - bit-serial (vectorization)
  - shared datapath elements
- FSMDs
- Area/Time Tradeoffs
- Latency and Throughput


# Today

- Memory
  - features
  - design
  - technology
  - impact on computability
- ALUs
- Virtualization


# Memory

- What's a memory?
- What's special about a memory?




# Memory

- Block for storing data for later retrieval
- State element
- What's different between a memory and a collection of registers like we've been discussing?




# Memory Uniqueness

- Cost
- Compact state element
- Packs data very tightly
- At the expense of sequentializing access
- Example of Area-Time tradeoff
  - and a key enabler


# **Memory Organization**

- Key idea: sharing
  - factor out common components among state elements
  - can afford big shared elements if their cost is amortized across many bits
  - what is unique to each state element can be small (the memory bit cell)




# **Memory Organization**

- Share: Interconnect
  - Input bus
  - Output bus
  - Control routing
- VERY topology/wire cost aware design
- Note local, abutment wiring




# **Share Interconnect**

- Input Sharing
  - wiring
  - drivers
- Output Sharing
  - wiring
  - sensing
  - driving




# Address/Control

- Addressing and Control
  - an overhead
  - paid to allow this sharing






# Dynamic RAM

- Goes a step further
- Share refresh/restoration logic as well
- Minimal storage is a capacitor
- "Feature" of a DRAM process is the ability to make capacitors efficiently





### Some Numbers (memory)

- Unit of area =  $\lambda^2$ 
  - [more next time]
- Register as stand-alone element  $\approx 4K\lambda^2$ 
  - e.g. as needed/used last two lectures
- Static RAM cell  $\approx 1 \text{K}\lambda^2$ 
  - SRAM Memory (single ported)
- Dynamic RAM cell (DRAM process)  $\approx 100\lambda^2$
- Dynamic RAM cell (SRAM process)  $\approx 300\lambda^2$
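
To make these per-bit figures concrete, the sketch below totals the storage-cell area for a block of state under each option. It assumes that K means 1000 and counts only the cells themselves, ignoring the shared address, sensing, and driver overhead. The function name and the 1024-bit example are illustrative, not from the lecture.

```python
# Approximate per-bit areas from the list above, in units of lambda^2.
# Assumption: "K" is read as 1000.
AREA_PER_BIT = {
    "register": 4000,
    "sram": 1000,
    "dram_in_sram_process": 300,
    "dram_in_dram_process": 100,
}


def cell_area(kind: str, bits: int) -> int:
    """Total storage-cell area in lambda^2, ignoring shared periphery."""
    return AREA_PER_BIT[kind] * bits


if __name__ == "__main__":
    bits = 1024  # illustrative capacity
    for kind in AREA_PER_BIT:
        ratio = cell_area("register", bits) / cell_area(kind, bits)
        print(f"{kind:22s} {cell_area(kind, bits):>9d} lambda^2 "
              f"(registers would be {ratio:.0f}x this area)")
```

Under these assumptions, holding the same state in stand-alone registers costs about 4x the SRAM cell area and about 40x the DRAM-process cell area.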


### Memory

- Key Idea
  - Memories hold state compactly
  - Do so by minimizing key state storage and amortizing rest of structure across large array




- Width
- Depth
- Internal vs. External Width
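
One way to picture these three parameters is a behavioral model in which the array senses a wide internal row and a column multiplexer selects the narrower external word. This is only a sketch: the class name, the parameter names, and the way the address is split are assumptions, not a description of a particular memory.

```python
class Memory:
    """Behavioral model: `depth` words of external width `width` bits,
    stored in rows that are `internal_width` bits wide."""

    def __init__(self, depth: int, width: int, internal_width: int):
        if internal_width % width != 0:
            raise ValueError("internal width must be a multiple of width")
        self.width = width
        self.words_per_row = internal_width // width
        if depth % self.words_per_row != 0:
            raise ValueError("depth must fill whole rows")
        self.rows = [[0] * self.words_per_row
                     for _ in range(depth // self.words_per_row)]

    def _split(self, addr: int):
        # High address bits pick the row; low bits drive the column mux.
        return divmod(addr, self.words_per_row)

    def write(self, addr: int, value: int) -> None:
        row, col = self._split(addr)
        self.rows[row][col] = value & ((1 << self.width) - 1)

    def read(self, addr: int) -> int:
        row, col = self._split(addr)
        wide_row = self.rows[row]  # the whole internal row is sensed
        return wide_row[col]       # the column mux selects one external word


m = Memory(depth=16, width=8, internal_width=32)
m.write(5, 0xAB)
print(hex(m.read(5)))
```

Here a 16-word, 8-bit memory is organized internally as 4 rows of 32 bits, so only 4 row lines need decoding while the column mux picks one of 4 words.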




# System Memory Design

- Have a memory capacity to provide
- What are choices?


# System Memory Design

- One monolithic memory?
  - Internal vs. external width
  - internal banking
- External width
- Separate memory banks (address ports)
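
As a rough illustration of separate banks with their own address ports, the sketch below interleaves addresses across banks on the low-order bits, so that consecutive addresses land in different banks and could be served in parallel. Low-order interleaving is one common choice and is an assumption here; the function names are hypothetical.

```python
from typing import List, Tuple


def bank_and_offset(addr: int, num_banks: int) -> Tuple[int, int]:
    """Low-order interleave: return (bank, offset within that bank)."""
    return addr % num_banks, addr // num_banks


def conflict_free(addrs: List[int], num_banks: int) -> bool:
    """True if every address lands in a different bank, so all of the
    accesses could proceed in parallel through separate address ports."""
    banks = [bank_and_offset(a, num_banks)[0] for a in addrs]
    return len(set(banks)) == len(banks)


print(bank_and_offset(13, 4))
print(conflict_free([8, 9, 10, 11], 4))
print(conflict_free([0, 4], 4))
```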



# Yesterday vs. Today (Memory Technology)

- What's changed?
  - Capacity
    - single chip
  - Integration
    - memory and logic
    - dram and logic
    - embedded memories
  - Room on chip for big memories
  - Don't have to make a chip crossing to get to memory


# Important Technology Cost

- IO between chips << IO on chip
  - pad spacing
  - perimeter vs. area ($4s$ vs. $s^2$)
  - wiring technology
- BIG factor in multi-chip system designs
- Memories nice
  - very efficient with IO cost vs. internal area
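
The perimeter-versus-area point can be checked with a line of arithmetic: for a square die of side s, pads sit along the 4s perimeter while on-chip resources scale as s^2. The sketch below uses arbitrary units and hypothetical function names; it shows the perimeter available per unit of area shrinking as the die grows.

```python
def perimeter(s: float) -> float:
    """Edge length available for pads on a square die of side s."""
    return 4 * s


def area(s: float) -> float:
    """Area available for logic and memory on the same die."""
    return s * s


def perimeter_per_area(s: float) -> float:
    """Perimeter per unit area; algebraically this is 4 / s."""
    return perimeter(s) / area(s)


for side in (1, 2, 4, 8):
    print(side, perimeter_per_area(side))
```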


# Costs Change

- Design space changes when whole system goes on single chip
- Can afford
  - wider busses
  - more banks
  - memory tailored to application/architecture
- Beware of old (stale) answers
  - their cost model was different


# What is Importance of Memory?

- Radical Hypothesis:
  - Memory is simply a very efficient organization which allows us to store data compactly
    - (at least, in the technologies we've seen to date)
  - A great engineering trick to optimize resources
- Alternative:
  - memory is a **primary**


# Sharing


### Last Time

- Given a task:  $y=Ax^2+Bx+C$
- Saw how to share primitive operators
- Got down to one of each




# Very naively

- Might seem we need one of each different type of operator


#### ...But

- Doesn't fool us
- We already know that the NAND gate (and many other things) is universal
- So we know we can build a universal compute operator


# This Example

- $y=Ax^2+Bx+C$
- Know a single adder will do




#### Adder Universal?

- Assuming interconnect:
  - (big assumption as we'll see later)
  - Consider:

    - A: 001a
    - B: 000b
    - S: 00cd
- What's c?
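
Working the example through for all four input combinations suggests an answer: the low-order bits generate a carry of a AND b, and adding that carry to the constant 1 in the next position inverts it, so c behaves as NAND(a, b) and d as XOR(a, b). The check below is a small sketch with hypothetical helper names; it inspects only bits c and d and leaves aside any carry out of the c position.

```python
def add_example(a: int, b: int):
    """Add A = 001a and B = 000b; return bits (c, d) of the sum."""
    A = 0b0010 | a   # constant 1 in bit 1, variable a in bit 0
    B = b            # variable b in bit 0
    S = A + B
    c = (S >> 1) & 1
    d = S & 1
    return c, d


def nand(a: int, b: int) -> int:
    return 1 - (a & b)


for a in (0, 1):
    for b in (0, 1):
        c, d = add_example(a, b)
        assert c == nand(a, b)
        assert d == a ^ b
        print(a, b, "-> c =", c, "d =", d)
```

Since NAND is universal, this is the sense in which an adder, given the interconnect to feed it constants and route its outputs, can serve as a universal operator.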




# Practically

- To reduce (some) interconnect
- and to reduce number of operations
- do tend to build a bit more general "universal" computing function


# Arithmetic Logic Unit (ALU)

- Observe:
  - with small tweaks can get many functions with basic adder components
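
As one illustration of such tweaks, the sketch below wraps a single adder with two control bits: one inverts the B input and one sets the carry-in. With only these, the same adder yields add, subtract, increment, and decrement. The control encoding, the 8-bit default width, and the function names are assumptions for illustration, not the ALU design from the lecture.

```python
def adder(a: int, b: int, cin: int, width: int) -> int:
    """The basic component: a width-bit adder with carry-in."""
    return (a + b + cin) & ((1 << width) - 1)


def alu(a: int, b: int, invert_b: int, cin: int, width: int = 8) -> int:
    """Adder plus two tweaks: optional inversion of B and a carry-in.

    invert_b=0, cin=0        -> a + b
    invert_b=1, cin=1        -> a - b  (two's complement)
    invert_b=0, cin=1, b=0   -> a + 1
    invert_b=1, cin=0, b=0   -> a - 1
    """
    mask = (1 << width) - 1
    b_in = (~b & mask) if invert_b else (b & mask)
    return adder(a & mask, b_in, cin, width)


print(alu(5, 3, 0, 0))  # add
print(alu(5, 3, 1, 1))  # subtract
print(alu(5, 0, 0, 1))  # increment
print(alu(5, 0, 1, 0))  # decrement
```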








## **Table Lookup Function**

- Observe 2: only $2^{2^3}=256$ functions of 3 inputs
  - 3 inputs = A, B, and carry-in from the lower bit
- Two 3-input lookup tables
  - give all functions of 2 inputs and a cascade
  - 8 bits to specify the function of each lookup table
- LUT = LookUp Table
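
A 3-input LUT can be sketched as an 8-bit truth table indexed by its inputs. The example below configures two LUTs per bit position, one for the sum and one for the carry cascade, and chains them into a ripple adder. The particular configuration constants, the input ordering, and the function names are choices made for this sketch.

```python
def lut3(config: int, a: int, b: int, c: int) -> int:
    """3-input LUT: the 8-bit config is the truth table, indexed by inputs."""
    index = (c << 2) | (b << 1) | a
    return (config >> index) & 1


SUM_CONFIG = 0x96    # 1 when an odd number of inputs are 1 (3-input XOR)
CARRY_CONFIG = 0xE8  # 1 when at least two inputs are 1 (majority)


def ripple_add(x: int, y: int, width: int) -> int:
    """Add two width-bit numbers using two LUTs per bit position."""
    carry = 0
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        result |= lut3(SUM_CONFIG, a, b, carry) << i
        carry = lut3(CARRY_CONFIG, a, b, carry)  # cascade to the next bit
    return result


print(ripple_add(5, 3, 4))
```

Loading different 8-bit values into the same two tables gives any other function of A, B, and the cascade input, which is what makes the LUT a general compute element.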


#### What does this mean?

- With only one active component
  - ALU, NAND gate, LUT
- Can implement any function
  - given appropriate
    - state registers
    - muxes (interconnect)
    - control


# Revisit Example



- We do see a proliferation of memory and muxes -- what do we do about that?


#### **Back to Memories**

- State in memory more compact than "live" registers
  - shared input/output/drivers
- If we're sequentializing, only need one (or a few) at a time anyway
  - i.e. if sharing the compute unit, might as well share the interconnect
- Shared interconnect also gives muxing function






#### Control

- Still need the controller that directs which state goes where, and when
- Has more work now:
  - must also say which operation the compute unit performs




# **Implementing Control**

- Implementing a single, fixed computation
  - might still just build a custom FSM




# ...and Programmable

- At this point, it's a small leap to say maybe the controller can be programmable as well
- Then have a building block which can implement anything within state and control programmability bounds




# Simplest Programmable Control

- Use a memory to "record" control instructions
- "Play" the control back in sequence








#### What have we done?

- Taken a computation:  $y=Ax^2+Bx+C$
- Turned it into operators and interconnect



- Decomposed operators into a basic primitive: additions, ALU, ... NAND


#### What have we done?

- Said we can implement it on as few as one of these
- Added a unit for state



- Added an instruction to tell the single, universal unit how to act as each operator in the original graph


#### Virtualization

- We've virtualized the computation
- No longer need one physical compute unit for each operator in original computation
- Can suffice with shared operator(s)
- and a description of how each operator behaved
- and a place to store the intermediate data between operators
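
A minimal sketch of this virtualized form, using the running example $y=Ax^2+Bx+C$, follows. A control memory holds one instruction per operator in the original graph, a counter plays the instructions in order, and a single shared unit executes each one against a small state memory. The instruction format, the slot names, and the choice of `mul` and `add` as the shared unit's operations are assumptions for illustration.

```python
import operator

# Operations the single shared compute unit can be told to perform.
OPS = {"add": operator.add, "mul": operator.mul}

# Control memory: (operation, source 1, source 2, destination).
PROGRAM = [
    ("mul", "x", "x", "t0"),    # t0 = x * x
    ("mul", "A", "t0", "t0"),   # t0 = A * x^2
    ("mul", "B", "x", "t1"),    # t1 = B * x
    ("add", "t0", "t1", "t0"),  # t0 = A*x^2 + B*x
    ("add", "t0", "C", "y"),    # y  = A*x^2 + B*x + C
]


def run(program, state):
    """Play the control memory in sequence on one shared compute unit."""
    state = dict(state)  # state memory: inputs plus intermediate slots
    for pc in range(len(program)):  # the counter that sequences control
        op, src1, src2, dst = program[pc]
        state[dst] = OPS[op](state[src1], state[src2])
    return state


result = run(PROGRAM, {"A": 2, "B": 3, "C": 4, "x": 5})
print(result["y"])
```

Five operators in the original graph become five entries in the control memory and two intermediate slots in the state memory, all served by one physical unit over five steps.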




# Why Interesting?

- Memory compactness
- This works and was interesting because
  - the area to describe a computation, its interconnect, and its state
  - is much smaller than the physical area to spatially implement the computation
- e.g. traded multiplier for
  - few memory slots to hold state
  - few memory slots to describe operation
  - time on a shared unit (ALU)
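
The multiplier trade can be sketched as shift-and-add on one shared adder: every step reuses the same unit, so the multiplier's area is exchanged for an accumulator slot and several passes through the adder. The function name and the way adder uses are counted are illustrative assumptions.

```python
def multiply_on_shared_adder(a: int, b: int, width: int):
    """Multiply a by the low `width` bits of unsigned b using one adder.

    Returns (product, adder_uses), where adder_uses counts how many
    times the shared adder was needed.
    """
    acc = 0          # one state slot holds the running sum
    adder_uses = 0
    for i in range(width):
        if (b >> i) & 1:
            acc = acc + (a << i)  # one pass through the shared adder
            adder_uses += 1
    return acc, adder_uses


print(multiply_on_shared_adder(6, 5, 4))
```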


### Finishing up

- Coming Attractions
- Administrivia
- Big Ideas
  - MSB
  - MSB-1


# Coming Attractions: Three Talks by Tom Knight

- Thursday 4pm (102 Steele)
  - Robust Computation with Capabilities and Data Ownership (computer architecture)
- This Fri 4pm (102 Steele)
  - Reversibility in Digital, Analog, and Neural Computation (physics of computation)
- Next Mon 3pm (Beckman Institute Auditorium)
  - Computing with Life (biological computers)


#### Administrative

- CS184 mailing list -- sent test message
  - if didn't receive, you should mail me
- CS184 questions:
  - please put CS184 in the subject line
- Homework Instructions:
  - read the info handout!
- Course web page
- Comment Reading


#### Caltech CS

- Want to talk with undergrads about
  - department
  - classes
  - great CS community
  - ...concentration/major...
- Fora (plural of forum?)
  - small group discussion in houses?
  - small group dinner
  - **-** ???


# Big Ideas [MSB Ideas]

- Memory: efficient way to hold state
- State can be << computation [area]
- Resource sharing: key trick to reduce area
- Memories are a great example of resource sharing
- Memory key tool for Area-Time tradeoffs
- "configuration" signals allow us to generalize the utility of a computational operator


# Big Ideas [MSB-1 Ideas]

- Tradeoffs in memory organization
- Changing cost of memory organization as we go to on-chip, embedded memories
- ALUs and LUTs as universal compute elements
- First programmable computing unit
