# CS184a: Computer Architecture (Structure and Organization)

Day 15: February 12, 2003 Interconnect 5: Meshes



Caltech CS184 Winter2003 -- DeHon

### **Previous**

- Saw we needed to exploit locality/structure in interconnect
- Saw a mesh might be useful
  - Question: how does w grow?
- Saw Rent's Rule as a way to characterize structure

### Today

- Mesh:
  - Channel width bounds
  - Linear population
  - Switch requirements
  - Routability
  - Segmentation
  - Clusters
  - Commercial




### Mesh Channels

- Lower bound on w?
- Bisection bandwidth
  - $BW \propto N^p$
  - $N^{0.5}$ channels in bisection

$$w \propto \frac{N^p}{N^{0.5}} = N^{p-0.5}$$
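Here is a quick numeric sketch of how channel width scales. It assumes a hypothetical proportionality constant `c = 1` and a few illustrative Rent exponents. Only the trend matters here, not the absolute numbers.

```python
def channel_width(n_pe: int, p: float, c: float = 1.0) -> float:
    """Bisection needs ~c*N^p wires but crosses only ~N^0.5 channels,
    so each channel carries ~c*N^p / N^0.5 = c*N^(p-0.5) wires.
    c = 1.0 is a hypothetical constant, not a measured value."""
    return c * n_pe ** p / n_pe ** 0.5

for p in (0.5, 0.6, 0.75):
    widths = [round(channel_width(n, p), 1) for n in (10**2, 10**4, 10**6)]
    print(f"p={p}: w at N=1e2, 1e4, 1e6 -> {widths}")
```

For $p = 0.5$ the width stays constant. For any $p > 0.5$ it keeps growing with $N$.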




### Straightforward Switching Requirements

- Total Switches?
- Switching Delay?




### Switch Delay

- Switching delay: $2\sqrt{N_{subarray}}$ switchboxes crossed, corner to corner
  - Worst case: $N_{subarray} = N$




### **Total Switches**

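- Switches per switchbox:
  - $4 \times 3w \times w / 2 = 6w^2$ (each of the $4w$ entering tracks connects to all $3w$ tracks on the other sides)
  - Bidirectional switches
    - (N→W same as W→N)
    - so divide by 2 to avoid double counting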



- Switches into network:
  - $(K+1)w$
- Switches per PE:
  - $6w^2 + (K+1)w$
  - $w = cN^{p-0.5}$
  - Per PE $\propto N^{2p-1}$ (dominated by $w^2$), so total over $N$ PEs $\propto N^{2p}$




### Routability?

- Deciding whether a design can be routed in a given channel width is:
  - NP-complete

### Traditional Mesh Population

The switchbox contains a number of switches that is only linear in the channel width.




### Linear Mesh Switchbox

- Each entering track connects to:
  - one track on each of the remaining sides (3)
  - 4 sides
  - $w$ tracks per side
  - Bidirectional switches
    - (N→W same as W→N)
    - so divide by 2 to avoid double counting
  - $3 \times 4 \times w / 2 = 6w$ switches
    - vs. $6w^2$ for full population
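Brute-force enumeration can check the $6w^2$ and $6w$ counts. In this sketch, a switch is an unordered pair of track endpoints. Enumerating every (entering side, exit side) pair and storing frozensets therefore performs the divide-by-2 for bidirectional switches automatically. For the linear case, the sketch assumes (hypothetically) that track $i$ connects to track $i$ on each other side. Any one-per-side choice gives the same count.

```python
from itertools import permutations

SIDES = ("N", "E", "S", "W")

def count_switches(w: int, full: bool) -> int:
    """Count bidirectional switches in one switchbox with w tracks per side.

    Enumerate every ordered (entering side, exit side) pair: 4 sides x 3 others.
    A switch is an unordered pair of endpoints, so N->W and W->N produce the
    same frozenset and are stored once (the '/2' on the slides).
    full=True:  a track reaches every track on each other side.
    full=False: a track reaches one track per side (same index, an assumption).
    """
    switches = set()
    for src, dst in permutations(SIDES, 2):
        for i in range(w):
            targets = range(w) if full else (i,)
            for j in targets:
                switches.add(frozenset({(src, i), (dst, j)}))
    return len(switches)

for w in (1, 4, 10):
    print(w, count_switches(w, full=True), count_switches(w, full=False))
```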




### **Total Switches**

- Switches per switchbox:
  - $6w$
- Switches into network:
  - $(K+1)w$
- Switches per PE:
  - $6w + (K+1)w$
  - $w = cN^{p-0.5}$
  - Per PE $\propto N^{p-0.5}$, so total over $N$ PEs $\propto N^{p+0.5}$


### **Total Switches**

- Total switches (linear population)
  - $\propto N^{p+0.5} > N$
  - $\propto N^{p+0.5} < N^{2p}$
  - (both hold for $p > 0.5$)
- Switches grow faster than nodes
- Wires grow faster than switches (wire area $\propto N^{2p}$)
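The following sketch checks these growth rates numerically by fitting an exponent between two array sizes. `K = 4`, `c = 1`, and `p = 0.75` are hypothetical values. Wire area is modeled as $N$ tiles of roughly $w \times w$ wire pitches.

```python
import math

K = 4      # hypothetical: 4-input LUT PEs
C = 1.0    # hypothetical constant in w = C * N^(p - 0.5)

def width(n, p):
    return C * n ** (p - 0.5)

def linear_switches(n, p):
    """Total switches, linear population: N PEs x (6w + (K+1)w)."""
    w = width(n, p)
    return n * (6 * w + (K + 1) * w)

def full_switches(n, p):
    """Total switches, full population: N PEs x (6w^2 + (K+1)w)."""
    w = width(n, p)
    return n * (6 * w * w + (K + 1) * w)

def wire_area(n, p):
    """Wire area in pitch^2: N tiles, each roughly w x w wire pitches."""
    return n * width(n, p) ** 2

def exponent(f, p, n1=1e4, n2=1e8):
    """Fit e in f(N) ~ N^e from two sample sizes."""
    return math.log(f(n2, p) / f(n1, p)) / math.log(n2 / n1)

p = 0.75
for name, f in [("linear switches", linear_switches),
                ("full switches", full_switches),
                ("wire area", wire_area)]:
    print(f"{name:16s} ~ N^{exponent(f, p):.2f}")
```

The fitted exponents should land near $p + 0.5$, $2p$, and $2p$ respectively. The full-population fit approaches $2p$ only as the $w^2$ term comes to dominate.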




### **Checking Constants**

- Wire pitch = $8\lambda$
- Switch area = $2500\lambda^2$
- Wire area: $(8w)^2\lambda^2$
- Switch area: $6 \times 2500\,w\,\lambda^2$
- Crossover:
  - $w \approx 234$ (solve $64w^2 = 15000w$)
  - (smaller in practice)
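The crossover is the point where wire area overtakes linear-population switch area, i.e. $64w^2 = 15000w$. Here is a minimal check using the constants from the slide:

```python
WIRE_PITCH = 8        # lambda, from the slide
SWITCH_AREA = 2500    # lambda^2 per switch, from the slide

def wire_area(w):
    """Area of a w x w wire crossing region, in lambda^2."""
    return (WIRE_PITCH * w) ** 2

def linear_switch_area(w):
    """6w switches per switchbox (linear population), in lambda^2."""
    return 6 * SWITCH_AREA * w

# Solve (8w)^2 = 6 * 2500 * w  ->  w = 15000 / 64
crossover = 6 * SWITCH_AREA / WIRE_PITCH ** 2
print(f"crossover at w = {crossover:.1f}")
```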


### Checking Constants: Full Population

- Wire pitch = $8\lambda$
- Switch area = $2500\lambda^2$
- Wire area: $(8w)^2\lambda^2$
- Switch area: $6 \times 2500\,w^2\lambda^2$
- Effective wire pitch: $\approx 120\lambda$
  - ~15 times the $8\lambda$ wire pitch
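One way to read the 120λ figure is sketched below. It assumes that wire area and switch area simply add. With full population, both terms grow as $w^2$, so the switchbox behaves like a wire crossing with a larger effective pitch.

```python
import math

WIRE_PITCH = 8       # lambda, from the slide
SWITCH_AREA = 2500   # lambda^2 per switch, from the slide

# Full population: area ~ (8w)^2 + 6 * 2500 * w^2 = (P_eff * w)^2,
# assuming wire and switch area add; solve for the effective pitch P_eff.
p_eff = math.sqrt(WIRE_PITCH ** 2 + 6 * SWITCH_AREA)
print(f"effective pitch ~ {p_eff:.1f} lambda = {p_eff / WIRE_PITCH:.1f}x wire pitch")
```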



### **Practical**

- Just showed:
  - linear population would need a 15× mapping ratio before it took the same area as full population (once past the crossover to wire-dominated area)
- Can afford to leave some wires unused or imperfectly used
  - in order to reduce switches


### **Diamond Switch**

- Typical switchbox pattern:
  - Used by Xilinx
- Far fewer switches, but cannot guarantee that all the wires are usable
  - may need more wires than Rent's Rule implies, since not all wires can be used
  - this was already true... now even more so







### **Universal SwitchBox**

- Same number of switches as diamond
- Locally: guaranteed to satisfy any set of requests
  - request = a direction through the switchbox
  - as long as the requests meet channel capacities
  - and track order on every channel is irrelevant
- Not a global property
  - no guarantees between switchboxes



### Diamond vs. Universal?

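- Universal routes strictly more configurations than the Xilinx diamond
- Figure: Universal vs. Xilinx switchboxes, with a request pattern the Xilinx switchbox can't route (nor any rotation of it)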
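The following sketch shows why the diamond loses configurations. It assumes the diamond is a disjoint switchbox in which a route keeps the same track index (color) on every side. Routing a set of requests then means choosing one track per request with no side reusing a track: an edge coloring. The universal check simply encodes the capacity condition stated on the Universal SwitchBox slide. The three-request cycle below is one pattern the diamond fails on with $w = 2$. It is not necessarily the pattern from the original figure.

```python
from itertools import product

SIDES = ("N", "E", "S", "W")

def within_capacity(requests, w):
    """Universal-switchbox condition (per the slide): each request uses one
    track on each of its two sides, and no side may carry more than w."""
    load = dict.fromkeys(SIDES, 0)
    for a, b in requests:
        load[a] += 1
        load[b] += 1
    return all(n <= w for n in load.values())

def diamond_routable(requests, w):
    """Diamond (disjoint) switchbox: a route keeps the same track index on
    both sides, so pick a track in range(w) per request such that no side
    uses a track index twice. Brute force; tiny inputs only."""
    for tracks in product(range(w), repeat=len(requests)):
        used = set()
        for (a, b), t in zip(requests, tracks):
            if (a, t) in used or (b, t) in used:
                break                      # conflict: try the next assignment
            used.update({(a, t), (b, t)})
        else:
            return True                    # no conflict for any request
    return False

# Three requests that pairwise share a side (N-E, E-S, S-N), w = 2 tracks.
cycle = [("N", "E"), ("E", "S"), ("S", "N")]
print(within_capacity(cycle, 2), diamond_routable(cycle, 2))
```

Every side carries only two requests, so capacity is met. However, the three requests pairwise share a side and therefore need three distinct track indices.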


### **Inter-Switchbox Constraints**

- Channels connect switchboxes
- A valid route must satisfy every adjacent switchbox simultaneously



### Mapping Ratio?

- How bad is it?
- How much wider do channels have to be?
- Mapping ratio:
  - detail channel width required / global channel width


### Mapping Ratio

- Empirical:
  - Seems plausibly constant in practice
- Theory/provable:
  - There is no constant mapping ratio
    - At least for detail/global
  - It can be arbitrarily large!

### **Domain Structure**

Once a route enters the network (chooses a color), it can only switch within that domain.





### **Detail Routing as Coloring**



- Global route channel width = 2
- Detail route channel width = N
  - The gap can be made arbitrarily large


### Routability

- Domain routing is NP-complete
  - can reduce graph coloring to domain selection
    - i.e. map adjacent graph nodes to nets that share a channel
    - the previous example shows the basic shape
  - (another reason routers are slow)
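The coloring view of the last few slides can be sketched in code. Under the hypothetical reduction below, each graph node becomes a net, and each graph edge becomes a channel shared by the two nets it joins. Since a net cannot leave its domain, the detail width equals the graph's chromatic number, while the global width never exceeds 2. Exhaustive search keeps the code short but limits it to tiny inputs.

```python
from itertools import combinations, product

def graph_to_nets(num_nodes, edges):
    """Hypothetical reduction: graph node v becomes net v, and each graph
    edge (u, v) becomes a channel that nets u and v both pass through."""
    nets = {v: set() for v in range(num_nodes)}
    for u, v in edges:
        nets[u].add((u, v))
        nets[v].add((u, v))
    return nets

def global_width(nets):
    """Max number of nets sharing one channel (what a global router sees)."""
    load = {}
    for chans in nets.values():
        for ch in chans:
            load[ch] = load.get(ch, 0) + 1
    return max(load.values(), default=0)

def detail_width(nets):
    """Fewest domains such that nets sharing a channel get different domains
    (a net cannot leave its domain). Exhaustive search, tiny inputs only."""
    names = sorted(nets)
    clashes = [(a, b) for a, b in combinations(names, 2) if nets[a] & nets[b]]
    for k in range(1, len(names) + 1):
        for colors in product(range(k), repeat=len(names)):
            domain = dict(zip(names, colors))
            if all(domain[a] != domain[b] for a, b in clashes):
                return k
    return 0

for n in (3, 4, 5):
    complete = list(combinations(range(n), 2))   # complete graph K_n
    nets = graph_to_nets(n, complete)
    print(n, global_width(nets), detail_width(nets))
```

With the complete graph $K_n$, the global width stays at 2 while the detail width grows as $n$. This matches the arbitrarily large detail/global gap.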


### Routing

- Lack of a detail/global mapping ratio
  - Says detail can be arbitrarily worse than global
  - Says global routing does not necessarily predict detail routing
  - Argument against decomposing mesh routing into a global phase and a detail phase
    - Modern FPGA routers do not decompose it this way

### Segmentation

- To improve speed (decrease delay)
- Allow wires to bypass switchboxes
- Maybe save switches?
- Certainly costs more wire tracks




### Segmentation

- Segment of length $L_{seg}$
  - 6 switches per switchbox visited
  - Only enters a switchbox every $L_{seg}$
  - Switches per switchbox per track of length $L_{seg}$ = $6/L_{seg}$



### Segmentation

- Reduces switches on path to $\propto \sqrt{N}/L_{seg}$
- May get fragmentation
  - Another cause of unusable wires
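Here is a rough hop count for the worst-case path. It combines the $2\sqrt{N}$ corner-to-corner distance from the switch-delay slide with one switch every $L_{seg}$ switchboxes. It ignores fragmentation and segment end effects, so treat the result as an estimate.

```python
import math

def switches_on_path(n_pe, l_seg=1):
    """Estimated switches on a corner-to-corner path of a sqrt(N) x sqrt(N)
    mesh: ~2*sqrt(N) switchboxes, but a length-L_seg segment only passes
    through a switch every L_seg boxes. Ignores fragmentation/end effects."""
    return math.ceil(2 * math.sqrt(n_pe) / l_seg)

for l_seg in (1, 2, 4, 8):
    print(l_seg, switches_on_path(1024, l_seg))
```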




## Segmentation: Corner Turn Option

- Can you corner turn in the middle of a segment?
- If so, need one more switch
- Switches per switchbox per track = $5/L_{seg} + 1$
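The two per-track switch counts are shown side by side below, using exact fractions. The code follows one reading of the slide: a corner-turnable segment adds one switch at every switchbox it passes. Both formulas agree at $L_{seg} = 1$ (6 switches), as they should for unsegmented wires.

```python
from fractions import Fraction

def sw_per_sbox_per_track(l_seg, corner_turn=False):
    """Slide formulas: 6/L_seg without mid-segment turns; 5/L_seg + 1 if a
    segment may also turn a corner at the switchboxes it passes through."""
    l = Fraction(l_seg)
    return 5 / l + 1 if corner_turn else 6 / l

for l_seg in (1, 2, 4, 8):
    print(l_seg, sw_per_sbox_per_track(l_seg), sw_per_sbox_per_track(l_seg, True))
```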










### C-Box Depopulation

- Not necessary for every input to connect to every channel
- Saw last time:
  - $K \times (N-K+1)$ switches
- Maybe use fewer?




### **IO Population**

- Toronto model
  - Fc: fraction of tracks to which an input connects
- IOs spread over 4 sides
- An IO may show up on multiple sides
  - Shown here: 2
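Here is a hypothetical count of connection-box switches for one logic block under the Fc model. The function name and parameters are illustrative, not taken from any tool. With `fc = 1` and one side per pin, the count reduces to the $(K+1)w$ term used in the switch totals.

```python
def cbox_switches(pins: int, w: int, fc: float = 1.0, sides_per_pin: int = 1) -> int:
    """Hypothetical Fc-style count of connection-box switches for one block.

    Each pin appears on `sides_per_pin` sides of the block; on each side it
    connects to round(fc * w) of that channel's w tracks.
    """
    tracks_per_side = round(fc * w)
    return pins * sides_per_pin * tracks_per_side

K = 4  # hypothetical: 4-input LUT, so K + 1 = 5 pins
for fc, sides in [(1.0, 1), (0.5, 2), (0.25, 2)]:
    print(fc, sides, cbox_switches(K + 1, w=20, fc=fc, sides_per_pin=sides))
```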





### **Leaves Not LUTs**

- Recall cascaded LUTs
- We often group a collection of LUTs into a Logic Block (LB)









### Mesh Design Parameters

- Cluster Size
  - Internal organization
- LB IO (Fc, sides)
- Switchbox Population and Topology
- Segment length distribution
- Switch rebuffering

### **Commercial Parts**








### Virtex II Interconnect Resources



Figure 49: Hierarchical Routing Resources


### Big Ideas [MSB Ideas]

- Mesh natural 2D topology
  - Channels grow as  $\Omega(N^{p-0.5})$
  - Wiring grows as  $\Omega(N^{2p})$
  - Linear Population:
    - Switches grow as $\Omega(N^{p+0.5})$
    - Unbounded global → detail mapping ratio
    - Detail routing NP-complete

### Big Ideas [MSB-1 Ideas]

- Segmented/bypass routes
  - can reduce switching delay
  - costs more wires (fragmentation of wires)
