Nt1330 Unit 5

The Haswell microarchitecture improves on Ivy Bridge in the following ways:
• New socket LGA1150 for desktop processors
• New AVX2 instruction set
• New TSX instruction set
• New bit manipulation instructions
• New dispatch ports
INSTRUCTION SET:
The AVX instruction set limited integer operations to 128-bit registers, whereas floating-point operations could use 256-bit registers. AVX2 extends integer operations to the full 256-bit registers. In addition, Haswell implements three-operand FMA (fused multiply-add) instructions. Fifteen new bit manipulation instructions were added to support cryptography, indexing and data conversion. The third new instruction set is TSX, which helps resolve the data synchronization issues that arise when the same data is used by different threads running at the same time.
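To make the register widths and the bit manipulation instructions more concrete, here is a minimal Python sketch. It is only a model: plain integers stand in for CPU registers, 32-bit operands are assumed, and the helper names (lanes, andn, blsi, blsr, tzcnt) are chosen for illustration after the instruction mnemonics rather than taken from any library.

```python
# Illustrative model only: Python integers stand in for CPU registers.
MASK32 = 0xFFFFFFFF  # assumption: 32-bit operands


def lanes(register_bits, element_bits):
    """Number of elements of a given width that fit in one vector register."""
    return register_bits // element_bits


def andn(a, b):
    """ANDN: bitwise AND of b with the inverse of a."""
    return (~a & b) & MASK32


def blsi(x):
    """BLSI: isolate the lowest set bit of x."""
    return (x & -x) & MASK32


def blsr(x):
    """BLSR: clear the lowest set bit of x."""
    return (x & (x - 1)) & MASK32


def tzcnt(x):
    """TZCNT: count trailing zero bits; gives 32 when x is 0."""
    if (x & MASK32) == 0:
        return 32
    return blsi(x).bit_length() - 1


# Integer lanes per register: 128-bit (AVX integer ops) vs 256-bit (AVX2).
print(lanes(128, 32), lanes(256, 32))
print(bin(blsi(0b101100)), bin(blsr(0b101100)), tzcnt(0b101100))
```

With 32-bit integers, a 256-bit register holds twice as many elements as a 128-bit one, which is where the AVX2 gain for integer code comes from.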
BIT MANIPULATION: …
BRANCH PREDICTION: Image ref: http://www.xbitlabs.com/articles/cpu/display/haswell-uarch-idf_3.html#sect0
The changes are concentrated in the core front end. The execution pipeline remains the same, and the L1 and L2 cache latencies have not changed either. However, Haswell boasts improved branch prediction, a larger L2 TLB, larger buffers and a larger out-of-order window.

EXECUTION UNITS:
Previously, Intel cores had 6 dispatch ports connecting the reservation station (RS) to the execution engine. Haswell adds two more dispatch ports, increasing the number of instructions the reservation station can send to the execution units by 33%. Haswell also has 17 execution units, whereas Ivy Bridge had just 15. An important enhancement in Haswell is the 256-bit data path between the RS and the execution engine, as opposed to the 128-bit data path in Ivy Bridge. Image ref: http://www.xbitlabs.com/articles/cpu/display/haswell-uarch-idf_3.html#sect0
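The 33% figure follows directly from the port counts. A short Python check of the arithmetic, using only the numbers quoted above (the function name percent_increase is just an illustrative helper):

```python
def percent_increase(old, new):
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100


# Numbers taken from the paragraph above.
ports_gain = percent_increase(6, 8)      # dispatch ports: 6 -> 8
units_gain = percent_increase(15, 17)    # execution units: 15 -> 17
datapath_ratio = 256 // 128              # RS-to-execution data path width

print(f"dispatch ports: +{ports_gain:.0f}%")
print(f"execution units: +{units_gain:.1f}%")
print(f"data path: {datapath_ratio}x wider")
```

The growth in execution units is noticeably smaller than the growth in dispatch ports, so the extra ports are the larger of the two changes in relative terms.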
FMA & PEAK FLOPS: Image ref: …
The internal cache structure and cache sizes remain the same, but the bandwidths have been increased. This ensures the caches are fast enough to feed the AVX2 instructions executing in the core. The read and write ports of the Haswell L1 cache are 256 bits wide, which means it can perform two 256-bit reads and one 256-bit write per clock. Restrictions around banking have been eliminated. The L2 cache bus is wider, so it can transfer up to 64 bytes of data per clock cycle, which is twice that of Ivy Bridge. The improvements only deal with bandwidth, while the latency is the same.
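The bandwidth figures can be turned into bytes per clock with a little arithmetic. The Python sketch below uses the port widths and L2 figure quoted above; the 3.0 GHz clock is a hypothetical value picked only to show the conversion to GB/s, and the function names are illustrative.

```python
# Figures from the text: 256-bit L1 ports, two reads plus one write per clock,
# and a 64-byte-per-clock L2 bus that is twice as wide as Ivy Bridge's.
PORT_BITS = 256
L1_READS_PER_CLOCK = 2
L1_WRITES_PER_CLOCK = 1
L2_BYTES_PER_CLOCK_HASWELL = 64
L2_BYTES_PER_CLOCK_IVY_BRIDGE = L2_BYTES_PER_CLOCK_HASWELL // 2

ASSUMED_CLOCK_HZ = 3.0e9  # hypothetical core clock, not taken from the text


def l1_bytes_per_clock():
    """Return (read bytes, write bytes) the L1 cache can move per clock."""
    port_bytes = PORT_BITS // 8
    return (L1_READS_PER_CLOCK * port_bytes, L1_WRITES_PER_CLOCK * port_bytes)


def gb_per_second(bytes_per_clock, clock_hz=ASSUMED_CLOCK_HZ):
    """Convert a per-clock byte count into GB/s at the assumed clock."""
    return bytes_per_clock * clock_hz / 1e9


reads, writes = l1_bytes_per_clock()
print(f"L1: {reads} B read + {writes} B written per clock")
print(f"L2: {L2_BYTES_PER_CLOCK_HASWELL} B per clock "
      f"(Ivy Bridge: {L2_BYTES_PER_CLOCK_IVY_BRIDGE} B)")
print(f"L2 at the assumed clock: {gb_per_second(L2_BYTES_PER_CLOCK_HASWELL):.0f} GB/s")
```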
