• New socket LGA1150 for desktop processors
• New AVX2 instruction set
• New TSX instruction set
• New bit manipulation instructions
• New dispatch ports
INSTRUCTION SET:
The AVX instruction set limited integer operations to 128-bit registers, whereas floating-point operations could use 256-bit registers. AVX2 removes this restriction by extending integer operations to 256-bit registers. In addition, Haswell implements three-operand FMA (fused multiply-add) instructions. Fifteen new bit manipulation instructions were added to support cryptography, indexing and data conversion. The third new instruction set is TSX, which helps resolve the data synchronization issues that arise when the same data is used by different threads running at the same time.
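To make the indexing use case more concrete, here is a minimal Python sketch that emulates the documented behaviour of a few of the bit manipulation instructions (ANDN, BLSI, BLSR, TZCNT) on 32-bit values. The function names and the helper `set_bit_indexes` are illustrative assumptions, not Intel intrinsics, and the code only models what each instruction computes, not how fast it runs.

```python
MASK32 = 0xFFFFFFFF  # assumption: emulate 32-bit registers


def andn(a: int, b: int) -> int:
    """ANDN: bitwise AND of the inverted first operand with the second."""
    return (~a & b) & MASK32


def blsi(x: int) -> int:
    """BLSI: isolate the lowest set bit (0 if x is 0)."""
    return (x & -x) & MASK32


def blsr(x: int) -> int:
    """BLSR: clear the lowest set bit."""
    return (x & (x - 1)) & MASK32


def tzcnt(x: int, width: int = 32) -> int:
    """TZCNT: count trailing zero bits; returns the operand width when x is 0."""
    x &= MASK32
    if x == 0:
        return width
    return (x & -x).bit_length() - 1


def set_bit_indexes(x: int) -> list:
    """Hypothetical helper: list the positions of all set bits, lowest first.

    Each loop iteration corresponds to one TZCNT plus one BLSR.
    """
    x &= MASK32
    indexes = []
    while x:
        indexes.append(tzcnt(x))
        x = blsr(x)
    return indexes


if __name__ == "__main__":
    print(set_bit_indexes(0b101001))
```

Walking a bitmask this way visits only the set bits, which is why these instructions help with indexing and data conversion loops.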
BIT MANIPULATION: …
BRANCH PREDICTION:
Image ref: http://www.xbitlabs.com/articles/cpu/display/haswell-uarch-idf_3.html#sect0
All the changes are consolidated in the core front-end. The execution pipeline remains the same, and the L1 and L2 cache latencies haven’t changed either. However, Haswell boasts improved branch prediction, a larger L2 TLB, larger buffers and a larger out-of-order window.
EXECUTION UNITS:
Previously, Intel cores had six dispatch ports connecting the reservation station (RS) to the execution engine. Haswell adds two more dispatch ports, increasing the number of instructions the reservation station can send to the execution units by 33%. Haswell also has 17 execution units, whereas Ivy Bridge had just 15. An important enhancement in Haswell is the 256-bit data path between the RS and the execution engine, as opposed to the 128-bit data path in Ivy Bridge.
Image ref: http://www.xbitlabs.com/articles/cpu/display/haswell-uarch-idf_3.html#sect0
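The percentages above follow from simple ratios. The short Python sketch below reproduces them from the port, execution unit and data path figures quoted in the text; the constant and function names are illustrative assumptions.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100.0


# Figures quoted in the text above (Ivy Bridge, Haswell).
DISPATCH_PORTS = (6, 8)
EXECUTION_UNITS = (15, 17)
DATAPATH_BITS = (128, 256)

port_gain = percent_increase(*DISPATCH_PORTS)       # 2 extra ports out of 6
unit_gain = percent_increase(*EXECUTION_UNITS)      # 2 extra units out of 15
datapath_ratio = DATAPATH_BITS[1] / DATAPATH_BITS[0]

if __name__ == "__main__":
    print(f"Dispatch ports:  +{port_gain:.1f}%")
    print(f"Execution units: +{unit_gain:.1f}%")
    print(f"RS data path:    {datapath_ratio:.0f}x wider")
```

Two extra ports on a base of six give roughly the 33% quoted, while the execution unit count grows by only about 13%, and the data path width doubles.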
FMA & PEAK FLOPS: Image ref: …
The internal cache structure and cache sizes remain the same, but the bandwidths have changed. This ensures that the caches are fast enough to keep up with AVX2 instruction execution in the core. The read and write ports of the Haswell L1 cache are 256 bits wide, and the cache can perform two reads and one write per clock. Restrictions around banking have been eliminated. The L2 cache bus is wider, so it can transfer up to 64 bytes of data per clock cycle, which is twice that of Ivy Bridge. The improvements only deal with bandwidth, while the latency is the