Squeezing More Instructions per Cycle out of the Intel Sandy Bridge CPU Pipeline

by Andrey Vladimirov 31. July 2012 20:50

Complete paper:  Colfax_CPI.pdf (253.26 kb)

Parallelism in modern CPU architectures is supported at the hardware level by multiple cores, vector registers, and pipelines. While utilization of the first two is a shared responsibility of the programmer and the compiler, pipelining is handled entirely by the processor. It is nevertheless useful for the developer to know which types of workloads make the best use of the pipeline. This paper shows one example in which a specific workload increases the number of instructions executed per clock cycle, boosting arithmetic performance. The workload consists of two independent data processing tasks, one performing the AVX addition instruction and the other performing the AVX multiplication instruction. Even though these tasks are executed sequentially on one core, alternating additions and multiplications in the code allows the CPU to complete the work 40% faster than when a sequence of additions is followed by a sequence of multiplications. Such workloads are common in linear algebra applications. Examples in the paper illustrate how improved performance can be achieved in portable C code using the Intel C/C++ compiler. Performance benchmarking with Intel VTune Parallel Amplifier is also illustrated.
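The paper's own listings are in the PDF and are not reproduced on this page. As a rough illustration of the two code layouts the abstract compares, here is a minimal sketch in plain C. The function names, parameters, and loop structure are assumptions made for this example, not the author's code. It relies on the compiler to vectorize the loops rather than on AVX intrinsics, and it makes no claim to reproduce the 40% figure.

```c
#include <stddef.h>

/* Hypothetical sketch: two independent tasks on separate arrays.
 * Task A repeatedly adds `inc` to every element of a[].
 * Task B repeatedly multiplies every element of b[] by `scale`.
 * `restrict` tells the compiler the arrays do not overlap, which
 * helps it vectorize the loops. */

/* Layout 1: finish all additions, then do all multiplications. */
void tasks_sequential(float *restrict a, float *restrict b,
                      float inc, float scale, size_t n, int iters)
{
    for (int it = 0; it < iters; it++)
        for (size_t i = 0; i < n; i++)
            a[i] += inc;      /* additions only */

    for (int it = 0; it < iters; it++)
        for (size_t i = 0; i < n; i++)
            b[i] *= scale;    /* multiplications only */
}

/* Layout 2: alternate one addition with one multiplication, so that
 * independent add and multiply instructions sit next to each other
 * in the instruction stream. */
void tasks_interleaved(float *restrict a, float *restrict b,
                       float inc, float scale, size_t n, int iters)
{
    for (int it = 0; it < iters; it++)
        for (size_t i = 0; i < n; i++) {
            a[i] += inc;      /* task A */
            b[i] *= scale;    /* task B, independent of task A */
        }
}
```

Both functions apply the same operations to each element in the same order, so they produce identical arrays; only the ordering of additions relative to multiplications differs. Whether the interleaved layout actually runs faster depends on the processor, the compiler, and the optimization flags, so it should be timed on the target machine.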




About Colfax Research

Colfax International provides an arsenal of novel computational tools, which must be used skillfully to deliver their full power. We are collaborating with researchers in science and industry, including our customers, to produce case studies and white papers and to develop a broad knowledge base of the applications of current and future computational technologies.

This blog will contain a variety of information, from hardware benchmarks and HPC news highlights to discussions of programming issues and reports on research projects carried out in our collaborations. In addition to our in-house research, we will present contributions from authors in academia, industry, and finance, as well as from software developers. Our hope is that this information will be useful to a wide audience interested in innovative computing technologies and their applications.

Author Profiles

Andrey Vladimirov, PhD, is a physicist with a longstanding interest in high performance computing. His research topics include computer simulations of cosmic ray production and propagation and collisionless plasma modeling. Andrey is a postdoctoral scholar at Stanford University.



Vadim Karpusenko, PhD, is a Research Associate at Colfax International. His research interests are in the area of physical modeling with HPC clusters, highly parallel architectures, and code optimization. Vadim holds a PhD in Physics from North Carolina State University for his computational research on the free energy and stability of helical secondary structures of proteins.
