How to Write Your Own Blazingly Fast Library of Special Functions for Intel Xeon Phi Coprocessors

by Vadim Karpusenko 3. May 2013 17:56

Complete paper: PDF logo Colfax_Static_Libraries_Xeon_Phi.pdf (425.6 kb)

Statically-linked libraries are used in business and academia for security, encapsulation, and convenience reasons. Static libraries with functions offloadable to Intel Xeon Phi coprocessors must contain executable code for both the host and the coprocessor architecture. Furthermore, for library functions used in data-parallel contexts, vectorized versions of the functions must be produced at the compilation stage.

This white paper shows how to design and build statically-linked libraries with functions offloadable to Intel Xeon Phi coprocessors. In addition, it illustrates how special functions with scalar syntax (e.g., y=f(x)) can be implemented in such a way that user applications can use them in thread- and data-parallel contexts. The second part of the paper demonstrates some optimization methods that improve the performance of functions with scalar syntax on the multi-core and the many-core platforms: precision control, strength reduction, and algorithmic optimizations.

Complete paper: PDF logo Colfax_Static_Libraries_Xeon_Phi.pdf (425.6 kb)

Tags: , , , , , ,

Auto-Vectorization with the Intel Compilers: is Your Code Ready for Sandy Bridge and Knights Corner?

by Andrey Vladimirov 12. March 2012 13:01

Complete paper:  Colfax_Sandy_Bridge_AVX.pdf (632.23 kb)

One of the features of Intel’s Sandy Bridge-E processor released this month is the support for the Advanced Vector Extensions (AVX) instruction set. Codes suitable for efficient auto-vectorization by the compiler will be able to take advantage of AVX without any code modification, with only re-compilation.

This paper explains the guidelines for code design suitable for auto-vectorization by the compiler (elimination of vector dependence, implementation of unit-stride data access and proper address alignment) and walks the reader through a practical example of code development with auto-vectorization. The resulting code is compiled and executed on two computer systems: a Westmere CPU-based system with SSE 4.2 support, and a Sandy Bridge-based system with AVX support. The benefit of vectorization is more significant in the AVX version, if the code is designed efficiently. An ‘elegant’, but inefficient solution is also provided and discussed.

In addition, the paper provides a comparative benchmark of the Sandy Bridge and Westmere systems, based on the discussed algorithm. Implications of auto-vectorization methods for Intel’s future Many Integrated Core technology based on the Knights Corner chip are discussed at the end.

Complete paper:  Colfax_Sandy_Bridge_AVX.pdf (632.23 kb)

Tags: , , , , , ,

About Colfax Research

Colfax International provides an arsenal of novel computational tools, which need to be leveraged in order to harness their full power. We are collaborating with researchers in science and industry, including our customers, to produce case studies, white papers, and develop a wide knowledge base of the applications of current and future computational technologies.

This blog will contain a variety of information, from hardware benchmarks and HPC news highlights, to discussions of programming issues and reports on research projects carried out in our collaborations. In addition to our in-house research, we will present contributions from authors in the academia, industry and finance, as well as software developers. Our hope is that this information will be useful to a wide audience interested in innovative computing technologies and their applications.

Author Profiles

Andrey Vladimirov, PhD, is a physicist with a longstanding interest in high performance computing. His research topics include computer simulations of cosmic ray production and propagation and collisionless plasma modeling. Andrey is a postdoctoral scholar at Stanford University.

All posts by this author...

Author Profiles

Vadim Karpusenko, PhD, is a Research Associate at Colfax International. His research interests are in the area of physical modeling with HPC clusters, highly parallel architectures, and code optimization. Vadim holds a PhD in Physics from North Carolina State University for his computational research of the free energy and stability of helical secondary structures of proteins.

All posts by this author...