It has become extraordinarily difficult to write software that performs close to optimally on complex modern microarchitectures. Particularly plagued are domains that are data intensive and require complex mathematical computations such as information retrieval, scientific simulations, graphics, communication, control, and multimedia processing. In these domains, performance-critical components are usually written in C (with possible extensions) and often even in assembly, carefully “tuned” to the platform’s architecture and microarchitecture. Specifically, the tuning includes optimization for the memory hierarchy and for different forms of parallelism. The result is usually long, rather unreadable code that needs to be re-written or re-tuned with every platform upgrade. On the other hand, the performance penalty for relying on straightforward, non-tuned, “more elegant” implementations is typically a factor of 10, 100, or even more. The reasons for this large gap are some (likely) inherent limitations of compilers including the lack of domain knowledge, and the lack of an efficient mechanism to explore the usually large set of transformation choices. The recent end of CPU frequency scaling, and thus the end of free software speed-up, and the advent of mainstream parallelism with its increasing diversity of platforms further aggravate the problem.
Markus Püschel is a Professor and former Department Head of Computer Science at ETH Zurich, Switzerland. Before, he was a Professor of Electrical and Computer Engineering at Carnegie Mellon University, where he still has an adjunct status. He received his Diploma (M.Sc.) in Mathematics and his Doctorate (Ph.D.) in Computer Science, in 1995 and 1998, respectively, both from the University of Karlsruhe, Germany.