Invited Presentations

Programme Overview

  • Opening Remarks

  • 14:00-14:30: Dataflow Acceleration of Deep Learning: Inference and Training

    by Prof. Wayne Luk - Imperial College


    This talk presents dataflow designs for deep learning applications, covering both inference and training. Optimisations such as parameter tuning for pipelined depth and multi-pumping are described. Analytical models for resource usage and for performance are developed. Implementations targeting Maxeler Data Flow Engines are reported.


    Prof. Wayne Luk is Professor of Computer Engineering at Imperial College London. His research interests include reconfigurable computing, field programmable technology, and design automation.

  • 14:30-15:00: Dataflow Programming for Deep Learning

    by Dr. Tobias Becker - Maxeler


    Dr. Tobias Becker is the Head of MaxAcademy at Maxeler Technologies, where he coordinates various research activities and Maxeler’s university program. Before joining Maxeler, he held positions as a researcher in the Department of Computing at Imperial College London and at Xilinx, Inc. He received a Ph.D. degree in Computing from Imperial College London and a Dipl. Ing. degree in Electrical Engineering from the Technical University of Karlsruhe (now KIT). His research work covers topics in reconfigurable computing, custom accelerators, self-adaptive systems, low-power optimisations, and financial applications.

  • 15:00-15:30: Quantized Neural Networks on All Programmable Devices (pdf)

    by Dr. Thomas Preusser - Xilinx


    Neural Networks have a wide range of applications but pose a tremendous compute challenge. Research has shown that the quantization of network inference parameters and processed data is possible. This helps tame the compute challenge and has enabled flexible programmable logic to deliver significant compute power to the inference of various topologies of neural networks. This talk will briefly walk through the challenges of employing neural networks and outline how programmable logic can be used for visual object classification with a TinyYOLO network derivative. With the huge performance gain for the lion's share of the required compute established, the rest of the talk will focus on preserving that gain as far as possible at the level of the integrated application. By neutralizing one bottleneck after another and exploiting more and more of the heterogeneous compute base of Xilinx All Programmable Devices, the talk demonstrates how the benefits combine to achieve a convincing user experience in an end-to-end application.


    Dr. Thomas Preusser is an EU-supported Marie Curie Fellow investigating designated compute optimizations for quantized neural networks in Michaela Blott's group at Xilinx Research, Ireland. After studying at TU Dresden and UT Austin, he worked as a postdoctoral researcher and instructor at his alma mater in Dresden. He has published more than 50 peer-reviewed conference papers and articles on computer arithmetic, digital design and simulation. He won the Michael Servit Award at FPL 2010 and set the two most recent records for calculating the solution counts of the N-Queens Puzzle, using algebra and a distributed FPGA computation.

  • 15:30-16:00: Coffee break / Posters

  • 16:00-16:30: Using FPGAs to Accelerate Neural Network Inference (pdf)

    by Prof. Magnus Jahre - NTNU


    Deep Neural Networks (DNNs) have achieved impressive accuracy on difficult classification tasks. However, DNNs typically have large compute and memory requirements. For this reason, both industry and academia are committing significant resources to developing acceleration strategies for DNNs. FPGAs are an interesting platform for deploying DNN inference accelerators. To maximize impact, we believe that the FPGA research community should focus on acceleration strategies that leverage the strengths that FPGA platforms have relative to CPUs, GPUs and ASICs. We argue that the key FPGA differentiator for DNN inference is the ability to customize the architecture to the structure of the DNN, and we provide examples from recent research that illustrate the benefits of exploiting customization.


    Prof. Magnus Jahre received his Master of Technology degree from NTNU in 2007 and his PhD from the same university in 2010. Since 2010, he has been an Associate Professor at the Department of Computer Science, NTNU. His current research interests are heterogeneous computer systems, energy efficiency, memory systems, computer architecture simulation, compilers and system software, and he has (co-)authored over 30 international peer-reviewed research articles. Jahre is a work package leader in the H2020 LEIT ICT project TULIPP and a PI in the H2020 FET HPC project READEX. In addition, he is involved in several industry-driven research collaborations with companies such as Xilinx, ARM, Nordic Semiconductor and Thales. He co-founded NTNU’s Energy Efficient Computing Systems (EECS) strategic research initiative and coordinated EECS from 2013 to 2016. Jahre also served as the head of NTNU’s Computing Research Group from 2011 to 2016. He is currently supervising 2 PhD students, co-supervising 5 PhD students and mentoring 2 post-doctoral researchers. He has contributed to graduating 2 PhD students and 3 post-doctoral researchers, and he is an affiliate member of the HiPEAC European Network of Excellence.

  • 16:30-17:00: Extending TensorFlow to Heterogeneous Processors Using SYCL (pdf)

    by Dr. Ralph Potter - Codeplay


    TensorFlow has become the most popular machine learning framework, gaining huge adoption since being published as an open-source project. Developers are using Google's TensorFlow to solve some of the world's most difficult problems, and whilst it was originally designed for use on high-performance computers with power-hungry processors, there is an increasing need to run machine learning applications on a diverse range of heterogeneous processors, including FPGAs.

    The Khronos Group's OpenCL open standard provides a programming model for a wide range of heterogeneous processors, including FPGAs from Altera and Xilinx. SYCL, another Khronos Group open standard, provides a single-source C++ programming model for OpenCL devices. In this talk, we will describe our work to bring support for OpenCL devices to TensorFlow, and how the SYCL programming model can bridge the gap between hardware accelerators and the high-level abstractions used by TensorFlow. This provides a potential route towards FPGA support within TensorFlow.


    Dr. Ralph Potter is a senior research engineer at Codeplay Software, where his research focuses on programming models and optimization techniques for heterogeneous systems. He currently leads a team exploring approaches to accelerating machine learning algorithms.

    Codeplay is internationally recognized for expertise in Heterogeneous Systems, and has many years of experience in the development of Compilers, Runtimes, Debuggers, Test Systems, and other specialized tools. Codeplay has delivered standards-compliant systems for many of the world's largest semiconductor companies, focusing specifically on high-performance heterogeneous processor solutions including CPUs, GPUs, DSPs, FPGAs, and specialized vision and AI processors.

    Codeplay participates actively in the Khronos Group™ to define new open standards such as OpenCL™, SPIR™, SYCL™, and Vulkan®. Codeplay also leads the development of the HSA Foundation’s software API standards, while maintaining leadership in the ISO C and C++ standards. Codeplay has earned a reputation as one of the leaders in compute systems.

  • 17:00-17:30: Panel/Round Table Discussion