International Workshop on FPGAs for Software Programmers (FSP 2019)

News

♦ Program:

The program has been updated.

♦ Program:

The preliminary program is available.

♦ Registration information:

https://fpl2019.bsc.es/registration

♦ Submission deadline extended to July 14, 2019

Go to the FSP 2019 EasyChair page to submit.
https://www.easychair.org/conferences/?conf=fsp2019

Overview and Scope

The aim of this workshop is to make FPGA and reconfigurable technology accessible to software programmers. Despite their frequently proven power and performance benefits, designing for FPGAs is mostly an engineering discipline carried out by highly trained specialists. With recent progress in high-level synthesis, a first important step towards bringing FPGA technology to potentially millions of software developers was taken.

The FSP Workshop aims at bringing researchers and experts from both academia and industry together to discuss and exchange the latest research advances and future trends. This includes high-level compilation and languages, design automation tools that raise the abstraction level when designing for (heterogeneous) FPGAs and reconfigurable systems and standardized target platforms. This will in particular put focus on the requirements of software developers and application engineers. In addition, a distinctive feature of the workshop will be its cross section through all design levels, ranging from programming down to custom hardware. Thus, the workshop is targeting all those who are interested in understanding the big picture and the potential of domain-specific computing and software-driven FPGA development. In addition, the FSP Workshop shall facilitate collaboration of the different domains.

Topics of the FSP Workshop include, but are not limited to:

High-level synthesis (HLS) and domain-specific languages (DSLs) for FPGAs and heterogeneous systems
Mapping approaches and tools for heterogeneous FPGAs
Support of hard IP blocks such as embedded processors and memory interfaces
Development environments for software engineers (automated tool flows, design frameworks and
tools, tool interaction)
FPGA virtualization (design for portability, resource sharing, hardware abstraction)
Design automation technologies for multi-FPGA and heterogeneous systems
Methods for leveraging (partial) dynamic reconfiguration to increase performance, flexibility,
reliability, or programmability
Operating system services for FPGA resource management, reliability, security
Target hardware design platforms (infrastructure, drivers, portable systems)
Overlays (CGRAs, vector processors, ASIP- and GPU-like intermediate fabrics)
Applications (e.g., embedded computing, signal processing, bio informatics, big data,
database acceleration) using C/C++/SystemC-based HLS, OpenCL, OpenSPL, etc.
Directions for collaborations (research proposals, networking, Horizon 2020)

Previous Editions

Call for Papers

Download as PDF document.

Paper submission

Perspective authors are invited to submit original contributions (up to eight pages) or extended abstracts describing work-in-progress or position papers (not exceeding two pages). All papers should be formatted as follows: A4 or US Letter size, PDF file format (must not have Adobe Document Protection or Document Security enabled and must have all fonts embedded), double column, single spaced, Times or equivalent font of minimum 10pt. We recommend that you use the proceedings templates for LaTeX and Word formats provided by VDE-Verlag (https://www.vde-verlag.de/proceedings-en/type-instructions.html).
All submissions have to be sent via the conference management system EasyChair. Please set up your own personal account if you do not already own an EasyChair account.

Publications

The proceedings of this workshop containing all accepted papers are planned to be published by VDE-Verlag (Germany) and are planned to be indexed by IEEE Xplore. Every accepted paper must have at least one author registered to the workshop by the time the camera-ready paper is due.

Important dates

Submission deadline:July 7, 2019 extended to July 14, 2019
Notification of acceptance:July 31, 2019 extended to August 7, 2019
Camera-ready final version:August 16, 2019 extended to August 23, 2019

Program

Download program as PDF document.

09:00 - 09:05

Workshop Opening

09:05 - 10:05

Keynote 1: Addressing High-level Synthesis Challenges for Heterogenous Computing at the Edge
Juan Eusse, Silexica, Germany

10:05 - 11:50

Session 1: HLS Acceleration and Optimization

10:05 - 10:30

Accelerating Human Activity Recognition Systems on FPGAs through a DSL approach
Daniel Fernandes and João Cardoso

10:30 - 11:00

Coffee break

11:00 - 11:25

Accelerating Design Convergence of Automata Processing Designs with a Tiled Hierarchy
Tommy Tracy II, Jack Wadden, Ted Xie, Kevin Skadron and Mircea Stan

11:25 - 11:50

Impact of Off-Chip Memories on HLS-Generated Circuits
Abhi D.R., Ron Sass and Andrew Schmidt

11:50 - 13:05

Session 2: Heterogeneous Systems and Runtime Support

11:50 - 12:15

libGalapagos: A Software Environment for Prototyping and Creating Heterogeneous FPGA and CPU Applications
Naif Tarafdar and Paul Chow

12:15 - 12:40

Run-time Performance Monitoring of Heterogenous Hw/Sw Platforms Using PAPI
Tiziana Fanni, Daniel Madroñal, Claudio Rubattu, Carlo Sau, Francesca Palumbo, Eduardo Juárez, Maxime Pelcat, César Sanz and Luigi Raffo

12:40 - 13:05

ZUCL 2.0: Virtualised Memory and Communication for ZYNQ UltraScale+ FPGAs
Khoa Pham, Kyriakos Paraskevas, Anuj Vaishnav, Andrew Attwood, Malte Vesper and Dirk Koch

13:05 - 14:00

Lunch Break

14:00 - 15:00

Keynote 2: Architecture Virtualization as Prerequisite for Large-Scale FPGA Adoption In Software Communities
Christophe Bobda, University of Florida, USA

15:00 - 15:30

Invited Talk: fpgaConvNet and f-CNNx: Towards addressing the challenges in ML application deployment
Christos-Savvas Bouganis, Imperial College London, UK

15:30 - 16:00

Coffee Break

16:00 - 16:45

Tutorial: OpenCL design flows for Intel and Xilinx FPGAs - using common design patterns and dealing with vendor-specific differences
Tobias Kenter, Paderborn University, Germany

16:45 - 17:15

Invited Talk: Care of magical creatures - How to tame, train and feed your Alveo or Stratix FPGA card
Luciano Lavagno, Politecnico di Torino, Italy

17:15 - 17:30

Workshop Closing

Invited Speakers

Juan Eusse, Silexica, Germany
"Addressing High-level Synthesis Challenges for Heterogenous Computing at the Edge"

Abstract:
Juan Eusse In this presentation we examine how SLX FPGA is used to take a MATLAB Embedded Coder™ generated C/C++ algorithm, in this case a Kalman filter, and optimize the C/C++ code for HLS. In this example, SLX FPGA provides more than 62x improvement in performance after auto-insertion of HLS pragmas when compared to the solution created by the HLS compiler for the original code, which had no pragmas inserted.

Christophe Bobda, Gainesville, FL, United States
"Making FPGAs accessible to developers and customers"

Abstract:
Christophe Bobda The continuous decreasing size of transistors and the capability of packing more transistors on smaller area has led the transition from high-clocked single processors to multicores. However, pure multicore solutions without customized computing components will not be able to provide the required performance in many computation fields. Studies are predicting that increase in power as result of chip density will drastically reduce the usage of manycore processors to a maximum of 75%.
   Increasing the number of cores on a chip will not be enough to meet the performance requirement in application fields like image processing, oil and gas exploration, and programmatic financial trading, which requires complex 3D convolutions of several large data arrays and tight requirements on memory and IO. Heterogeneous architectures made upon general purpose processors and specialized computing components, and intelligent methods to dynamically adapt chip resource to run-time computational and power consumption requirements can improve the resource usage of future heterogeneous multicore. Reconfigurable logic like FPGAs can be used for this purpose, as part of a multiprocessor on the same die or as separate co-processor. For such a platform however to be successful, viable programing environments must be provided to the huge community of software developers, so they can port existing programs or write new ones for the target platform, without the need to change their habits and learn special languages.
   Attempts to reduce the programming burdens of reconfigurable systems have been mostly limited to the development of C-like languages and compiler, capable of compiling a subset of the C-language, extended with special constructs to capture low-level hardware behavior. While those languages reduce the efforts of building the target hardware from a reference C-implementation as opposed to a recoding in a hardware description language, their main focus is still the generation of a piece of hardware. Therefore designers must be well aware of hardware design techniques and signal related issues, bit manipulation, timing constraint and resource limitation, to produce efficient implementations. Hence the reluctance of the software community to adopt available design tools and environments. Low-level languages such as CUDA, annotations and language extensions like Microsoft AMP, libraries such as pthreads and new languages like OpenCL require a good understanding of the target architecture to specify capture efficient designs.
   In this talk, we discuss the need of virtualization as prerequisite for large-scale adoption of FPGA in software communities and the progress achieve so far in virtualization. We discuss virtualization in context of FPGA-accelerated workstation, clouds and datacenters.

Speaker's bio:
Professor Bobda received the Licence in mathematics from the University of Yaounde, Cameroon, in 1992, the diploma of computer science and the Ph.D. degree (with honors) in computer science from the University of Paderborn in Germany in 1999 and 2003 (In the chair of Prof. Franz J. Rammig) respectively. In June 2003 he joined the department of computer science at the University of Erlangen-Nuremberg in Germany as Post doc, under the direction of Prof Jürgen Teich. Dr. Bobda received the best dissertation award 2003 from the University of Paderborn for his work on synthesis of reconfigurable systems using temporal partitioning and temporal placement. In 2005 Dr. Bobda was appointed assistant professor at the University of Kaiserslautern. There he set the chair for Self-Organizing Embedded Systems that he led until October 2007. From 2007 to 2010 Dr. Bobda was Professor at the University of Potsdam and leader of The working Group Computer Engineering.
   Profesor Bobda is Senior Member of the ACM. He is also in the program committee of several conferences (FPL, FPT, RAW, RSP, ERSA, RECOSOC, DRS), the DATE executive committee as proceedings chair (2004, 2005, 2006, 2007, 2008, 2009, 2010). He served as reviewer of several journals (IEEE TC, IEEE TVLSI, Elsevier Journal of Microprocessor and Microsystems, Integration the VLSI Journal) and conferences (DAC, DATE, FPL, FPT, SBCCI, RAW, RSP, ERSA), as guest editor of the Elsevier Journal of Microprocessor and Microsystems and member of the editorial board of the Hindawi International Journal of Reconfigurable Computing. Dr. Bobda is the author of one of the first most comprehensive books in the rapidly growing field of Reconfigurable Computing.

Christos-Savvas Bouganis, Imperial College London, UK
"fpgaConvNet and f-CNNx: Towards addressing the challenges in ML application deployment"

Abstract:
The wide availability of data combined with the large compute power that modern systems can provide has led to the ability of fitting highly parameterised non-linear models with good generalisation properties and to the dawn of the machine learning era. Machine learning models, and more specifically Deep Neural Networks, have attracted the attention of researchers and practitioners from various fields, as they have shown that can result in systems with performance that matches or even exceeds that of humans.
   In the embedded space, where energy and power consumption of the compute platform is crucial for the successful deployment of Deep Neural Network, a number of devices are usually considered by the users ranging from ASICs and FPGAs, to Neural Processors, DSPs, and mobile GPUs. From the above spectrum of devices, FPGAs offer the possibility to customise the architecture of the system to the ML workload that needs to be executed, resulting in significant gains in terms of absolute performance and power consumption approaching that of an ASIC implementation, maintaining at the same time the possibility to reconfigure itself in order to address a different workload, achieving the flexibility and longevity of the solution offered by a CPU-based system.
   As such, it is necessary to design architectures, that are mapped onto the FPGA devices, that are highly parameterised in order to be fine-tuned to the targeted workloads and, as such, lead to high performance systems. The combination of the above with the diversity of the workloads and the heterogeneity of modern FPGA devices, leads to a large parameter space that needs to be explored and necessitates the existence of tools that automate the design space exploration phase in order to identify the configuration of the system that best meets the user’s objectives (i.e. throughput, latency, power, resources).
   Towards this direction, the fpgaConvNet [1] tool aims to help designers that map Convolutional Neural Networks (CNNs) into FPGA systems in meeting their targets. fpgaConvNet takes as input a description of a CNN, the characteristics of the FPGA platform (i.e. FPGA resources, external memory capacity and off-chip memory bandwidth) and an objective function (i.e. requirements on throughput and latency of the system), and generates an IP tailored to the provided CNN. This is achieved by defining a highly parametrised architecture capable to be fine-tuned to the target ML workload, an analytical model of the performance of the system under a specific configuration, as well as a methodology to efficiently traverse the design space resulting in designs that meet the user’s requirements.
   f-CNNx [2] extends the capabilities of fpgaConvNet, and addresses the problem of executing multiple CNNs on the same FPGA device under different target performance metrics. Key features of the framework are the modelling of the off-chip memory subsystem, its exposure to the design space exploration stage, and the co-optimisation of the scheduling of the tasks for execution and the parameterisation of the architecture of the system.
   Both frameworks, have raised the level of abstraction in designing systems that execute a single or multiple CNNs, are able to produce systems that achieve high performance under various user specifications, and have enabled programmers to manage the complexity in designing such systems addressing some of the challenges in ML application deployment.

[1] S. I. Venieris and C.-S. Bouganis, “fpgaConvNet: Mapping Regular and Irregular Convolutional Neural Networks on FPGAs,” IEEE Transactions on Neural Networks and Learning Systems, 2018.

[2] S. I. Venieris and C. S. Bouganis, “f-CNNx: A Toolflow for Mapping Multiple Convolutional Neural Networks on FPGAs,” in Int. Conf. on Field Programmable Logic and Applications (FPL), 2018.

Tobian Kenter, University of Paderborn, Germany
"OpenCL design flows for Intel and Xilinx FPGAs - using common design patterns and dealing with vendor-specific differences"

Abstract:
An increasing fraction of new results in the reconfigurable computing domain are obtained with the help of high level synthesis tools. Among the more popular tools are the OpenCL based Xilinx SDAccel and Intel FPGA SDK for OpenCL. Since they are building upon the same programming model and source language, one would hope for portability between different OpenCL based FPGA designs. However, the vast majority of published research is only optimized for one vendor tool and FPGA family.
In this tutorial, we want to broaden that scope and provide practical guidance for both tool chains. We will present a condensed version of longer tutorial held at DATE 2019 and focus on the important examples and strategies while providing links to further material. The tutorial contains step by step optimization examples with performance models mostly based on analysis of generated reports. We will present design patterns that work well for both tools and thus can promote portability of OpenCL based FPGA designs, but also shed light on differences. Based on examples, we will illustrate the central difference in pipelining of nested loops, which has implications on local memory ports, replication and predictability of design space exploration.

Speaker's bio:
Tobias Kenter received his PhD from Paderborn University in 2016 on the topic of productivity for FPGAs through overlays, compilation approaches and tight coupling between FPGAs and CPUs. Since then he focused on the acceleration of scientific applications on FPGAs using OpenCL based development flows from Intel and Xilinx. As HPC consultant for FPGA acceleration at the Paderborn Center for Parallel Computing he strives to bring more applications to this exciting technology.

Luciano Lavagno, Politecnico di Torino, Italy
"Care of magical creatures - How to tame, train and feed your Alveo or Stratix FPGA card"

Abstract:
Luciano Lavagno Despite the final demise of Moore's law, a new "magic bullet" promises to deliver a few more generations of Moore-like progress. FPGAs have been around for decades, but now they have a chance to conquer new territory. Reprogrammability and High-Level Synthesis from C-like languages promise easy migration of applications from CPUs and GPUs, gaining in terms of both energy per computation, and performance. The availability of FPGAs in cloud datacenters (e.g. the AWS F1 instances and the Intel DevCloud) somehow offsets the still high cost.
The talk will discuss the flaws in the "same programming language means easy transition" argument, and provide suggestions about how to structure an application so that it can be efficiently mapped to one or more datacenter-class FPGAs. It will describe how the memory hierarchy and its management differ between CPUs/GPUs and FPGAs, and how the application architecture must be adapted to the FPGA board memory architecture much more carefully than on CPUs and GPUs. Then it will discuss how balancing the load between the various stages of the computation must be carefully managed by hand, until fast and efficient dynamic reconfiguration will come to the rescue.
In conclusion, FPGAs are magical creatures indeed, but it does take some training to master magic. Even at Hogwarts...

Speaker's bio:
Luciano Lavagno received his Ph.D. in EECS from U.C. Berkeley (California, USA) in 1992 and from Politecnico di Torino (Italy) in 1993. He co-authored two books on asynchronous circuit design, a book on hardware/software co-design of embedded systems, the CRC Handbook on Electronic Design Automation, and over 200 scientific papers. He has been granted 12 US patents. Between 1993 and 2000 he was the architect of the POLIS project, which developed a complete hardware/software co-design environment for control-dominated embedded systems. Between 2003 and 2014 he has been one of the creators and architects of the Cadence C-to-Silicon high-level synthesis system. Between 2015 and 2017 he has worked, with the Catpult high-level synthesis group of Mentor Graphics. Since 2018 he has been working on the Vivado HLS tool and SDAccel design environment at Xilinx. Since 2011 he is a full professor with Politecnico di Torino, Italy. His research interests include the synthesis of asynchronous low-power circuits, the concurrent design of mixed hardware and software embedded systems, the high-level synthesis of digital circuits, the design and optimization of hardware components and protocols for wireless sensor networks.

Organization

General Co-Chairs

Dirk Koch, University of Manchester, UK
Markus Weinhardt, Osnabrück University of Applied Sciences, Germany

Proceedings Chair

Christian Hochberger, Technische Universität Darmstadt, Germany

Program Committee

Hideharu Amano, Keio University, Japan
Lars Bauer, Karlsruhe Institute of Technology, Germany
João M. P. Cardoso, University of Porto, Portugal
Paul Chow, University of Toronto, Canada
Frank Hannig, Friedrich-Alexander University Erlangen-Nürnberg, Germany
Tobias Kenter, University of Paderborn, Germany
Andreas Koch, Technische Universität Darmstadt, Germany
Miriam Leeser, Northeastern University, USA
Mario D. Marino, Leeds Beckett University, UK
Dan Poznanovic, USA
Zain Ul-Abdin, Halmstad University, Sweden
Rüdiger Willenberg, Mannheim University of Applied Sciences, Germany
Daniel Ziener, University of Twente, Netherlands

Website and Proceedings

Alexander Schwarz, Technische Universität Darmstadt, Germany

Contact

fsp2019@easychair.org

This workshop is sponsored by