School of Electrical and Computer Engineering
Postgraduate Studies Program
MASTER'S THESIS PRESENTATION
Evaluation of the Intel Harp Platform Using a Many-Core ARM Accelerator
Evaluating the Intel Harp (Tightly-coupled CPU-FPGA) Platform with an ARM Many-core Accelerator
Thursday, 5 September 2019, 5 p.m.
Room 145.P42, Science Building, University Campus
Professor Apostolos Dollas (supervisor)
Associate Professor Eftychios Koutroulis
Professor Dionisios Pnevmatikatos (School of ECE, NTUA)
To achieve high bandwidth, accelerate applications, and limit the power consumption of new architectures, industry vendors are increasingly building heterogeneous systems. Among the various heterogeneous acceleration platforms, CPU-FPGA (field-programmable gate array) systems are one of the most promising types, because FPGAs offer reconfigurability for accelerating different applications together with low power consumption and high energy efficiency. In these platforms the CPU and the FPGA are tightly coupled, either on the same motherboard or on the same SoC (System on Chip). Examples of such CPU-FPGA platforms are the Alpha Data board, the Amazon F1, the IBM CAPI, the Microsoft Catapult, the Convey HC-1, and the Intel Xeon+FPGA platform. These FPGA-based systems are already in use for real-life applications. For example, Microsoft Catapult is deployed in conventional computer clusters to accelerate large-scale production workloads such as search engines and neural networks, and Amazon offers servers equipped with FPGAs (F1 instances). In addition, Intel has developed a platform with a scalable Xeon processor and an integrated Arria 10 FPGA, and has predicted that approximately 30% of servers could have FPGAs in 2020. The platforms mentioned above differ in how the CPU and the FPGA communicate and in how the system memory is accessed.
Motivated by the emerging field of CPU-FPGA systems and the advantages that such platforms offer, this thesis evaluates the Intel platform with a scalable Xeon processor and an integrated Arria 10 FPGA. The Xeon is connected to the FPGA through three physical channels, one coherent QPI (QuickPath Interconnect) channel and two non-coherent PCIe channels, and the two devices share a common memory located on the CPU side. The total bandwidth of the three channels is approximately 19 Gb/s for reading and for writing.
The evaluation was conducted with an ARM many-core accelerator. Each ARM core has a 3-stage pipeline, uses a 32-bit architecture, and implements the ARMv4 instruction set along with some basic floating-point instructions. The RTL for the ARM core was written in Bluespec SystemVerilog (BSV). The hardware architecture contains 16 ARM cores, each with a direct-mapped cache of variable size. The instruction and data memories of every core can be initialized from software, so that the processors execute the programs defined by the developer; the code and data for each core's internal memories are read from binary files. Each core is assigned buffers with a certain amount of memory space for reading and writing data, and the hardware accesses them using physical addresses. The STREAM benchmark was used to measure the bandwidth of the design, and a matrix multiplication test was run to check how the architecture handles real-life applications.
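As a rough illustration of how a STREAM-style bandwidth figure is obtained, the sketch below runs the four STREAM kernels (Copy, Scale, Add, Triad) and applies the benchmark's usual byte-counting convention: two arrays touched for Copy and Scale, three for Add and Triad. It is a host-side Python sketch only, not the thesis code. The function names, the array length, and the 4-byte element size (chosen to match a 32-bit word) are assumptions made for this example, and timings taken in Python say nothing about the hardware itself.

```python
import time

# Arrays touched per element (reads + writes) by each STREAM kernel,
# following the benchmark's standard counting convention.
ARRAYS_TOUCHED = {"copy": 2, "scale": 2, "add": 3, "triad": 3}


def run_kernel(name, a, b, c, scalar):
    """Run one STREAM kernel in place on the lists a, b and c."""
    n = len(a)
    if name == "copy":
        for i in range(n):
            c[i] = a[i]
    elif name == "scale":
        for i in range(n):
            b[i] = scalar * c[i]
    elif name == "add":
        for i in range(n):
            c[i] = a[i] + b[i]
    elif name == "triad":
        for i in range(n):
            a[i] = b[i] + scalar * c[i]
    else:
        raise ValueError(f"unknown kernel: {name}")


def bytes_moved(name, n, elem_size=4):
    """Bytes counted for one pass of a kernel over n elements.

    elem_size=4 is an assumption (one 32-bit word per element).
    """
    return ARRAYS_TOUCHED[name] * n * elem_size


def measure(n=1000, elem_size=4, scalar=3.0):
    """Run the four kernels in STREAM order.

    Returns the final arrays and a dict of bytes-per-second figures.
    """
    a, b, c = [1.0] * n, [2.0] * n, [0.0] * n
    rates = {}
    for name in ("copy", "scale", "add", "triad"):
        start = time.perf_counter()
        run_kernel(name, a, b, c, scalar)
        elapsed = time.perf_counter() - start
        # Guard against a zero reading from a coarse timer.
        rates[name] = bytes_moved(name, n, elem_size) / max(elapsed, 1e-9)
    return a, b, c, rates
```

Calling `measure()` returns the final arrays together with one bandwidth estimate per kernel. On a real accelerator the same accounting would be applied to the time measured for the hardware run, with the element size and array length taken from the actual design.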