Scaling data analysis to warehouse-scale computers
Analyzing massive datasets quickly requires scaling foundational data processing algorithms to the unprecedented compute, network and I/O concurrency of a modern datacenter. However, common building blocks of every data analysis pipeline either have scalability bottlenecks or are unsatisfactory for I/O-intensive analytics due to their high-performance computing pedigree. This talk highlights research challenges that need to be overcome to scale data processing to warehouse-scale computers, with particular focus on how to harness RDMA-capable networks, non-uniform network topologies, massively parallel file systems and NVMe-based storage in a disaggregated datacenter.
About the Speaker
Spyros Blanas is an assistant professor in the Department of Computer Science and Engineering at The Ohio State University. His research interest is high-performance database systems, and his current goal is to build a database system for high-end computing facilities. He has received the IEEE TCDE Rising Star Award and a Google Research Faculty award. He received his Ph.D. from the University of Wisconsin–Madison, and part of his Ph.D. dissertation was commercialized in Microsoft's flagship data management product, SQL Server, as the Hekaton in-memory transaction processing engine.