This two-part panel explores the challenges posed by Silent Data Corruption (SDC) and the strategic interventions within the realm of Reliability, Availability, and Serviceability (RAS) for contemporary systems. The discussions define SDC and explore its potential causes, including areas where SDC and RAS interact. Both RAS and SDC face challenges from technology scaling and integration, compute power, testing of advanced nodes, and the detection and correction of permanent, degrading, intermittent, or transient errors. Listen to insights from industry experts at Synopsys, Arm, Google, and Microsoft, covering multiple perspectives including end users, hyperscalers, OEMs, semiconductor suppliers, and EDA companies.
Speakers (left to right): Rama Govindaraju, Jyotika Athavale, Robert S. Chappell, Amr Haggag
Part I of this panel defines and explores potential causes of SDC.
Part II of this panel discusses strategies to mitigate SDC.
Director, Engineering Architecture, Synopsys
Jyotika is a Director, Engineering Architecture at Synopsys, leading quality, reliability, and safety research, pathfinding, and architectures for data centers and automotive applications. Jyotika also serves as the 2024 President of the global IEEE Computer Society, overseeing overall IEEE-CS programs and operations. For her leadership in international safety standardization, Jyotika was awarded the 2023 IEEE SA Standards Medallion. For her leadership in service, she was awarded the IEEE Computer Society Golden Core Award in 2022.
Head of Quality - Silicon Solutions, Arm
Dr. Amr Haggag is Head of Quality for Silicon Solutions at Arm. Previously, he led the quality team for Google custom silicon (Tensor) and the technology/design reliability team for Apple silicon (A-, M-, and S-series), and was quality technical director at Motorola/Freescale. He has more than 20 years of semiconductor industry quality and reliability leadership experience and has served on multiple IEEE committees covering reliability and RAS.
Principal Engineer, Google
Rama Govindaraju is a Principal Engineer at Google, leading the effort to ensure the reliability of large-scale machine learning supercomputers. In prior roles, Rama was a Director of Engineering at Google, where he led the Systems Infrastructure Architecture team, and a Distinguished Engineer at IBM, where he was responsible for the software architecture at IBM's Supercomputing Lab and led the development of five generations of supercomputers. Rama received his MS and PhD in Computer Science from Rensselaer Polytechnic Institute in New York and his BE in Computer Science from BIT Mesra, Ranchi, India.
Partner Hardware Architecture, Microsoft
Robert S. Chappell is a Partner at Microsoft in Redmond, WA. Rob has a passion for "at-scale" computing and is responsible for improving the reliability and performance of the millions of server nodes underlying Azure's core cloud business. Prior to joining Microsoft in 2019, Rob spent over 20 years architecting high-volume CPUs. Rob earned a Ph.D. in Computer Architecture from the University of Michigan.