Drastically Reducing Out-of-Memory Errors in Apache Spark at Pinterest
Felix Loesing | Software Engineer
In 2025, we set out to drastically reduce out-of-memory errors (OOMs) and cut resource usage in our Spark applications by automatically identifying tasks with higher memory demands and retrying them on larger executors with a feature we call Auto Memory Retries. Spark...
Dmitry Kislyuk | Director, Machine Learning; Ryan Galgon | Director, Product Management; Chuck Rosenberg | Vice President, Engineering; Matt Madrigal | Chief Technology Officer
Foreword from Bill Ready, CEO
The AI landscape is undergoing a fundamental shift, and it’s not the one you think. The competi...