Architecting High Performance Ansys CAE Solutions with Intel Xeon 6 and Lenovo ThinkSystem Servers

Author: Kevin Dean
Published: 2 Jun 2026
Form Number: LP2442
PDF size: 23 pages, 2.5 MB

Table of Contents

Importance of HPC in CAE
Synopsys: A Major Player in the CAE Landscape and HPC
Implications for System Architecture
Performance Results

Results Summary and Key Takeaways
Conclusion
Authors
Related product families
Trademarks

Abstract

Computer‑Aided Engineering (CAE) plays a critical role in accelerating product development across industries such as automotive, aerospace, energy, and manufacturing. As simulation models grow in size and complexity—driven by finer meshes, nonlinear material behavior, and multiphysics coupling—computational demands on underlying infrastructure continue to increase. Modern CAE platforms must deliver not only higher core counts, but also sustained performance, high memory bandwidth, and predictable scalability to reduce time‑to‑solution and improve engineering productivity.

This paper presents a comprehensive performance study of Ansys Fluent® fluid simulation software, Ansys Mechanical™ structural finite element analysis software, and Ansys LS-DYNA® nonlinear dynamics structural simulation software on Lenovo ThinkSystem platforms powered by Intel® Xeon® 5th and 6th Generation processors. Rather than relying on theoretical peak metrics, the analysis focuses on real application behavior, highlighting how solver characteristics interact with processor architecture, memory technology, and system‑level design.

The results demonstrate clear generation‑over‑generation performance improvements with Intel Xeon 6 processors, though the magnitude and source of gains vary significantly by solver. Fluent and LS‑DYNA applications benefit strongly from increased core density and memory bandwidth, particularly for large models, while Mechanical software performance is governed by memory behavior and sustained per‑core performance rather than core count alone. High‑bandwidth memory technologies such as MRDIMMs further improve performance for memory‑intensive workloads.

Based on these findings, the paper provides practical guidance for selecting processors, memory, and platform configurations optimized for Fluent, Mechanical, and LS‑DYNA workloads. By combining Intel Xeon 6 processors with Lenovo ThinkSystem air‑cooled and direct‑water‑cooled platforms and validated software stacks from Ansys, now part of Synopsys, organizations can deploy CAE infrastructure that scales efficiently and supports increasingly demanding engineering simulations.

Importance of HPC in CAE

Computer‑Aided Engineering (CAE) has become a foundational capability for organizations developing complex products across industries such as automotive, aerospace, energy, and manufacturing. Advances in simulation fidelity—driven by finer meshes, nonlinear material models, transient physics, and increasingly coupled multiphysics workflows—have enabled engineers to evaluate designs earlier and more comprehensively in the development cycle.

These advances, however, have significantly increased the computational demands placed on underlying infrastructure. Modern CAE workloads require high‑performance computing (HPC) platforms that can deliver sustained performance, high memory bandwidth, and predictable scalability rather than peak theoretical capability alone. As simulation sizes grow and solver concurrency increases, limitations in memory subsystem performance, power delivery, and thermal management increasingly determine achievable time‑to‑solution.

As a result, HPC is no longer reserved for only the largest simulations or centralized clusters; it has become an essential component of day‑to‑day CAE workflows. Selecting and architecting HPC platforms that align with solver behavior is critical to maximizing simulation throughput, improving engineering productivity, and ensuring that infrastructure investments scale effectively as CAE workloads continue to evolve.

Synopsys: A Major Player in the CAE Landscape and HPC

Through the acquisition of Ansys, Synopsys now complements its leadership in Electronic Design Automation (EDA) with one of the world’s most comprehensive CAE portfolios, offering industry-leading physics‑based simulation tools across computational fluid dynamics, structural and explicit dynamics, electromagnetics, optics, photonics, and multiphysics analysis. Ansys CAE applications from Synopsys span a broad range of numerical methods and solver behaviors, each placing different demands on compute, memory, and system architecture. While these solvers are often deployed within the same CAE environment, their performance characteristics vary substantially depending on the underlying physics, discretization approach, model resolution, and parallelization strategy. Understanding these differences is essential for interpreting benchmark results and for selecting processor, memory, and platform configurations that align with specific workload requirements.

Topics in this section:

Fluent – Computational Fluid Dynamics Software
LS-DYNA – Explicit Finite Element Analysis Software
Mechanical – Implicit Finite Element Analysis Software

Fluent – Computational Fluid Dynamics Software

Fluent is a general‑purpose computational fluid dynamics (CFD) software package used to model fluid flow, heat transfer, turbulence, and related physical phenomena across a wide range of industries. Fluent simulations commonly employ structured or unstructured meshes ranging from millions to hundreds of millions of cells, with solution methods that rely on iterative solvers, sparse linear algebra, and frequent data movement through the memory hierarchy.

From a system perspective, Fluent software exhibits a balanced performance profile influenced by CPU frequency, core count, cache capacity, and memory bandwidth. Smaller and moderately sized models tend to benefit from strong per‑core performance and sustained CPU frequency, while larger models increasingly become constrained by aggregate memory bandwidth as working sets exceed cache capacity. As a result, increases in core count alone do not guarantee proportional performance gains unless sufficient memory throughput is available to keep cores fully utilized.

Fluent software generally scales efficiently within a single node and across multiple nodes when memory bandwidth and interconnect performance are well balanced. At higher core counts, the memory bandwidth available to each MPI rank becomes a critical tuning parameter. Platforms that provide higher aggregate memory bandwidth, through additional memory channels or higher‑speed memory technologies, tend to deliver more consistent and predictable performance improvements for large‑scale Fluent workloads.

LS-DYNA – Explicit Finite Element Analysis Software

LS‑DYNA is an explicit finite element analysis (FEA) software package widely used for highly dynamic simulations such as crashworthiness, impact, drop tests, and forming processes. In addition to an implicit solver, LS‑DYNA provides explicit time integration, which advances the solution in very small time steps and results in a large number of repeated computations over the course of a simulation.

From a computational perspective, LS‑DYNA explicit workloads are dominated by sustained arithmetic throughput and parallel efficiency rather than complex global matrix operations. Because each time step performs relatively simple calculations that are executed repeatedly, LS‑DYNA software benefits strongly from increased core counts and the ability to maintain stable all‑core frequency over long runtimes. As a result, explicit dynamics workloads often exhibit predictable scaling behavior as computational resources increase.
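The cost structure described above can be illustrated with a minimal sketch of explicit time integration for a single undamped spring-mass degree of freedom, using the central-difference (leapfrog) scheme. This is not LS-DYNA code. The mass, stiffness, safety factor, and function name are hypothetical values chosen for illustration. The scheme is stable only for time steps below 2/ω, so stiffer systems require more, smaller steps, each of which is a cheap update with no matrix solve.

```python
import math

def explicit_spring_mass(mass, stiffness, x0, t_end, safety=0.1):
    """Advance an undamped spring-mass system with the central-difference
    (leapfrog) explicit scheme. Illustrative only, not LS-DYNA code."""
    omega = math.sqrt(stiffness / mass)      # natural frequency (rad/s)
    dt_critical = 2.0 / omega                # stability limit of the scheme
    n_steps = math.ceil(t_end / (safety * dt_critical))
    dt = t_end / n_steps                     # step chosen well below the limit
    x, v = x0, 0.0
    a = -stiffness * x / mass
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * a            # half-step velocity update
        x = x + dt * v_half                  # position update
        a = -stiffness * x / mass            # new acceleration, no matrix solve
        v = v_half + 0.5 * dt * a            # complete the velocity update
    return x, v, n_steps, dt, dt_critical

# Hypothetical stiff system: 1 kg on a 1e8 N/m spring, simulated for one period.
period = 2.0 * math.pi / math.sqrt(1e8 / 1.0)
x_final, v_final, steps, dt, dt_crit = explicit_spring_mass(1.0, 1e8, 0.01, period)
print(f"steps={steps}, dt={dt:.3e} s, critical dt={dt_crit:.3e} s, x_final={x_final:.6f}")
```

In a full finite element model the same loop runs over millions of elements per step, which is why sustained all-core throughput matters more than the cost of any single step.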

Memory bandwidth remains an important consideration for large LS‑DYNA models with high element counts, but explicit solvers are generally less sensitive to memory latency than implicit FEA workloads. Platform characteristics such as thermal management and power delivery therefore play a critical role in determining achievable performance, as simulations often run at high utilization for extended periods with minimal idle time.

Mechanical – Implicit Finite Element Analysis Software

Mechanical is an implicit FEA software package used for structural, thermal, and coupled multiphysics simulations, including static and transient structural response, contact, and nonlinear material behavior. Many Mechanical workloads rely heavily on sparse linear algebra, where performance is shaped by how efficiently the system can assemble and solve large sparse systems of equations.

From a system perspective, Mechanical performance is often dominated by memory behavior—particularly memory bandwidth, latency, and cache efficiency—rather than raw floating‑point throughput. As model sizes grow and working sets exceed cache capacity, solver phases such as matrix assembly and solution can become increasingly constrained by data movement and synchronization overhead, leading to diminishing returns from additional cores unless the memory subsystem scales accordingly.

Mechanical performance also depends strongly on the linear solver strategy. Direct solvers (factorization‑based) tend to be more memory‑intensive and can become bandwidth‑limited, while iterative solvers (preconditioned Krylov methods) reduce memory footprint but introduce sensitivity to convergence behavior and global synchronization. Sustained CPU frequency under continuous load remains important for both approaches, particularly for nonlinear analyses that require repeated solution phases.
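To show where the global synchronization in an iterative solver comes from, the following is a minimal sketch of a Jacobi-preconditioned conjugate gradient method applied to a small one-dimensional Laplacian. It is not the Mechanical solver. The test matrix, tolerance, and function names are assumptions made for illustration. Each iteration contains dot products, and in a distributed-memory run every dot product becomes a global reduction across all ranks.

```python
def apply_laplacian(x):
    """Matrix-free y = A x for the 1D stencil [-1, 2, -1] (illustrative test matrix)."""
    n = len(x)
    y = [2.0 * xi for xi in x]
    for i in range(n):
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y

def dot(u, v):
    # In a distributed solver, each call here would be a global reduction.
    return sum(a * b for a, b in zip(u, v))

def jacobi_pcg(b, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradient with a Jacobi (diagonal) preconditioner."""
    x = [0.0] * len(b)
    r = list(b)                          # residual for the zero initial guess
    z = [ri / 2.0 for ri in r]           # Jacobi: divide by diag(A) = 2
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for k in range(1, max_iter + 1):
        Ap = apply_laplacian(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, k
        z = [ri / 2.0 for ri in r]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

b = [1.0] * 50
solution, iterations = jacobi_pcg(b)
residual = max(abs(ai - bi) for ai, bi in zip(apply_laplacian(solution), b))
print(f"converged in {iterations} iterations, max residual {residual:.2e}")
```

The iteration count depends on the conditioning of the system, which is the convergence sensitivity noted above. A direct solver avoids that dependence at the price of storing and streaming the factorization through memory.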

Implications for System Architecture

The distinct computational characteristics of Fluent, Mechanical, and LS‑DYNA underscore the need for a balanced, solver‑aware system architecture. No single hardware configuration is optimal across all CAE workloads: CFD and explicit dynamics benefit from increased core density and memory bandwidth, while implicit FEA places greater emphasis on memory behavior, cache efficiency, and sustained per‑core performance.

These differences highlight why processor architecture, memory technology, and platform‑level design must be considered together rather than in isolation.

As CAE workloads scale, limitations related to memory bandwidth, power delivery, and thermal management increasingly determine achievable performance and scalability. Effective CAE infrastructure therefore requires platforms that can expose processor capabilities consistently under real‑world solver behavior, not just peak theoretical specifications.

Topics in this section:

Intel Xeon Processors: The CAE Compute Foundation
Memory Technology as a First Order Performance Factor
Lenovo ThinkSystem Servers for Ansys CAE Solutions

Intel Xeon Processors: The CAE Compute Foundation

Intel® Xeon® processors have long served as the computational workhorse for CAE workloads, providing a balanced combination of core density, sustained performance, memory scalability, and a mature HPC software ecosystem. This balance has made Xeon processors a dependable foundation for engineering simulation across a wide range of use cases, from departmental CAE environments to large‑scale HPC deployments.

Intel Xeon 5th Generation processors represent the established baseline for many CAE infrastructures, delivering balanced performance across CFD, implicit FEA, and explicit dynamics workloads.

Intel Xeon 6 processors extend this foundation with architectural enhancements that increase core density, improve sustained all‑core performance, and expand memory bandwidth capabilities, enabling higher performance as solver concurrency and model complexity increase.

The Xeon 6 family spans SP‑class processors optimized for balanced deployments, and AP‑class processors designed for maximum throughput in the most demanding CAE workloads. Rather than focusing on product specifications in isolation, CAE performance is best understood through the architectural attributes that most directly influence solver behavior: core count and sustained frequency, memory bandwidth and latency, and cache capacity.

These characteristics determine how effectively solvers utilize available compute resources and define the requirements that system platforms must satisfy to deliver consistent, production‑grade performance under real‑world CAE workloads.

Memory Technology as a First Order Performance Factor

As CAE solvers continue to scale in model size and concurrency, memory subsystem performance has become a first‑order design consideration rather than a secondary optimization.

For many modern workloads, particularly CFD and large finite element analyses, solver execution is increasingly dominated by data movement rather than floating‑point computation. In these regimes, limitations in memory bandwidth and latency can constrain performance even when sufficient compute resources are available. Traditional DDR5 RDIMM configurations provide a balanced and widely deployed solution, but may become a bottleneck as core counts increase and working sets exceed cache capacity.

High‑bandwidth memory technologies are therefore critical to sustaining performance as solver concurrency grows. Without sufficient memory throughput, additional cores cannot be effectively utilized, leading to diminishing returns from processor scaling alone. Multiplexed Rank DIMMs (MRDIMMs) address this challenge by increasing effective memory bandwidth per channel and reducing contention as core counts rise.

For memory‑intensive CAE workloads—particularly large Fluent models and selected LS‑DYNA cases—higher memory throughput can deliver performance gains that rival or exceed those achieved through increases in CPU frequency or core count. The impact of MRDIMM is workload‑dependent but becomes increasingly significant as model sizes grow and memory traffic dominates solver execution.

Figure 1. MRDIMM Multiplex Functionality

Lenovo ThinkSystem Servers for Ansys CAE Solutions

Ansys CAE solvers, Intel® Xeon® processors, and high‑bandwidth memory technologies together define the computational foundation for modern engineering simulation. Realizing their full potential, however, requires a system platform engineered to sustain performance under real‑world solver behavior rather than peak theoretical conditions.

Lenovo ThinkSystem servers integrate compute, memory, cooling, and power delivery into a balanced platform, enabling a complete, production‑ready solution for Ansys CAE workloads ranging from single‑node simulations to large‑scale, memory‑intensive HPC environments.

Topics in this section:

ThinkSystem SR630 V4 Server: The Traditional CAE Foundation
Transitioning Direct Water-Cooling Technology with ThinkSystem SC750 V4 Servers

ThinkSystem SR630 V4 Server: The Traditional CAE Foundation

Figure 2. Lenovo ThinkSystem SR630 V4 Server – 2-socket, 1U, Air-Cooled Rack Server

Transitioning Direct Water-Cooling Technology with ThinkSystem SC750 V4 Servers

As CAE workloads scale in model size, solver concurrency, and memory bandwidth demand, processor power envelopes and thermal requirements increasingly exceed what air‑cooled platforms can consistently sustain.

The Lenovo ThinkSystem SC750 V4 addresses these challenges through direct water cooling (DWC), removing thermal constraints that can otherwise limit sustained CPU frequency and memory performance under continuous load. By enabling stable operation of high‑core‑count Intel Xeon 6 AP‑class processors and high‑bandwidth memory configurations such as MRDIMMs, the SC750 V4 allows systems to operate closer to their architectural limits for extended runtimes.

This capability is especially critical for large‑scale CFD, implicit FEA, and explicit dynamics workloads, where sustained performance directly translates to improved scalability and reduced time‑to‑solution.

Figure 3. Lenovo ThinkSystem N1380 Neptune Chassis – 13U, 8-Tray Enclosure for 19-inch Rack Cabinets

Figure 4. Lenovo ThinkSystem SC750 V4 Neptune Server Tray

Performance Results

The performance results presented in this section are based on application‑level benchmarking of Fluent, Mechanical, and LS‑DYNA software using representative workloads and production‑class server configurations.

Testing was conducted on Lenovo ThinkSystem platforms powered by Intel® Xeon® 5th and 6th Generation processors to evaluate generation‑over‑generation performance, platform behavior, the impact of memory technology, and scaling characteristics under realistic CAE usage scenarios.

Topics in this section:

Test Methodology
Benchmark Systems
Fluent Software Performance Results
LS-DYNA Software Performance Results
Mechanical Software Performance Results

Test Methodology

Benchmarks were designed to reflect common CAE deployment models and include both single‑node performance measurements and multi‑node scaling studies. Single‑node testing focuses on isolating CPU and memory subsystem behavior, highlighting the effects of processor architecture, core count, sustained frequency, cache capacity, and memory bandwidth without the influence of network variability.

Multi‑node scaling tests evaluate how solver performance evolves as workloads are distributed across multiple compute nodes, characterizing parallel efficiency, communication behavior, and sensitivity to system balance as aggregate core counts and memory bandwidth increase.

All systems were configured with consistent software environments, including operating system, compilers, MPI libraries, and Ansys solver versions, to ensure comparability across platforms. Performance results are reported as relative performance, normalized to an Intel Xeon 5th Generation baseline configuration. This approach highlights architectural and platform differences while avoiding dependence on absolute timing values, which can vary with model configuration and solver settings.
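The normalization described above can be sketched as follows, assuming relative performance is computed as the ratio of baseline elapsed time to candidate elapsed time. The function name, system labels, and timings are hypothetical placeholders and are not measurements from this study.

```python
def relative_performance(baseline_seconds, candidate_seconds):
    """Return performance relative to the baseline; values above 1.0 are faster."""
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("elapsed times must be positive")
    return baseline_seconds / candidate_seconds

# Placeholder elapsed times in seconds (NOT measured results).
elapsed = {"baseline": 1000.0, "system_a": 800.0, "system_b": 500.0}

normalized = {
    name: relative_performance(elapsed["baseline"], seconds)
    for name, seconds in elapsed.items()
}
for name, value in normalized.items():
    print(f"{name}: {value:.2f}x")
```

Under this convention the baseline always reports 1.00x, and a system that halves the elapsed time reports 2.00x.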

Benchmark Systems

The benchmark systems are detailed in the table below.

Table 1. Specification of Benchmark Systems
Processor	Intel Xeon 5th Gen 8562Y+	Intel Xeon 5th Gen 8592+	Intel Xeon 6732P	Intel Xeon 6960P	Intel Xeon 6972P	Intel Xeon 6980P
Cores per socket	32	64	32	72	96	128
Max turbo frequency	4.1GHz	3.9GHz	4.3GHz	3.9GHz	3.9GHz	3.9GHz
Base frequency	2.8GHz	1.9GHz	3.8GHz	2.7GHz	2.4GHz	2.0GHz
Cache	60MB	320MB	144MB	432MB	480MB	504MB
Thermal design power (TDP)	300W	350W	350W	500W	500W	500W
Ultra Path Interconnect (UPI) speed	20GT/s	20GT/s	24GT/s	24GT/s	24GT/s	24GT/s
Memory Channels	8	8	8	12	12	12
Memory Type	DDR5	DDR5	DDR5	DDR5	DDR5	DDR5 & MRDIMM
Memory Speed	5600MT/s	5600MT/s	6400MT/s	6400MT/s	6400MT/s	6400MT/s & 8800MT/s
Installed Memory	512GB	512GB	1024GB	768GB	768GB	1536GB & 768GB
ThinkSystem Server(s)	SD650 V3	SD650 V3	SR630 V4	SC750 V4	SC750 V4	SC750 V4
Network Interconnect	NVIDIA NDR400	NVIDIA NDR400	NVIDIA NDR400	NVIDIA NDR400	NVIDIA NDR400	NVIDIA NDR400
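
As a rough aid for reading Table 1, the sketch below estimates theoretical peak memory bandwidth per socket and per core from the channel counts and memory speeds listed in the table. It assumes 8 bytes per transfer per channel (a 64-bit data path), which is an assumption of this sketch and not a figure from the study. These are theoretical peaks. Sustained bandwidth measured under a solver will be lower.

```python
BYTES_PER_TRANSFER = 8  # assumption: 64-bit data path per memory channel

# (cores per socket, memory channels, memory speed in MT/s), copied from Table 1.
systems = {
    "Xeon 8562Y+ (DDR5-5600)": (32, 8, 5600),
    "Xeon 8592+ (DDR5-5600)": (64, 8, 5600),
    "Xeon 6732P (DDR5-6400)": (32, 8, 6400),
    "Xeon 6960P (DDR5-6400)": (72, 12, 6400),
    "Xeon 6972P (DDR5-6400)": (96, 12, 6400),
    "Xeon 6980P (DDR5-6400)": (128, 12, 6400),
    "Xeon 6980P (MRDIMM-8800)": (128, 12, 8800),
}

def peak_bandwidth_gbs(channels, megatransfers_per_second):
    """Theoretical peak bandwidth per socket in GB/s."""
    return channels * megatransfers_per_second * BYTES_PER_TRANSFER / 1000.0

for name, (cores, channels, speed) in systems.items():
    socket_bw = peak_bandwidth_gbs(channels, speed)
    print(f"{name:26s} {socket_bw:6.1f} GB/s per socket, "
          f"{socket_bw / cores:5.2f} GB/s per core")
```

Under these assumptions the 128-core processor has less theoretical bandwidth per core than the 32-core parts even with 12 channels, and the MRDIMM configuration recovers part of that gap.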

The benchmark systems used the following configuration:

Operating System: Rocky Linux 9.6
Kernel Level: 5.14.0-570.58.1.el9_6.x86_64
Intel oneAPI: 2025.2.1
Ansys: 2025.2
LS-DYNA: 16.1.1

Fluent Software Performance Results

Fluent workloads represent a broad class of CFD simulations with varying computational characteristics depending on model size, mesh resolution, and physical complexity. Fluent exhibits a balanced dependence on CPU frequency, core count, cache capacity, and memory bandwidth. The following results illustrate how these factors interact across Intel Xeon 5th Generation and Intel Xeon 6 platforms.

Topics in this section:

Single Node Performance
Impact of Memory Configuration
Multi-Node Scaling Performance

Single Node Performance

Single‑node Fluent software performance results highlight the combined impact of processor architecture and memory subsystem capability. For smaller and moderately sized models, performance improvements are driven primarily by per‑core efficiency and sustained CPU frequency, while larger models increasingly become constrained by aggregate memory bandwidth.

Across the evaluated configurations, Intel Xeon® 6 SP‑class processors deliver approximately 8–12% higher per‑core performance compared to Intel Xeon® 5th Generation processors for Fluent workloads (Figure 5). This improvement reflects architectural gains in IPC, cache behavior, and sustained frequency rather than increases in core count alone.

Moving to Xeon® 6 AP‑class processors, single‑node Fluent software performance increases by approximately 60–110% relative to Intel Xeon® 5th Generation platforms, depending on model size (Figure 6). These gains are driven by a combination of higher core density, substantially larger cache capacity, and increased aggregate memory bandwidth. Larger CFD models show the highest uplift, reinforcing the sensitivity of Fluent software to memory throughput as working sets exceed cache capacity.

Figure 5. Ansys Fluent Performance: Intel Xeon 5th Gen Processor & 6th Gen SP Processor

Figure 6. Ansys Fluent Performance: Intel Xeon 5th Gen Processor & 6th Gen AP Processor

Impact of Memory Configuration

Memory configuration plays a significant role in single‑node Fluent software performance, particularly for large and memory‑intensive models. When comparing DDR5 RDIMM and MRDIMM configurations on Intel Xeon® 6 AP platforms, MRDIMMs deliver an additional 6–10% improvement for smaller Fluent cases and up to 20–25% improvement for the largest models.

This behavior reflects the increasing reliance of Fluent software on memory bandwidth as mesh sizes grow. For smaller workloads, performance remains partially cache‑resident and sees limited benefit from higher memory throughput, while large CFD cases experience substantial gains once memory traffic becomes the dominant limiting factor.
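One way to interpret these uplifts is a simple two-part runtime model, offered here as an assumption rather than as the analysis used in the study. A fraction f of the runtime is taken to scale inversely with memory bandwidth and the remainder is taken to be unaffected. With the memory speeds in Table 1, the theoretical bandwidth ratio between the MRDIMM and DDR5 configurations is 8800/6400 = 1.375. The sketch solves S = 1 / ((1 - f) + f / r) for f at the reported uplifts. The function name is hypothetical.

```python
def implied_bandwidth_bound_fraction(observed_speedup, bandwidth_ratio):
    """Solve S = 1 / ((1 - f) + f / r) for f, the bandwidth-bound runtime share.
    Simplified model: ignores latency, cache effects, and capacity differences."""
    if observed_speedup < 1.0 or bandwidth_ratio <= 1.0:
        raise ValueError("expects speedup >= 1 and bandwidth ratio > 1")
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / bandwidth_ratio)

bandwidth_ratio = 8800 / 6400  # MRDIMM vs DDR5 memory speed from Table 1

for uplift in (0.06, 0.10, 0.20, 0.25):
    fraction = implied_bandwidth_bound_fraction(1.0 + uplift, bandwidth_ratio)
    print(f"{uplift:.0%} uplift -> about {fraction:.0%} of runtime bandwidth-bound")
```

Under this model the smaller cases behave as if roughly a fifth to a third of their runtime is bandwidth-bound, while the largest cases behave as if roughly 60 to 75 percent is, which is consistent with the cache-residency explanation above.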

Figure 7. Ansys Fluent Performance: Impact of Memory Technology

Multi-Node Scaling Performance

Multi‑node scaling results demonstrate the ability of Fluent software to efficiently distribute large CFD workloads across multiple compute nodes when sufficient compute and memory resources are available. For the largest models evaluated, Fluent software exhibits near‑linear scaling up to 4,096 cores, achieving approximately 15× performance improvement when scaling from 256 to 4,096 cores.

These results highlight the suitability of Fluent software for clustered HPC environments and underscore the importance of balanced node design. High per‑node performance combined with high‑bandwidth memory and a low‑latency interconnect enables Fluent software to maintain strong parallel efficiency at scale, directly translating into reduced time‑to‑solution for large CFD simulations.
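The near-linear scaling claim can be checked with a short parallel-efficiency calculation based on the figures quoted above (approximately 15× speedup when scaling from 256 to 4,096 cores). The function name is illustrative.

```python
def parallel_efficiency(speedup, base_cores, scaled_cores):
    """Ratio of measured speedup to the ideal speedup implied by the core increase."""
    ideal_speedup = scaled_cores / base_cores
    return speedup / ideal_speedup

efficiency = parallel_efficiency(15.0, 256, 4096)  # ideal speedup is 16x
print(f"parallel efficiency: {efficiency:.1%}")
```

A 15× speedup on 16 times as many cores corresponds to a parallel efficiency of about 94% relative to the 256-core run.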

Figure 8. Ansys Fluent Performance: Scaling on Intel Xeon 6 AP Processors

LS-DYNA Software Performance Results

LS-DYNA software advances the solution through very small time steps using explicit integration methods, resulting in a large number of repeated computational operations. These characteristics make LS-DYNA particularly sensitive to core count, sustained CPU throughput, and parallel efficiency, while still benefiting from adequate memory bandwidth for large models.

Topics in this section:

Single Node Performance
Impact of Memory Configuration
Multi-Node Scaling Performance