Performance at Scale: Clearing the Performance Bottleneck With EDA Storage

For years, a typical EDA infrastructure has relied on the same storage architecture: a scale-up system characterized by a single-server operating system and a controller head with shelves of disks. This architecture creates islands of storage with many disk shelves and many separate controller heads.

The workflows, workloads, and infrastructure for chip design, combined with exponential data growth and the time-to-market sensitivity of the industry, make it essential to optimize the system that stores EDA data.

Using a traditional scale-up storage architecture for EDA leads to an array of problems:

SCALABILITY AND PERFORMANCE BOTTLENECKS

Traditional EDA storage architectures create performance bottlenecks, and those bottlenecks get worse at scale. A traditional scale-up architecture requires manual data migrations among storage components to balance performance for EDA applications. The controller is the main bottleneck: attaching too much capacity to a controller head can saturate it, and adding capacity does not scale performance, which is why it is typical in EDA to limit the amount of capacity per controller head to make sure that performance requirements are met.
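To make that trade-off concrete, the following minimal Python sketch uses purely hypothetical numbers (an assumed fixed IOPS ceiling per controller head) to show how performance per terabyte falls as more capacity is attached to a single scale-up controller.

    # Back-of-the-envelope sketch (hypothetical numbers): a scale-up controller
    # head delivers a roughly fixed amount of performance, so attaching more
    # capacity dilutes the performance available per terabyte.

    CONTROLLER_IOPS = 200_000  # assumed fixed ceiling of one controller head

    for capacity_tb in (100, 200, 400, 800):
        iops_per_tb = CONTROLLER_IOPS / capacity_tb
        print(f"{capacity_tb:>4} TB behind one controller -> {iops_per_tb:,.0f} IOPS/TB")

    # 100 TB -> 2,000 IOPS/TB; 800 TB -> 250 IOPS/TB: eight times the capacity,
    # one-eighth the performance density.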

Performance bottlenecks imposed by the storage system can reduce wall-clock performance (turnaround time) for suites of concurrent jobs, which can delay bringing a chip to market and, ultimately, reduce revenue.

UNWANTED ISLANDS OF STORAGE

To avoid saturating a controller head, engineers using typical scale-up storage are forced to create “islands of storage” – isolated storage clusters, each with its own volume and namespace. Scale-up storage architectures can also limit storage growth, for example by requiring that newer technologies such as all-flash performance storage be installed on a separate volume. These limitations create challenges for engineering: projects can be forced to span multiple volumes, and scripts may have to be modified to accommodate path changes. They also create additional overhead for storage management.

INEFFICIENT UTILIZATION OF DISK SPACE

Capacity is unevenly utilized across islands of storage: some volumes are underutilized while others are oversubscribed. The result is many volumes, all with pockets of free space. The uneven utilization forces you to manually rebalance volumes across aggregates and to manually migrate data to even out utilization. The burden of managing data across volumes and migrating data to distribute it evenly not only undermines performance but also increases operational expenses. From the engineering perspective, there is an increased risk that space requested at the last minute, while technically available in aggregate, may not be available within any single volume, as the sketch below illustrates.
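The following small Python sketch, with hypothetical volume sizes, illustrates the point: free space that looks sufficient in aggregate cannot satisfy a request that must land on one volume.

    # Hypothetical free-space layout across islands of storage: plenty of space
    # in aggregate, but no single volume can absorb a 6 TB last-minute request.

    free_tb_per_volume = {"vol_a": 3.2, "vol_b": 2.8, "vol_c": 1.5, "vol_d": 2.1}
    request_tb = 6.0

    aggregate_free = sum(free_tb_per_volume.values())   # 9.6 TB in total
    largest_single = max(free_tb_per_volume.values())   # 3.2 TB at best

    print(f"Aggregate free space : {aggregate_free:.1f} TB")
    print(f"Largest single volume: {largest_single:.1f} TB")
    print(f"Fits on one volume?  : {largest_single >= request_tb}")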

MULTIPLE POINTS OF MANAGEMENT

With no central point of management, each filer must be individually managed. The management overhead increases operating expenses (OpEx) and, with them, the total cost of ownership (TCO). The multiple points of manual management also put EDA companies at a strategic disadvantage: the lack of centralized management undermines a business unit’s ability to expand storage to meet demand, which can in turn hamper efforts to reduce time-to-market.

With file server sprawl, the cost of managing fast-growing data can also exceed the IT budget, because eliminating hot spots requires costly data migrations. Backup and replication likewise become increasingly complex and costly.

STORAGE UNCERTAINTY

Expanding data sets coupled with dynamic business models can make storage requirements difficult to forecast. Up-front purchasing decisions, based on business and data forecasts, can misestimate storage capacity, and forecasting capacity ahead of actual need undermines adaptability to changing business conditions. With scale-up architectures, simply adding capacity causes a disruption – as does manually load-balancing controller heads. Downtime increases project costs, directly delays time-to-market, and ultimately reduces profit margins.

Such a model of early storage provisioning can also lead to vendor lock-in and a loss of negotiation leverage that further increases costs.

PERFORMANCE (I/O) HOTSPOTS

Historically, CPU performance and the number of cores were the primary bottlenecks for EDA tools. When more performance was needed, semiconductor companies typically responded by adding cores. Today, however, faster networks and commodity servers are available, and some compute grids have grown to 50,000 cores and beyond. Even with as few as 1,000 cores, the bottleneck shifts from compute to storage. EDA workflows tend to store large numbers of files in a deep and wide directory structure, which compounds the challenge for traditional storage infrastructures. For projects sharing the same controller or export, metadata-intensive workloads can saturate the controller CPU, causing latency to spike and leaving users unable to interact with the storage system. For example, if an engineer kicks off 500 simulation jobs against the /project/chip1_verif directory, anyone else working interactively in that space will notice delays in response.
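A rough Python sketch of the arithmetic follows. All figures (files touched per job, metadata operations per file, and the controller's metadata ceiling) are assumptions for illustration, not measurements, but they show why a burst of batch jobs against one export can monopolize a controller for minutes at a time.

    # Rough illustration (hypothetical numbers) of why metadata traffic, not
    # bandwidth, saturates a scale-up controller: each simulation job looks up,
    # stats, and opens many small files, and hundreds of jobs launched at once
    # against the same export multiply that load.

    jobs = 500                      # simulation jobs kicked off at once
    files_touched_per_job = 20_000  # files in the deep and wide project tree
    metadata_ops_per_file = 3       # e.g. lookup + getattr + open

    total_metadata_ops = jobs * files_touched_per_job * metadata_ops_per_file
    controller_ops_per_sec = 150_000  # assumed metadata ceiling of one controller

    saturation_seconds = total_metadata_ops / controller_ops_per_sec
    print(f"{total_metadata_ops:,} metadata ops queued against one export")
    print(f"~{saturation_seconds / 60:.1f} minutes of controller saturation, "
          "during which interactive users see latency spikes")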

Today’s compute grids are dense and extensive, leaving storage as the bottleneck. For many semiconductor companies, the storage system still uses a legacy dual-controller architecture, such as a traditional scale-up NAS system, which leaves EDA tools vulnerable to I/O hotspots.

Isilon Solutions for EDA Workloads

Isilon overcomes the problems that undermine traditional NAS systems by combining the three layers of a traditional storage architecture—file system, volume manager, and data protection—into a scale-out NAS cluster with a distributed file system.

Such a scale-out architecture increases performance for concurrent jobs, improves disk utilization and storage efficiency to lower capital expenditures (CapEx), centralizes management to reduce operating expenses, and delivers strategic advantages to adapt to changing storage requirements and improve time-to-market.

For more information about Dell EMC Isilon NAS and how we are delivering performance, scalability and efficiency to optimize data storage for EDA, please read our latest white paper here.

About the Author: Robert Vo

Robert Vo has over 20 years of experience in HPC/EDA, having worked at Dell EMC, Broadcom Corp, and Micron Technology Inc. He has held various roles, from designing and supporting engineering compute infrastructure to debugging and optimizing CAD methodology. Robert is passionate about finding innovative solutions to unique EDA challenges. For fun and relaxation, Robert enjoys spending time with his family and working on home improvement projects.