Welcome to another edition of Breakfast with ECS, a series where we take a look at issues related to cloud storage and ECS (Elastic Cloud Storage), EMC’s cloud-scale storage platform.
The trends toward increasing digitization of content and toward cloud-based storage have been driving a rapid increase in the use of object storage throughout the IT industry. It may seem that all applications now use Web-accessible REST interfaces on top of cloud-based object storage. In reality, although new applications are largely designed with this model, file-based access remains critical for a large proportion of existing IT workflows.
Given the shift in the IT industry toward object-based storage, why is file access still important? There are several reasons, but they boil down to two:
- There exists a wealth of applications, both commercial and home-grown, that rely on file access, as it has been the dominant access paradigm for the past decade.
- It is not cost effective to update all of these applications and their workflows to use an object protocol. The data set managed by the application may not benefit from an object storage platform, or the file access semantics may be so deeply embedded in the application that the application would need a near rewrite to disentangle it from the file protocols.
What are the options?
The easiest option is to use a file-system protocol with an application that was designed with file access as its access paradigm.
ECS has supported file access natively since its inception, originally via its HDFS access method, and most recently via the NFS access method. While HDFS lacks certain features of true file system interfaces, the NFS access method has full support for applications and NFS clients are a standard part of any OS platform, thus making NFS the logical choice for file based application access.
Via NFS, applications gain access to the many benefits of ECS, including its scale-out performance, the ability to massively multi-thread reads and writes, industry-leading storage efficiency, and multi-protocol access. For example, ECS can ingest data from a legacy application via NFS while also serving that data over S3 to newer mobile application clients, thus supporting next-generation workloads at a fraction of the cost of rearchitecting the complete application.
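To make the multi-protocol idea concrete, here is a minimal sketch of how a file written through an NFS mount might be addressed by an S3 client. It assumes that the NFS export corresponds to a single bucket and that the object key is the file's path relative to the mount point; the mount point, the file names, and the function `nfs_path_to_object_key` are hypothetical, and the actual mapping should be confirmed against the ECS documentation.

```python
from pathlib import PurePosixPath


def nfs_path_to_object_key(mount_point: str, file_path: str) -> str:
    """Return the object key assumed to correspond to a file under an NFS mount.

    Assumption (not taken from the article): the export maps to one bucket,
    and the key is the file's path relative to the mount point.
    """
    mount = PurePosixPath(mount_point)
    path = PurePosixPath(file_path)
    # relative_to raises ValueError when file_path is not under mount_point.
    return path.relative_to(mount).as_posix()


# Hypothetical example: a legacy application writes this file over NFS.
key = nfs_path_to_object_key("/mnt/ecs/bucket1", "/mnt/ecs/bucket1/reports/2016/q1.pdf")
print(key)
```

Under these assumptions, a newer client would request the printed key from the same bucket over S3 rather than opening the file path.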
Read the NFS on ECS Overview and Performance White Paper for a high level summary of version 3 of NFS with ECS.
An alternative is to use a gateway or tiering solution to provide file access, such as CIFS-ECS, Isilon CloudPools, or third-party products like Panzura or Seven10. However, if ECS supports direct file-system access, why would an external gateway ever be useful? There are several reasons why this might make sense:
- An external solution will typically support a broader range of protocols, including things like CIFS, NFSv4, FTP, or other protocols that may be needed in the application environment.
- The application may be running in an environment where the access to the ECS is over a slow WAN link. A gateway will typically cache files locally, thereby shielding the applications from WAN limitations or outages while preserving the storage benefits of ECS.
- A gateway may implement features such as compression, which reduces WAN traffic to the ECS and thus provides direct cost savings on WAN transfer fees, or encryption, which provides an additional level of security for the data transfers.
- While HTTP ports are typically open across corporate or data center firewalls, network ports for NAS (NFS, CIFS) protocols are normally blocked for external traffic. Some environments, therefore, may not allow direct file access to an ECS which is not in the local data center, though a gateway which provides file services locally and accesses ECS over HTTP would satisfy the corporate network policies.
So what’s the right answer?
There is no one right answer; instead, the correct answer depends on the specifics of the environment and the characteristics of the application.
- How close is the application to the ECS? File system protocols work well over LANs and less well over WANs. For applications that are near the ECS, a gateway is an unnecessary additional hop on the data path, though gateways can give an application the experience of LAN local traffic even for a remote ECS.
- What are the application characteristics? For an application that makes many small changes to an individual file or a small set of files, a gateway can consolidate multiple such changes into a single write to ECS. For applications that more generally write new files or update existing files with relatively large updates (e.g. rewriting a PowerPoint presentation), a gateway may not provide much benefit.
- What is the future of the application? If the desire is to change the application architecture to a more modern paradigm, then files on ECS written via the file interface will continue to be accessible later as the application code is changed to use S3 or Swift. Gateways, on the other hand, often write data to ECS in a proprietary format, thereby making the transition to direct ECS access via REST protocols more difficult.
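The write consolidation mentioned under application characteristics can be sketched as follows. This is an illustrative model only, not how any particular gateway product works: the class `CoalescingWriter` and its `upload` callback are hypothetical, and a real gateway would also handle persistence, flush timing, and failures.

```python
class CoalescingWriter:
    """Buffers small writes per file and sends each file upstream once per flush."""

    def __init__(self, upload):
        # upload is a hypothetical callable(name, data) standing in for a write to ECS.
        self._upload = upload
        self._buffers = {}  # file name -> bytearray holding the pending contents

    def write(self, name, offset, data):
        """Apply a small write to the local buffer without contacting the backend."""
        buf = self._buffers.setdefault(name, bytearray())
        end = offset + len(data)
        if len(buf) < end:
            # Grow the buffer with zero bytes so the slice assignment below fits.
            buf.extend(b"\x00" * (end - len(buf)))
        buf[offset:end] = data

    def flush(self):
        """Upload every buffered file once and return the number of uploads."""
        count = 0
        for name, buf in self._buffers.items():
            self._upload(name, bytes(buf))
            count += 1
        self._buffers.clear()
        return count


uploads = []
writer = CoalescingWriter(lambda name, data: uploads.append((name, data)))
writer.write("log.txt", 0, b"hello")
writer.write("log.txt", 5, b" world")
writer.write("log.txt", 0, b"J")
print(writer.flush(), uploads)
```

In this sketch, three small writes to the same file result in a single upload, which is the benefit a gateway offers to applications that make many small changes to a small set of files.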
As should be clear, there is no one right answer for all applications. The flexibility of ECS, however, allows for some applications to use direct NFS access to ECS while other applications use a gateway, based on the characteristics of the individual applications.
If existing file based workflows were the reason for not investigating the benefits of an ECS object based solution, then rest assured that an ECS solution can address your file storage needs while still providing the many benefits of the industry’s premier object storage platform.