Architecture Matters

This blog – and several to follow – will focus on architecture, specifically architecture for scale-out storage. There are many lessons to be learned here, both from history and from the current day. First, a short trip back in time – to a bridge nicknamed “Galloping Gertie”, whose story many of my fellow Isilon employees, based in Seattle, probably already know. But for those of you who don’t…

On July 1st, 1940, a bridge over the Tacoma Narrows – the body of water between Tacoma, Washington and the Kitsap Peninsula – opened with great fanfare. It was the third longest suspension bridge in the world at that point. Its progenitors, the Washington State Toll Bridge Authority, were very proud of the fact that they saved $4.6 million during its construction – spending only $6.4 million instead of the estimated $11 million – by changing the design and architecture of the suspension bridge from the engineering norm of the time (e.g. the Golden Gate Bridge).

On November 7th, 1940 – barely four months later – the bridge collapsed. The collapse was captured in a classic photograph, as well as on video. Fortunately, the only loss of life was one dog – no people died. But to this day, Galloping Gertie teaches us a lesson.

That lesson is simple – one must, when considering the design of a significant structure, pay attention to detail and design systems correctly, without cutting corners or trying to shave costs from optimal design. The consequences, as the State of Washington found out, can be severe. So, you say, OK, I get it for bridges – but how does this apply to the topic at hand – scale-out storage?

The answer, too, is simple. In scale-out storage, architecture matters – and it matters a lot, so much so that if it’s wrong, nothing else matters. Consider the scale-out approach we use: not just a single namespace, but a true single filesystem and true resource scaling as nodes are added. The key difference between true scale-out design and those designs that merely appear to be scale-out – just as Galloping Gertie appeared to be a stable suspension bridge but in reality was flawed – is that in a true scale-out design the additional nodes contribute to all functions for all other nodes and all entities (files) within the architecture.

Put another way, consider flawed scale-out design, several instances of which are being marketed and sold today. Adding nodes to such designs only contributes resources to a small part of the system, most notably those files that are contained within that particular node (or pair of nodes, in pseudo-clustered designs). For example, if I add a pair of heads to a pseudo-cluster “scale-out” design, it allows additional capacity to be added, and additional performance – but that additional performance embodied in those heads is only applicable to the capacity under those heads, not the entire system. It cannot rightly be called scale-out. Even more importantly, adding a new layer of resources on top of an existing non-scale-out layer does not make the entire system scale-out. A good friend of mine recently dubbed this “window dressing”. It makes the existing merchandise look prettier, but upon close examination, it’s the same old, same old. Worst of all, just as one does at the high-end department stores, you pay for the window dressing in the form of increased prices.

Our design is true scale-out; all nodes contribute resources to the entire system – to all other nodes and all files across the entire system – in additive fashion. Caching is increased, bandwidth is increased, space (capacity) is increased, file protection is increased, and most importantly management remains the same – one filesystem with one namespace. That is the pivot point upon which scale-out file storage rests. Single namespace with multiple filesystems is not really scale-out; it appears to be, at one level, but upon closer examination – just as the engineers doing the post-mortem on Galloping Gertie found out – it doesn’t meet the requirements. It doesn’t pass muster. It is flawed design. It looks good standing still, but once a real-world load is applied, things become anything but normal.
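
To make the contrast concrete, here is a deliberately simplified toy model. It is only a sketch: the node names, the 100 MB/s per-node figure, and the function names are all made up for illustration, and it does not describe how any real product is implemented. It compares the bandwidth available to one existing file before and after a node is added, first when only the owning node serves that file, then when every node contributes to every file.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    bandwidth_mbps: int  # hypothetical per-node throughput


def siloed_bandwidth(nodes, owner_of, file_name):
    """Pseudo-scale-out: only the node(s) that own a file can serve it."""
    owners = owner_of[file_name]
    return sum(n.bandwidth_mbps for n in nodes if n.name in owners)


def pooled_bandwidth(nodes):
    """True scale-out: every node contributes resources to every file."""
    return sum(n.bandwidth_mbps for n in nodes)


# A two-node system holding one file that lives under node n1 (illustrative).
nodes = [Node("n1", 100), Node("n2", 100)]
owner_of = {"report.dat": {"n1"}}

before_siloed = siloed_bandwidth(nodes, owner_of, "report.dat")
before_pooled = pooled_bandwidth(nodes)

# Add a third node; the existing file's ownership does not change.
nodes.append(Node("n3", 100))

after_siloed = siloed_bandwidth(nodes, owner_of, "report.dat")
after_pooled = pooled_bandwidth(nodes)

print(f"siloed: {before_siloed} -> {after_siloed} MB/s")
print(f"pooled: {before_pooled} -> {after_pooled} MB/s")
```

In this toy model the siloed figure for the existing file stays at 100 MB/s after the third node arrives, while the pooled figure grows from 200 to 300 MB/s. Real systems are far more complicated, but the additive behavior is the point.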

In the weeks to come, I’ll blog more on this topic, in particular some of the finer architectural points of actual scale-out versus pseudo-scale-out. Stay tuned for that. In the meantime, watch the video of Galloping Gertie – and remember that history, without innovation, tends to repeat itself. The bridge engineers learned many design lessons from Galloping Gertie. We have also learned from the design mistakes of legacy architectures past and used design innovation to solve them. It is my hope that consumers of scale-out storage take heed of those lessons and do not mistake today’s architectures that merely appear to be scale-out for the real thing.

About the Author: Robert Peglar