Hello and welcome to the Next Horizon, a Dell Technologies podcast. I’m Bill Pfeifer, and together we’ll be talking about emerging technologies, their potential to impact society and what you need to know today.
Our guest today is Matt Wallace. Matt is the CTO of a cloud storage company called Faction, which provides a pretty amazing set of cloud-accessible storage options. We connected with Matt through the Dell Technologies Capital team, who were interested in what Faction was doing and the way it merges data center storage capabilities with cloud-based access. So Matt, welcome to The Next Horizon. We really appreciate you spending some time with us to talk through your business. And I think I speak for everyone listening when I say we’re really looking forward to learning a little more about what you and Faction have been up to.
Thanks. Super excited to be here today. I think that there’s a lot to talk through in this space and even just thinking about the introduction and the characterization of Faction as a cloud storage company, it is true, but it’s only part of the story.
I think one of the really interesting things about this evolving space is that there are a lot of folks working to address cloud storage and multicloud storage challenges. And yet we’ve found that there’s something unique in Faction’s DNA: our long history operating private cloud environments and working on our multicloud platform more broadly meant we had deep experience running compute at scale and dealing with complex network technologies. When it came to cloud storage, we found that the storage piece was necessary but not sufficient. By layering in a lot of these other technologies we’ve worked on, we’ve actually been able to build something that’s a lot more interesting from a capabilities perspective and a lot more unbounded in its future, and we hope it keeps pace with our customers’ emerging and ever-changing technology needs.
So even more of a journey than I initially thought. That’s kind of cool. And the best thing about the emerging technology space and the innovation space is it keeps going, new things keep emerging and we keep understanding them to a deeper level. And as we’re innovating, we’re covering new territory. Always fun to have a conversation around that stuff.
Now, in our initial discussion, you mentioned this idea of data gravity. It’s an interesting problem that’s been on the rise recently, I’m hearing more and more about it, and many of our customers are struggling with it, and those that aren’t will probably start to trip over it pretty soon. Can you tell our listeners a bit about what data gravity problems are and why they bother you and, more importantly, your customers?
I guess to start off, we should get level-set on a definition of data gravity, although I think most people have encountered it conceptually. Simply put, it’s the idea that as you assemble data, whether in your applications or data lakes, that data attracts other data, and it attracts applications. To see why, think about it from a database perspective: if you run a database on very fast storage, it’s very easy to access, and you can get data into and out of it very quickly. If it’s on very slow storage, then any sort of query takes much more time. In the data gravity case, rather than the storage, it’s actually the network latency that induces that delay. And so data gravity is this tendency for different sorts of applications to want to get closer and closer together so that they can work together much more quickly.
So I think the interesting thing about the way customers are dealing with data gravity concerns today is oftentimes they’ll have an existing IT environment, existing data centers, one or more, but they’re sometimes years, sometimes months into some sort of cloud transformation, digital transformation effort, so they’ll have either new applications they’ve built in the cloud or existing applications they’ve replatformed to the cloud.
And the thing they find challenging is that they’ll have datasets that live in an on-prem data center and they want to access those datasets from those cloud-based applications, or vice versa, and it’s the latency over the WAN between the data and those applications that really causes a problem for them. That additional latency can be a real killer for services talking to one another and processing data. This has been exacerbated by the transformation to microservices. As people deploy more and more applications into container environments using microservices architectures, the service mesh concept, where you call into one service and it calls many other services to serve a request out to an end user, can amplify that data gravity problem if any of those requests has to traverse a WAN. It becomes problematic. So there’s this real tendency to want to get the apps and the data closer and closer together. And the more data there is, the stronger that pull becomes, partly because it just becomes difficult to move the data; it becomes more costly and slower because, the truth is, bandwidth has not remotely kept up with the growth in data and compute power.
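The latency amplification Matt describes here can be sketched with a little arithmetic. The numbers below are illustrative assumptions, not measurements from the episode: each service-to-service hop inside a data center adds a fraction of a millisecond, while any hop that crosses a WAN adds tens of milliseconds, so even a few WAN hops in a deep call chain dominate the total.

```python
# Illustrative sketch of how a microservice call chain amplifies latency.
# All numbers are assumptions for demonstration, not measured values.

LOCAL_HOP_MS = 0.5   # assumed intra-data-center service-to-service call
WAN_HOP_MS = 40.0    # assumed cross-WAN call (e.g., cloud region to on-prem)

def request_latency_ms(hops: int, wan_hops: int) -> float:
    """Total added latency for a request that traverses `hops` service
    calls, `wan_hops` of which cross the WAN."""
    local = (hops - wan_hops) * LOCAL_HOP_MS
    wan = wan_hops * WAN_HOP_MS
    return local + wan

# A 10-hop service mesh, all local: about 5 ms of call overhead.
print(request_latency_ms(10, 0))   # 5.0
# The same mesh with just 3 hops crossing the WAN: 123.5 ms.
print(request_latency_ms(10, 3))   # 123.5
```

With these assumed numbers, three WAN crossings make the request roughly 25 times slower than the all-local case, which is why pulling apps and data adjacent to each other matters so much.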
So it’s going to be really interesting to see how containerization takes off and the changes that that drives in particular, because containerization allows so much more mobility to your point between clouds, between data centers and clouds, private and public clouds. So we’re going to have containers moving all over the place, but the data doesn’t move nearly as fast as those tiny little containers. So then you have these microservices meshes where you don’t necessarily have to be aware of all the different services that are being called, you’ll just casually spin up these containers and hopefully you have the data there. But if you don’t, that’s going to be an interesting challenge.
Yeah, it can actually be really notable with containers, almost dramatic if you will, because the container start time can be so short. There have been research projects where containers literally turn up in response to a web request: a request comes in, the container that needs to respond to it literally doesn’t exist yet, it gets started, and it catches the request on the fly, because containers really can be live that fast. It’s much different from a traditional bare metal or virtual machine environment.
The flip side with containers, of course, is that because they can turn up in a fraction of a second, it really exacerbates the pain of having to move data around. This is actually one of the most fun things we’ve been working on. The predominant platform for managing containers nowadays is Kubernetes; everybody’s heard about it, and a lot of people are wrestling with it. One of the things you have in Kubernetes is this idea of a persistent volume claim. Containers are designed to be destroyed and created at the push of a button, on the fly, potentially dozens or hundreds of them spawning and de-spawning in response to changing conditions, batch workloads, you name it.
The difficulty, though, is that containers generally are not designed to have persistent data. They’re ephemeral, because they want to be able to scale out horizontally. So the persistent volume claim is the idea that a container that’s turning up can connect remotely to data that will outlive the container. One of the powerful things we’ve been able to do is locate those persistent volume claims on Faction storage platforms, so we can take a container that’s running inside of Amazon on AWS and essentially power it down, and then in the blink of an eye a replacement container can turn up in Azure. That container in Azure can attach to that same persistent data store. So imagine being able to move something like a database from Amazon to Azure in the tiniest fraction of a second, less than a tenth of a second to actually move the running workload between clouds. And that’s possible today because of the merging of that container technology with our multicloud platform. It’s really cool stuff.
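As a rough sketch, a Kubernetes persistent volume claim of the kind Matt describes might look like the manifest below, written here as a Python dict rather than the usual YAML. The storage class name `faction-multicloud-nfs` is a hypothetical placeholder, not a documented Faction identifier; the point is only that the claim references storage that outlives any one container, so a replacement pod, even one scheduled in a different cloud against the same external backend, can reattach to the same data.

```python
# Hypothetical Kubernetes PersistentVolumeClaim, shown as a Python dict
# (structurally equivalent to the usual YAML manifest). The storage class
# name below is an assumed placeholder, not a real Faction product name.

pvc_manifest = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "orders-db-data"},
    "spec": {
        # ReadWriteMany lets pods on multiple nodes (or, with an
        # externally attached multicloud backend, in multiple clouds)
        # mount the same volume.
        "accessModes": ["ReadWriteMany"],
        "storageClassName": "faction-multicloud-nfs",  # assumed name
        "resources": {"requests": {"storage": "100Gi"}},
    },
}

# A pod spec then mounts the claim by name. If the pod is destroyed and a
# replacement comes up elsewhere, the claim still points at the same
# persistent data, which is what makes the sub-second "move" possible.
volume_ref = {
    "name": "data",
    "persistentVolumeClaim": {"claimName": "orders-db-data"},
}
```

Only the claim name ties the pod to the data; the pod itself carries no state, which is exactly why the workload can follow compute capacity between clouds.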
It seems like that really helps us get the full potential out of our data centers and the magic that is public cloud and tie them all together. That sounds really cool.
So the data hosted by Faction is accessible by the public cloud, but it sits outside of it so you’re more in direct control of it? That’s a pretty fundamental shift in how customers can use that data. Have you seen instances where customers successfully accessed their data from multiple clouds at the same time? What sorts of things do you see your customers typically doing?
Yeah, good question, and I think there are two parts to this. One is: what is the relationship between the data they have on prem versus the data they put in a Faction service, and how do they consume that from the cloud? One of the things we enable fundamentally, living on top of the Dell Technologies platform, is being able to interoperate with existing investments that folks have in on-prem storage. If they have Unity, PowerMax, Isilon, or ECS, they can essentially extend that hardware platform choice, and the technology associated with it, including the features and functions unique to it, up into the Faction services and then make those same services available to the public cloud.
And then there’s the question of with the data there, setting aside replicating from on prem and back to on prem, making that data flow back and forth, then what are you doing with that data in the cloud? The data in the cloud, it’s a wide open question, certainly. There’s many different use cases, and I’ll just use a few.
One of our clients is an oil and gas firm; they do petroleum services. They have multiple petabytes of data accumulated from seismic surveys, and they have machine learning algorithms that continuously look at those surveys, things like seabed floors scanned with sonar. They use this big data analysis to decide where they should invest, making decisions like: should we drill here or drill there? One of the things they wanted was to keep it running continuously. They had a copy of all this data in an on-premises data center, where they typically run applications, but they also wanted some help with disaster recovery. Faction operates a whole set of services under the umbrella of hybrid disaster recovery as a service, which is essentially our way of blending best of breed between VMware-based recovery and public cloud recovery, getting the best of both worlds.
But with these petabytes of data they had a very unique challenge. By pulling in this large set of unstructured data, we’re able to recover their applications at the VMware layer using our services plus VMware Cloud on AWS, and we’re able to pair that with the unstructured data copy so they can actually use the full unstructured dataset during that DR [inaudible 00:10:30]. And here’s where things get interesting: once you’ve made a copy of data and you’re replicating to a second site, DR doesn’t need to be the only use case if that site is adjacent to the cloud. Now that they have this second copy, they can kill two birds with one stone: they can turn up some of these big data analysis tools to gain additional scale and essentially burst to the cloud with that extra copy of the seismic data, running some of that analysis in the public clouds. And of course, that’s capacity they can turn up and turn down on demand, just because they have that extra copy of the data there.
Another example in a very similar vein is genomic analysis. We’ve been working to optimize a genomic analysis pipeline for multicloud. Again, this is one of those things that can interoperate with a traditional data center or sit on its own, and we’ve done both. In this particular case, we have folks who are interested in doing genomic analysis on these large genomes; even the small dataset we’re using in a lab environment is 70 terabytes. One of the interesting things about genomic analysis is that it’s actually shifted toward GPUs now. There’s plenty of CPU work still being done, but there’s a company that Nvidia just bought that helps accelerate these genomic pipelines using GPUs. And you find that the availability of GPUs across the public clouds, which GPU instances are available at a given time, how many are available as spot instances, et cetera, varies between the providers.
In a recent test, for example, we found a region where Amazon, Azure, and Google were all present: one of them had the GPU that we wanted to use, one had an older generation of the GPU, which was still good, and one didn’t have the type of GPU we wanted at all. That sort of feature arbitrage between the public clouds, just knowing that if a new GPU comes out, one of them is likely to install it and I can switch to that cloud immediately, is powerful.
And then on top of that, of course, there’s the simple question of how many spot instances you can get in a particular cloud. If you have access to three clouds, you’re much more likely to get the spot instance capacity you need, where you can turn up a workload at a very low cost because you’re essentially only using the unutilized capacity from that particular public cloud provider. You get the advantage of the lower cost of that compute for these big batch workloads as well. We’re seeing use cases like that across industries.
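The arbitrage Matt describes, picking whichever cloud currently has an acceptable GPU generation and enough spot capacity at the best price, amounts to a simple selection over whatever availability data you can query. Everything in the sketch below is invented for illustration: the provider names, GPU generations, capacities, and prices are placeholders, and real values would come from each provider’s pricing and capacity APIs.

```python
# Toy sketch of cross-cloud "feature arbitrage": pick the provider that
# has an acceptable GPU and enough spot capacity at the lowest price.
# All data here is invented for illustration, not real provider pricing.

offers = [
    {"cloud": "A", "gpu": "gen3", "spot_available": 0,   "price_per_hr": 0.90},
    {"cloud": "B", "gpu": "gen2", "spot_available": 64,  "price_per_hr": 0.60},
    {"cloud": "C", "gpu": "none", "spot_available": 128, "price_per_hr": 0.40},
]

# Newest generation preferred, but an older one is still usable.
ACCEPTABLE_GPUS = {"gen3", "gen2"}

def pick_cloud(offers, instances_needed):
    """Cheapest cloud with an acceptable GPU and enough spot capacity.
    With the dataset sitting outside any one cloud, the workload can
    follow this choice without first copying the data."""
    viable = [o for o in offers
              if o["gpu"] in ACCEPTABLE_GPUS
              and o["spot_available"] >= instances_needed]
    if not viable:
        return None
    return min(viable, key=lambda o: o["price_per_hr"])["cloud"]

# Cloud A has the best GPU but no spot capacity; cloud C is cheapest but
# has no usable GPU; cloud B wins.
print(pick_cloud(offers, 32))  # B
```

The storage-between-clouds model is what makes this loop cheap to run repeatedly: the choice can change from batch to batch without a multi-terabyte copy each time.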
I will say, at the end of the day, some of this doesn’t necessarily have to be at huge scale to be relevant. Take any enterprise where applications are run across different development teams, and those teams have autonomy over which public cloud they want to use. Team A may build an application in Amazon because they want to take advantage of an Amazon service, and team B may build an application in Azure because they want to take advantage of an Azure service. The challenge is: what do you do when they both want access to the same dataset? Of course, you can wrestle with which cloud has the authoritative copy, how to synchronize it, what the egress charges will be, and all those sorts of questions. And in a lot of cases, we find the data is ultimately coming from somewhere outside the clouds: the company’s branch offices, their suppliers, IoT sensors, you name it.
So either you surmount all of those things, or you leverage a service like Faction: you get that data sitting between the clouds, and now all the development teams can just access it. So broadly, that innovation-driven question, how do we let our development teams do what they want to do, so they can go as fast as they want to go and be as agile as they can be, without having to worry about all these data synchronization problems, is also a whole problem class that we’re solving for.
Wow. So that was a heck of an answer. In that answer, let me paraphrase here, you fixed the problem of data gravity, you changed the whole landscape of disaster recovery, you’re enabling customers to engage in feature arbitrage between clouds and you’re giving them the power and flexibility to do dynamic cost optimization between their on prem data center and all the different public clouds. That’s pretty amazing.
And what are we going to do with the rest of the week, right?
Yeah. Yeah, what’s coming next? I mean, what else do you got?
Yeah, I will say you mentioned at the start this idea of innovation driving innovation, and I know that is so deep in the Faction DNA. I get really excited, and as the CTO I occasionally get to have this super futuristic talk. I start thinking about all these next generation use cases: things you can do with IoT sensors now that 5G is becoming a reality, both personal and industrial use cases for things like augmented reality and what that means for people and how they’ll go about their day, and conversational AI and the way people interact with their Alexa device today. But what does that look like tomorrow? What does the next generation of those applications and interactions look like? These sorts of AI applications are getting so much smarter so rapidly, and so I think we get really excited about how we help enable those use cases, for sure.
So one of my favorite questions is always, what comes next, and you just answered that because innovation is just so completely baked into everything that you do. I love it. There’s a whole lot of next territory that we can cover there. That’s what I love about hosting this podcast is we’re always talking to folks who see different versions of what are the problems today and how do we solve them tomorrow and what comes next.
Yeah. I think it’s interesting too, because, we talked about data gravity, and there is a particular leading edge of innovation that everybody’s aware of right now, which is this 5G revolution. Everybody is hearing about 5G, they’re starting to shop for 5G phones; it’s definitely a revolutionary amount of bandwidth, and the idea is that you’ll be able to use it for things like not owning a gaming console anymore, but having Microsoft, for example, stream a game to you over the air. I try to imagine playing an Xbox-quality game on a tablet from a train while I’m zipping along connected via 5G, and that sort of thing is actually a real possibility. But it really emphasizes and highlights what we were talking about with data gravity, because you now have this spectrum, these silos if you will. There’s the edge, where tons of information flows to and from users: consumers with their 5G devices is one example; things like doctors interacting with patients in medical offices is another. And a lot of that data historically interacts with things going on in that classic enterprise data center that you built for your specific needs.
And then of course, the third silo is that cloud silo, agile development, your developers building applications, leveraging cloud services for additional insight. And really I think a lot of our challenge is to enable innovation to spread across that spectrum fluidly so you can deal with things that happen at the edge, things that happen at the data center, things that happen in a cloud, and really just get rid of the barriers to getting data to move back and forth, to flow freely, to enable applications and ultimately to enable those things that really just make our lives better.
So that brings up an interesting question of privacy and security. As we create all of this new data, as we move it around more fluidly, as we’re shifting between environments, there’ve always been concerns about hosting data outside of your own data centers, now we’re going to have more and more places where that data lives, so as we all get used to having third party data management options and the regulations start to catch up, how does that change the way people think about and execute data privacy, data security and data ownership?
Yeah, that becomes a really interesting question, both in the context of what we’re doing and of the way people are really challenged to deal with the sorts of compliance regulations that are floating out there. One of our very first customers on our multicloud platform was a Fortune 100 financial firm, and I remember with a certain fondness, because we crossed over and got to the other side, the level of scrutiny we had to go through in really clearly articulating to their team what we’d built from a security and compliance standpoint.
The first part of the security and compliance question is really people looking to leverage a platform like ours and asking: is my data secure? What are your credentials? How is the data protected, technically as well as operationally? All of this is built on the idea that the data is stored in tier-three data centers with the highest standards, things like biometric access and security camera monitoring from a physical controls perspective, all the way into logical things like how you segment off the networks. This is one of those places where Faction has really been able to bring a lot of peace of mind, because most of our IP portfolio, the bulk of our patents, is actually based on segmentation, isolation, and distinguishing traffic. So we’re really well prepared for a world of complex platforms that take many customers and connect them to many endpoints across cloud providers while keeping all of those things separate and segmented.
It’s interesting because we have this fairly lengthy white paper that goes through all of this, and it highlights a real challenge: when I think about being an end customer interested in implementing this type of multicloud strategy on my own, and everything I would have to go through, the barrier to entry to deliver the security and compliance that we deliver is enormous.
Now, there’s another, equally interesting side to this too, which is: what about data sovereignty? Instead of scrutinizing us, it’s more a question of what the cloud providers do if I put data in them. Is that appropriate for my customer base, my use cases, my existing contracts, my regulatory environment in whatever jurisdiction I’m subject to? And so we have folks looking to us to store the data permanently on our platform, because some set of requirements around operational processes, the architecture, and knowing exactly where the data lives and how many copies are kept actually kept them from putting that data into the cloud in the first place. So aside from the other multicloud benefits we talked about, they just liked the idea of being able to say: I can tell you exactly where my data lives and exactly what platform it lives on, and understand that a lot more intimately than they can with some of the public cloud offerings.
So there’s a certain level of scrutiny we’ve encountered where people are still unwilling to put certain types of data in the public cloud, and this is a pretty rare case. It doesn’t apply to the vast majority of folks. But for those that it does, this is a real barrier to cloud adoption that we’ve been able to help them get around by keeping the data in a much more definitively controlled environment.
Love it. So Matt, thank you for spending the time with us and giving us a view into what you and Faction have been working on. It’s a really exciting space you’re working in, and pretty cool to hear the way you’re merging what I would typically expect to see as data-center-specific offerings into the public cloud to give your customers the best of both worlds, and using that to solve a number of really complex problems.
Yeah, thank you. It’s actually been an amazing thing to work with folks like the Dell team, where you get this ability to take these platforms where people have so much data, these existing investments, and then to wave our magic wand, if you will, and leverage our network expertise and cloud expertise to literally take that technology and multicloudify it. It’s really satisfying because I do get a fair number of, “Wait, you can do what?” kind of responses. That’s a lot of fun.
So we’ll add multicloud to the list of other things that you’ve already solved, so I love it. So for those of you who enjoyed this podcast and want to know more about what’s happening with Faction, you can find more on their website at factioninc.com. That’s F-A-C-T-I-O-N-I-N-C.com. Of course, they have some overviews of their products and how they’re changing the cloud storage game for the better. And you can also find more information about the next horizon at www.delltechnologies.com/nexthorizon, including future podcasts and some great technology content that we’ve shared from the smart folks at our office of the CTO.
Thank you all for listening to the Next Horizon, a Dell Technologies podcast. We appreciate your time, interest and attention. I hope you’re as excited as we are about the great innovations that are coming out of cutting edge companies like Faction. Be sure to subscribe to the podcast, either through your favorite podcast app or through the website at www.delltechnologies.com/nexthorizon so you don’t miss any great new content. And I look forward to seeing you again for upcoming episodes. I’m Bill Pfeifer, and this is the Next Horizon.