I cannot remember the last time I met CTOs within our Dell EMC OEM customer community who were not focused on making their OEM products more intelligent and data-driven. The reality is that if you’re not thinking about data and the “digital footprint” of your product, you’re in the minority.
Value in Data
This emphasis on data is increasingly reflected in the organizational structure of companies. Today, we see expert roles dedicated to data strategy, AI and IoT. The constant quest is to find value, meaning and efficiencies in data where, previously, technical or operational constraints made it quite simply unfeasible to let the genie out of the bottle.
And yet, while the technology and the means to acquire and analyze data are maturing, we still face complexity. The problem is that access to data, and approaches to analyzing it, remain inconsistent across industries and institutions.
For example, take healthcare, specifically medical imaging. Arguably, this is an ideal candidate for the application of Artificial Intelligence. The idea is that a doctor or surgeon can benefit from a global, near real-time advisory system while studying patient images and looking for risk markers identified in previous research studies. That idea is incredibly powerful.
If implemented correctly, this could augment our healthcare system in a profound and meaningful way, saving thousands of lives in the process. It seems like a no-brainer, right? So why aren’t hospitals and healthcare systems rushing to implement it straight away?
The issue here is largely data governance. There is no global repository of patient imaging data that data scientists can simply point a medical imaging model at. Data is siloed in different countries, regions and hospitals, and legal restrictions often make it impossible to pool it.
Even if that weren’t a factor, centralizing such vast quantities of data would pose tough technical challenges. This is not to say that great breakthroughs in the field aren’t happening; they absolutely are. Nonetheless, data starvation is an issue, and it will remain one as long as we consider the centralized model to be the only option.
A similar issue exists in training autonomous vehicles. In this case, the volume of data needed to train decision-making models to within an acceptable margin of error is enormous. Even if we could overcome the problem of centralizing data to create the initial model, how do we continue to improve it? How do we harness the data that vehicles in the field keep generating, without sending potentially sensitive and cumbersome raw data back to a central location?
Centralization vs. the Edge?
And so, if centralization isn’t possible, can we look to the Edge? What about deploying more compute resources to process the data close to where it is created, without bringing it back to the datacentre? There are numerous benefits here, not least lower latency and reduced networking complexity. Admittedly, this could fix one problem, but it could also create a slew of others.
For example, in the medical imaging scenario, this would create more silos, with each hospital having its own algorithm. Because the larger hospitals would naturally have access to more data, they could develop more accurate models, while the smaller ones would remain starved. In short, there would be no guarantee of “consensus” across the models.
Federated Machine Learning
One area I’m excited about, and one that could help resolve all these issues, is Federated Machine Learning. The concept behind Federated Learning (FL) is that continuous training of the model moves out of the datacentre. Instead, training is distributed across nodes that sit in different locations or institutions.
Best of Both Worlds
For example, each hospital would take an initial shared model from a central server and continue to train that model in isolation on its own dataset, then send the updated model back to the central server. The central server would then aggregate the updates from all the hospitals, improving and calibrating the original model before redistributing it to the nodes across the various sites.
The beauty of this approach is that raw (patient) data stays isolated. It is never shared between different sites, or even between the nodes and the central server. Network bandwidth requirements are also far smaller, because only model updates travel over the network, not raw images. The algorithm becomes “democratically elected”, with the data from each hospital contributing its part. Weighting can be applied so that the largest datasets are given the highest priority.
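To make the round-trip above concrete, here is a minimal sketch of one way to implement it. It rests on several loud assumptions. The “model” is a toy 1-D linear regression, each “hospital” holds synthetic data, and local training is plain stochastic gradient descent. The server averages parameters weighted by each site’s sample count, in the style of the Federated Averaging (FedAvg) algorithm. The function names `make_site`, `local_update`, `aggregate` and `federated_training` are hypothetical, and the sketch omits the security measures (such as protecting the updates in transit) that a real deployment would need.

```python
import random

def make_site(n, rng, w_true=2.0, b_true=1.0, noise=0.1):
    """Hypothetical private dataset for one site: y = w_true*x + b_true + noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, w_true * x + b_true + rng.gauss(0.0, noise)))
    return data

def local_update(global_model, data, lr=0.05, epochs=5):
    """Continue training a copy of the shared model on one site's data.
    Only the updated parameters and the sample count leave this function,
    never the raw (x, y) records."""
    w, b = global_model["w"], global_model["b"]
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y  # prediction error under squared loss
            w -= lr * err * x      # SGD step for the weight
            b -= lr * err          # SGD step for the bias
    return {"w": w, "b": b}, len(data)

def aggregate(updates):
    """Central server: average each parameter, weighting every site by its sample count."""
    total = sum(n for _, n in updates)
    keys = updates[0][0].keys()
    return {k: sum(model[k] * n for model, n in updates) / total for k in keys}

def federated_training(sites, rounds=20):
    """Repeat: distribute the shared model, train locally at each site, aggregate."""
    model = {"w": 0.0, "b": 0.0}  # initial shared model
    for _ in range(rounds):
        updates = [local_update(model, data) for data in sites]
        model = aggregate(updates)
    return model

# Usage: one large and two small "hospitals" with different amounts of data.
rng = random.Random(0)
sites = [make_site(n, rng) for n in (200, 50, 20)]
print(federated_training(sites))
```

Weighting by sample count is what gives the 200-sample site the most influence. An unweighted mean would instead give every site an equal say regardless of how much data it holds, which is the trade-off the weighting choice controls.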
Test results so far are promising. In a 2018 experiment carried out by Intel and the University of Pennsylvania, a Federated Machine Learning model achieved 99 percent of the performance of a model trained on centrally shared data.
It’s an evolving area of research, but it’s not difficult to imagine how beneficial this could be if architected properly, with adequate attention to security. Several interesting open source projects currently underway use blockchain technology for this purpose.
Improving Performance without Sharing Raw Data
By using Federated Machine Learning, I believe that those designing smarter products of all types can continuously update and improve the behavior and performance of their decision engines without ever needing to share raw data. Think about the potential for Federated Machine Learning in the security and automotive industries, as well as a multitude of IoT use cases!
Appliances deployed across multiple customer locations could individually contribute to improving the products in which customers have invested. Coupled with powerful Edge computing hardware, this method could prove instrumental in helping OEMs bring their products to a smarter future.
Are you trying to find new ways to make your products more data-driven? Do you have insights to share? I would love to hear your comments and questions.
Learn more about Dell EMC OEM Solutions here.
Join our LinkedIn OEM & IoT Solutions Showcase page here.
Keep in touch. Follow us on Twitter @dellemcoem