Where edge computing breaks: operations

Let’s say your job is to manage oil well operations in a small country. On each pumpjack, the mechanism that pumps oil out of the ground from an existing well, you have installed a device. This device monitors the local weather and the pump’s operation. It even automates local processes on the pumpjack.

Together, these devices are known as edge computers. They have their own processors, local storage systems, operating systems and network interfaces that allow them to communicate with a centralized collection and analysis system. This centralized system uses artificial intelligence and data analytics to determine when to deploy human operators. For example, the device can determine when a pump motor is about to fail or when the oil flow is too high or too low.

The devices also use centralized data collection to monitor overall production and oversee all of the pumpjacks producing oil. There are 500 devices in this particular edge computing network, one for each remote pumpjack, and all devices communicate back to a centralized system on a public cloud provider.

The first few months of using these edge computing devices to monitor remote and unattended pump operations went well. However, the storage systems on the devices soon began to fail due to a known bug, and network interfaces stopped working and had to be reset. Most often, key sensors on the pumpjack stopped working. These problems could be fixed only by sending people out to the sites, incurring costs that negated the purpose of using these devices to automate pumping operations.

To tackle this problem, remember that we need to approach edge computing like any other computing and storage platform under operational control: do the basics. Back up the data on every edge device from the remote site to the central system. Update the operating systems and firmware remotely, as you would on a smartphone. Support application updates that include changes to the data structure. Also keep track of configuration, including operating system releases, application updates and patches, and even the software versions running on some of the smart sensors.

In the oil well example, about 500 different hardware and software components are tracked for just one device controlling one pumpjack. Don’t forget there are 500 pumpjacks. That makes 500 × 500 = 250,000 hardware and software components to track and operate.
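To make the scale of that tracking problem concrete, here is a minimal Python sketch of a fleet inventory check. It is an illustration only: the inventory layout, the device IDs, the component names and the `find_drift` helper are all hypothetical, and it assumes the "expected" version of a component is simply the one most devices are running.

```python
from collections import Counter, defaultdict

DEVICES = 500                # one edge device per pumpjack (from the example)
COMPONENTS_PER_DEVICE = 500  # tracked hardware/software items per device


def total_tracked(devices, components_per_device):
    """Number of components the operations team has to keep track of."""
    return devices * components_per_device


def find_drift(inventory):
    """Return {component_name: [device_ids]} for devices off the fleet baseline.

    `inventory` is a hypothetical structure: {device_id: {component: version}}.
    The baseline for each component is assumed to be its most common version.
    """
    versions = defaultdict(Counter)
    for components in inventory.values():
        for name, version in components.items():
            versions[name][version] += 1

    drift = {}
    for name, counts in versions.items():
        baseline = counts.most_common(1)[0][0]
        outliers = sorted(
            device_id
            for device_id, components in inventory.items()
            if name in components and components[name] != baseline
        )
        if outliers:
            drift[name] = outliers
    return drift


# Hypothetical three-device slice of the fleet.
inventory = {
    "pump-001": {"os": "5.4", "firmware": "2.1"},
    "pump-002": {"os": "5.4", "firmware": "2.1"},
    "pump-003": {"os": "5.4", "firmware": "2.0"},
}

print(total_tracked(DEVICES, COMPONENTS_PER_DEVICE))
print(find_drift(inventory))
```

A real configuration management system would pin approved versions explicitly rather than infer them from the majority, but even this rough check shows why tracking has to be automated at this scale.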

The problem with edge computing comes when you look at how things really work. Yes, we have very solid, high-quality components for our edge computer, such as network interfaces, storage systems and processors, all of which can withstand environmental influences such as heat and humidity. However, if one of these components fails, many or most of the other components will also fail to do their jobs. For example, if the oil temperature sensor stops working, we don’t know the temperature of the oil and can’t troubleshoot the oil flow. If we can’t fix our oil flow problem, we’ll have to shut down the whole pumpjack until someone can fix the sensor and restart production.
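The cascade in that example can be sketched as a small dependency map. The Python below is an illustrative assumption, not a model of any real pumpjack controller: the capability names and the `DEPENDS_ON` table are made up for this article's scenario, and the logic simply marks anything that depends, directly or indirectly, on a failed component.

```python
# Hypothetical dependency map: each item lists what it needs in order to work.
DEPENDS_ON = {
    "oil_temperature_reading": ["oil_temperature_sensor"],
    "oil_flow_troubleshooting": ["oil_temperature_reading"],
    "pump_operation": ["oil_flow_troubleshooting"],
    "weather_monitoring": ["weather_sensor"],
}


def impacted(failed, depends_on):
    """Return everything that stops working because of the failed components."""
    broken = set(failed)
    changed = True
    while changed:  # repeat until no new item is marked as broken
        changed = False
        for item, needs in depends_on.items():
            if item not in broken and any(n in broken for n in needs):
                broken.add(item)
                changed = True
    return broken - set(failed)


print(sorted(impacted({"oil_temperature_sensor"}, DEPENDS_ON)))
```

In this toy map a single sensor failure takes down temperature readings, flow troubleshooting and the pump itself, while weather monitoring keeps running. Mapping dependencies like this ahead of time shows which single failures are worth guarding against.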

Similar issues can extend to medical device operations, remote factory operations, agriculture, or any scenario where you need to operate computer systems that are not easily accessible. The types of edge computing experiences described here are not uncommon today.

I suggest we plan more carefully up front how these kinds of edge computing systems should work. With the increasing use of edge computing, we need to understand how we use configuration management and operational systems and then think about how to deal with what we are likely to see in the field. We need to manage many of the same components, including their interdependence and the ability to handle different levels of failure. We also need to minimize the number of people needed to solve the problems and reduce the downtime of the edge computing systems and connected sensors.

Possible solutions come from a few different schools of thought. In many cases, the manufacturer or owner of the devices will develop a custom solution for the specific environment (such as the oil field described here). Some promote the use of a redundant array of devices and sensors that can increase reliability to five nines (99.999%). Edge computing device platforms (computing, storage, and networking) typically cost less than $200 per unit. Why not use multiple in a redundant array? Ask the same question about sensors.
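A rough way to reason about the redundant-array idea is the standard parallel availability formula, 1 - (1 - a)^n, where a is the availability of one unit and n is the number of redundant units. The Python sketch below uses it under two loudly stated assumptions: units fail independently of each other, and the 99% per-unit availability is a made-up figure for illustration, not a measurement.

```python
def combined_availability(unit_availability, units):
    """Availability of `units` redundant devices, assuming independent failures."""
    return 1.0 - (1.0 - unit_availability) ** units


def units_needed(unit_availability, target, max_units=20):
    """Smallest number of redundant units that meets the availability target."""
    if not 0.0 < unit_availability < 1.0:
        raise ValueError("unit_availability must be between 0 and 1")
    for units in range(1, max_units + 1):
        if combined_availability(unit_availability, units) >= target:
            return units
    raise ValueError("target not reachable within max_units")


FIVE_NINES = 0.99999
ASSUMED_UNIT_AVAILABILITY = 0.99  # hypothetical figure, not field data

print(units_needed(ASSUMED_UNIT_AVAILABILITY, FIVE_NINES))
```

The independence assumption is the weak point. A shared defect, such as a known storage bug present on every device, hits all of the redundant units at once, so redundancy alone does not replace remote updates and configuration tracking.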

I predict that we will need some industry standards and best practices to make reliable Internet of Things systems a workable reality for edge systems running outside of data centers or other easily controlled environments. If everyone builds one-off solutions to meet their specific operational needs, none of them will be a real solution. Collaboration is needed between edge computing technology providers and industries. If we want to scale edge computing, we need to think innovatively first.

Copyright © 2022 IDG Communications, Inc.

