When it comes to making predictions using your trained model, you have two options:
- On the device itself (offline)
- Cloud computing (online)
Choosing between them is one of the most important decisions you’ll make when deploying a machine learning model, since it can significantly affect speed, power consumption, privacy, and cost.
Each option comes with its own trade-offs. With the cloud option, the app must be connected to the internet in order to work. With on-device prediction, you are constrained by the hardware: limited RAM and CPU mean you cannot run every kind of model.
- Offline: in most cases, once the trained model has shipped inside the app, you can’t readily update it. When the model becomes out of date and stops working as it should, you must retrain it with more or newer data and then release an app update that contains the new model.
- Online: you don’t have to worry about how to publish a new model. The model lives on the server, so it can be updated on a regular basis. If you decide to retrain it for whatever reason, you just update the model in the cloud and every user benefits.
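The trade-off above (the cloud needs a connection, the device needs enough hardware) can be sketched as a small decision helper. This is only an illustration: `DeviceState`, `choose_backend`, and the rule of comparing model size with free RAM are assumptions made for this example, not part of any real framework, and a real app would weigh more factors.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    """Hypothetical snapshot of what the app knows at prediction time."""
    is_online: bool      # whether a network connection is available
    free_ram_mb: int     # memory the app can spend on the model
    model_size_mb: int   # size of the bundled model; 0 if none is bundled


def choose_backend(state: DeviceState) -> str:
    """Return "device", "cloud", or "unavailable" for the next prediction."""
    has_local_model = state.model_size_mb > 0
    # Crude assumption: the model is usable if it fits in the free memory.
    fits_in_memory = has_local_model and state.model_size_mb <= state.free_ram_mb

    if fits_in_memory:
        # On-device inference works with or without a connection.
        return "device"
    if state.is_online:
        # Model is missing or too heavy for this device: use the server.
        return "cloud"
    # No usable local model and no connection: no prediction is possible.
    return "unavailable"
```

For example, `choose_backend(DeviceState(is_online=False, free_ram_mb=32, model_size_mb=50))` returns `"unavailable"`, which is the situation a cloud-only app is in whenever the connection drops.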
On-device prediction
Whether this option makes sense depends entirely on your use case. If your model isn’t too large or computationally heavy, running inference locally on the device is an option worth investigating.
Two of the most compelling reasons to use this approach are keeping the user’s data private and removing the need for a network connection.
More generally, the reasons fall into three categories: cost-effectiveness, privacy, and speed.
- Cost-effectiveness: Because predictions run on the user’s device, you don’t pay for cloud servers or the infrastructure that comes with them. This matters most for large applications: even as the number of requests grows, you can keep costs low and under control.
- Privacy: All of the user’s data is processed on the device and nothing is uploaded anywhere. Put simply, the data never leaves the device.
- Speed: Because predictions are computed locally and need no network access, there is no network latency and no dependence on connection quality. This typically makes on-device inference faster and more reliable than cloud inference. It also makes real-time predictions on large inputs practical, since the data doesn’t have to be sent over the network, which is difficult with cloud inference.
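To see why sending large inputs over the network makes real-time prediction hard, it helps to put rough numbers on it. The sketch below is back-of-the-envelope arithmetic only: the function names are made up for this example, and the payload size, uplink speed, round-trip time, and inference times are illustrative assumptions, not measurements.

```python
def cloud_latency_ms(payload_mb: float, uplink_mbps: float,
                     round_trip_ms: float, server_inference_ms: float) -> float:
    """Estimate the time for one cloud prediction, in milliseconds."""
    # Megabytes -> megabits (x8); divide by link speed for seconds; x1000 for ms.
    upload_ms = payload_mb * 8 / uplink_mbps * 1000
    return upload_ms + round_trip_ms + server_inference_ms


def max_predictions_per_second(latency_ms: float) -> float:
    """Throughput if predictions are processed strictly one at a time."""
    return 1000 / latency_ms


# Illustrative numbers only: a 0.5 MB camera frame over a 10 Mbit/s uplink.
cloud_ms = cloud_latency_ms(payload_mb=0.5, uplink_mbps=10,
                            round_trip_ms=50, server_inference_ms=10)
device_ms = 40.0  # assumed on-device inference time for the same frame

print(f"cloud:  {cloud_ms:.0f} ms per prediction")
print(f"device: {device_ms:.0f} ms per prediction")
```

Under these assumptions the upload alone takes about 400 ms, so the cloud path manages roughly two predictions per second, while the assumed on-device time of 40 ms allows about 25. With different numbers the conclusion can change, for instance when inputs are tiny or the device is very slow.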
Pros and Cons
Pros
- It doesn’t need network access to function. It is typically faster and, in some situations, lets you do things that are hard to achieve with the cloud.
- You don’t have to deal with server-side issues. For example, you won’t need to scale up your servers as your app grows more popular.
- The users’ data stays on their devices.
- You don’t pay for cloud inference.
Cons
- There will always be hardware limits. If your trained model is large, on-device inference may be impractical or even impossible because of the device’s memory and processing constraints.
- Bundling the model with the app increases the size of the download, often by many megabytes.
- Updating the model is harder. To get the new model, either users must update the app, or the app must download the new model automatically.
- Anyone can inspect your app package. Copying the learned parameters is straightforward, and if you ship a TensorFlow graph definition or a caffemodel file, stealing the complete model is even easier.
- Supporting several platforms is more work. You have to implement inference separately for each platform you target.
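The "download it automatically" route mentioned above usually starts with a version check. A minimal sketch follows, assuming the app knows the version of its installed model and can fetch a small manifest from its own server. The manifest layout, the `model_version` key, and both function names are hypothetical; the download itself is left out.

```python
def parse_version(version: str) -> tuple:
    """Turn "1.4.0" into (1, 4, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def needs_model_update(installed: str, manifest: dict) -> bool:
    """Return True when the manifest advertises a newer model.

    `manifest` stands for a hypothetical JSON document served by the app's
    own backend, for example {"model_version": "1.5.0"}.
    """
    return parse_version(manifest["model_version"]) > parse_version(installed)
```

Comparing tuples of integers rather than raw strings matters here: as strings, "1.10.0" sorts before "1.9.0", which would make the app miss or wrongly trigger an update.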