Computational Challenges of Training Learning Models

A significant challenge in training machine learning/deep learning (ML/DL) models is computational power. Running a suite of experiments on a decent CPU (e.g., a quad-core i7 with 8GB RAM) can take anywhere from 3 hours to days, or even weeks, before the algorithms converge and produce a set of results.

This computational lag is especially costly because a decent result requires several iterations of experiments, either to tune the parameters of the algorithm or to carry out various forms of feature engineering, before arriving at a classifier/model that generalizes “optimally” to new examples.
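To make the cost of repeated experiments concrete, here is a minimal sketch that counts the runs in an exhaustive parameter search and multiplies by the time per run. The parameter names and values are hypothetical, chosen only for illustration; the 3 hours per run is the lower bound quoted above.

```python
from itertools import product

# Hypothetical parameter grid: names and values are illustrative only.
param_grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "max_depth": [3, 5, 7, 9],
    "n_estimators": [100, 200],
}


def count_experiments(grid):
    """Number of distinct configurations in an exhaustive grid search."""
    return len(list(product(*grid.values())))


def total_hours(grid, hours_per_run):
    """Total time if every configuration is trained once, one after another."""
    return count_experiments(grid) * hours_per_run


n_runs = count_experiments(param_grid)
hours = total_hours(param_grid, hours_per_run=3)
print(f"{n_runs} runs, {hours} hours sequentially")
```

Even this small grid yields 24 runs, or 72 hours when run back to back, and the count grows multiplicatively with every parameter added.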

The cost of on-premise high-end machines may be untenable for an aspiring ML/DL practitioner, researcher, or amateur enthusiast. Moreover, the technical operations skill set required to build a cluster of commodity machines running Hadoop can be overwhelming, and sometimes even a distraction, for the practitioner who wants to delve into the nitty-gritty of ML/DL.

Figure 1: Frustration of Long Processing Times

The Cloud to the Rescue

The cloud is a large set of computers networked together in groups called data centers. These data centers are often distributed across multiple geographical locations. A single data center can, for example, cover over 100,000 sq ft (and those are the smaller ones!).

Big companies like Google, Microsoft, Amazon, and IBM operate large data centers that the public (i.e., both enterprise and personal users) can use at very reasonable prices.

Cloud technology/infrastructure allows individuals to leverage the computing resources of big business for ML/DL experimentation, design, and development. To illustrate the efficiency of cloud computing, an algorithm with multiple test grids can finish in approximately 10 minutes on the cloud, whereas the same algorithm can take 10 hours or more on a local device. So, instead of running on a quad-core machine for several hours, if not days, we can leverage thousands of cores to perform the same task in a short period and relinquish these resources after completing the job.
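The arithmetic behind that comparison can be sketched as follows. The function names are hypothetical, and the ideal-runtime formula assumes the work splits into perfectly independent tasks with no coordination overhead, which real workloads only approximate.

```python
def speedup(local_minutes, cloud_minutes):
    """How many times faster the cloud run is than the local run."""
    return local_minutes / cloud_minutes


def ideal_parallel_minutes(sequential_minutes, workers):
    """Best-case runtime when work divides evenly across workers.

    Assumes fully independent tasks and zero overhead (an idealization).
    """
    return sequential_minutes / workers


local_minutes = 10 * 60  # 10 hours on a local device
cloud_minutes = 10       # about 10 minutes on the cloud

print(speedup(local_minutes, cloud_minutes))
print(ideal_parallel_minutes(local_minutes, workers=60))
```

Under these assumptions, the 10-hour job corresponds to a 60x speedup, which is what spreading the work evenly over 60 equally fast workers would give in the best case.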

Another key advantage of using the cloud for ML/DL is cost-effectiveness. Imagine the cost of purchasing a high-end machine that may sit idle much of the time rather than doing the heavy computation it was bought for. Alternatively, consider the “cost” (in time and otherwise) of setting up an on-premise Hadoop infrastructure, which can need constant technical operations attention, with the danger of spending more time on operations than on actual analytics processing.

In all the scenarios presented, the cloud comes to the rescue: thousands of CPUs are available on demand for turbocharged computing at a very affordable price. The principle is simple: use the resources needed to get the job done, and relinquish them after use.

Figure 2: Datacenter illustration

Enter Google Cloud Platform

One of the big competitors in the cloud computing space is Google, whose cloud offering is called “Google Cloud Platform,” popularly referred to as GCP. Google Cloud Platform is a simple yet powerful and cost-effective cloud option for performing machine learning.

Google is one of the top technology leaders in the internet space, with a variety of web products such as Gmail, YouTube, Google Hangouts, Google Maps, and Google+, to mention just a few. These products generate, store, and process many terabytes of data each day from internet users around the world.

To deal with this volume of data, Google has over the years invested heavily in processing and storage research and infrastructure. Today, Google boasts some of the most impressive data center design and technology in the world to support its computational demands and computing services.

Google Cloud Platform makes available to the public lightning-fast computational speed (it is getting faster!) and high-tech storage capabilities with extremely low latency (meaning minimal delays in data transfer) and high throughput (meaning a large amount of work completed per unit of time), combined with state-of-the-art networking technology/infrastructure.
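Latency and throughput measure different things: latency is the delay of a single operation, while throughput is usually measured as the amount of work completed per unit of time. The sketch below uses made-up numbers (8 jobs at 2 seconds each) and a simplified model in which jobs run in waves, one job per worker; none of these figures describe GCP itself.

```python
import math


def batch_elapsed_seconds(n_jobs, workers, seconds_per_job):
    """Wall-clock time when jobs run in waves of `workers` at a time."""
    return math.ceil(n_jobs / workers) * seconds_per_job


def throughput(n_jobs, elapsed_seconds):
    """Jobs completed per second over the whole batch."""
    return n_jobs / elapsed_seconds


n_jobs = 8             # hypothetical batch size
seconds_per_job = 2.0  # hypothetical latency of a single job

for workers in (1, 4):
    elapsed = batch_elapsed_seconds(n_jobs, workers, seconds_per_job)
    rate = throughput(n_jobs, elapsed)
    print(f"workers={workers}: elapsed={elapsed}s, throughput={rate} jobs/s")
```

In this model the latency of each job stays at 2 seconds, yet going from 1 to 4 workers raises throughput from 0.5 to 2 jobs per second, which is why the two properties are quoted separately.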

Figure 3: Google Cloud Platform