Introduction to Google Cloud Platform
Google Cloud Platform (GCP) offers a wide range of services for securing, storing, serving and analyzing data. These cloud services form a secure cloud perimeter for data, where different operations and transformations can be carried out on the data without it ever leaving the cloud ecosystem.
GCP is a simple yet powerful and cost-effective cloud option for building large-scale machine learning models. It offers a rich set of products that simplify large-scale data analytics, model training and model deployment for inference on the cloud.
The Computational Cost of Building ML Products
- Long training times: Running a suite of experiments on a decent CPU (e.g., a quad-core Intel Core i7 with 8 GB of RAM) can take anywhere from 3 hours to days, or even weeks, for the algorithms to converge and produce results.
- ML modeling is iterative: This computational lag is especially costly because getting a decent result requires several iterations of experiments, either to tune the algorithm's parameters or to carry out various forms of feature engineering, before arriving at a classifier or model that generalizes “optimally” to new examples.
- High-performance computer hardware is expensive: High-end on-premise machines are costly. Moreover, the technical skills required to build and maintain a Spark/Hadoop cluster on commodity machines can be overwhelming, and sometimes even a distraction from the ML task itself.
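To make the cost of iteration concrete, here is a back-of-the-envelope calculation. The grid values, fold count and per-run time are illustrative assumptions, not measurements: a modest hyperparameter grid evaluated with k-fold cross-validation turns a single 3-hour training run into about a week of sequential compute.

```python
from itertools import product

# Hypothetical hyperparameter grid (illustrative values only).
grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "max_depth": [4, 6, 8, 10],
}
folds = 5            # assumed k-fold cross-validation
hours_per_run = 3.0  # assumed time for one training run on a single CPU machine


def total_runs(grid: dict, folds: int) -> int:
    """Every combination of grid values is trained once per fold."""
    combos = list(product(*grid.values()))
    return len(combos) * folds


runs = total_runs(grid, folds)
hours = runs * hours_per_run
print(f"{runs} runs x {hours_per_run} h = {hours} h (~{hours / 24:.1f} days)")
# prints: 60 runs x 3.0 h = 180.0 h (~7.5 days)
```

Spreading these independent runs across many cloud machines shortens the wall-clock time, while the total compute-hours stay roughly the same.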
Why GCP?
- Technology leadership: Google is a technology leader in the internet space, with widely used web products such as Gmail, YouTube and Google Maps, to mention just a few. These products generate, store and process many terabytes of data every day from internet users around the world.
- Taming big data: To handle data at this scale, Google has invested heavily in processing and storage research and infrastructure. Today, Google operates some of the most impressive data center designs in the world to support its computational demands and computing services.
- High-speed computation: Google Cloud Platform makes lightning-fast computation (and it keeps getting faster!) and high-tech storage available to the public, with very low latency (minimal delay before a request is answered or data starts arriving) and high throughput (a large amount of data or work processed per unit of time). All of this is glued together by state-of-the-art networking infrastructure.
- Ease of use: The storage and processing platform on which products like Gmail and Google Docs are built is now accessible to the public and available for anyone to use.
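Latency and throughput are easy to conflate, so a small worked example may help. The figures are made up for illustration, and the additive "latency plus size over throughput" model is a simplifying assumption: small transfers are dominated by latency, large ones by throughput.

```python
def throughput_mb_per_s(bytes_transferred: int, seconds: float) -> float:
    """Throughput: amount of data moved per unit of time (here, MB/s)."""
    return bytes_transferred / 1_000_000 / seconds


def transfer_time_s(latency_s: float, size_bytes: int, throughput_mb_s: float) -> float:
    """Simplified model: fixed start-up latency plus size divided by throughput."""
    return latency_s + size_bytes / 1_000_000 / throughput_mb_s


# Hypothetical figures: 10 GB moved in 100 seconds.
print(throughput_mb_per_s(10_000_000_000, 100.0))  # 100.0

# Assumed 50 ms latency and 100 MB/s throughput.
small = transfer_time_s(0.05, 1_000, 100.0)           # ~0.05 s, mostly latency
large = transfer_time_s(0.05, 10_000_000_000, 100.0)  # ~100 s, mostly throughput
print(f"small: {small:.5f} s, large: {large:.2f} s")
```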
GCP Product and Service Offerings
Cloud Compute
Compute services running in Google’s data centers around the world. They include:
- Compute Engine: virtual machine instances for custom processing.
- App Engine: a cloud-managed platform for developing and deploying web, mobile, and IoT apps.
- Kubernetes Engine: a managed orchestration service for Docker containers, based on Kubernetes.
- Container Registry: private storage for container images.
- Cloud Functions: serverless, event-driven functions that connect or extend cloud services.
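To give a flavour of the serverless model, below is a minimal sketch of an HTTP-triggered Cloud Function in Python. It assumes the Python runtime, which passes the handler a Flask `request` object; the function name `hello_http` and the `name` query parameter are hypothetical. Because the handler only reads `request.args`, it can be exercised locally with a stand-in object.

```python
from types import SimpleNamespace


def hello_http(request):
    """Hypothetical HTTP entry point; the runtime calls it with a Flask request.

    Returning a plain string produces an HTTP 200 response with that body.
    """
    # request.args holds the query-string parameters (e.g. ?name=Ada)
    name = request.args.get("name", "World")
    return f"Hello, {name}!"


if __name__ == "__main__":
    # Local stand-in for a Flask request; only .args is needed here.
    fake_request = SimpleNamespace(args={"name": "Ada"})
    print(hello_http(fake_request))  # Hello, Ada!
```

Deployment would typically go through the gcloud CLI, for example `gcloud functions deploy hello_http --runtime python311 --trigger-http`; check the current documentation for supported runtimes and flags.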
Cloud Storage
Provides scalable, highly available storage options for live and archival data within the cloud perimeter. Cloud storage is designed to handle elastic storage demands. The cloud storage products include:
- Cloud Storage: general-purpose object storage platform.
- Cloud SQL: cloud-managed MySQL and PostgreSQL.
- Cloud Bigtable: petabyte-scale NoSQL storage.
- Cloud Spanner: scalable, highly available transactional storage.
- Cloud Datastore: transactional NoSQL database.
- Persistent disks: block storage for virtual machines.
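Objects in Cloud Storage are addressed by bucket and object name, commonly written as a `gs://bucket/object` URI. Below is a minimal sketch, assuming the `google-cloud-storage` client library is installed and application default credentials are configured; the bucket and file names are hypothetical. Only the URI parsing runs without the library; the upload helper imports it lazily.

```python
from urllib.parse import urlparse


def parse_gcs_uri(uri: str) -> tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into (bucket, object_name)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def upload_file(local_path: str, uri: str) -> None:
    """Upload a local file to Cloud Storage (requires google-cloud-storage)."""
    from google.cloud import storage  # third-party package, imported lazily

    bucket_name, object_name = parse_gcs_uri(uri)
    client = storage.Client()  # picks up application default credentials
    blob = client.bucket(bucket_name).blob(object_name)
    blob.upload_from_filename(local_path)


if __name__ == "__main__":
    # Hypothetical bucket and object names.
    print(parse_gcs_uri("gs://my-ml-data/raw/train.csv"))
    # upload_file("train.csv", "gs://my-ml-data/raw/train.csv")
```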
Big Data/Analytics
Offers a range of managed and serverless big data and analytics solutions for data warehousing, stream and batch analytics, cloud-managed Hadoop ecosystems, cloud-based messaging and data exploration. The big data services include:
- BigQuery: serverless analytics/data warehousing platform.
- Cloud Dataproc: fully managed Hadoop/Apache Spark infrastructure.
- Cloud Dataflow: batch and stream data transformation and processing.
- Cloud Dataprep: serverless service for exploring and cleaning structured and unstructured data for analytics.
- Data Studio: data visualization and report dashboards.
- Cloud Datalab: managed Jupyter notebooks for machine learning and data analytics.
- Cloud Pub/Sub: serverless messaging infrastructure.
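As an illustration of the messaging model, the sketch below publishes a JSON event to a Pub/Sub topic. It assumes the `google-cloud-pubsub` library and credentials are available; the project ID, topic name and event fields are hypothetical. Pub/Sub message payloads are bytes, so the event is serialized first, and that step uses only the standard library.

```python
import json


def encode_event(event: dict) -> bytes:
    """Serialize an event to UTF-8 JSON bytes (Pub/Sub payloads are bytes)."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def publish_event(project_id: str, topic_id: str, event: dict) -> str:
    """Publish one message and return its server-assigned message ID."""
    from google.cloud import pubsub_v1  # third-party package, imported lazily

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_id)
    future = publisher.publish(topic_path, encode_event(event))
    return future.result()  # blocks until Pub/Sub acknowledges the message


if __name__ == "__main__":
    print(encode_event({"user": "u123", "action": "click"}))
    # publish_event("my-project", "clickstream", {"user": "u123", "action": "click"})
```

A subscriber, or a streaming Dataflow pipeline, could then read and decode these messages downstream.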
Cloud AI
Leverage pre-trained models (or train custom ones) for artificial intelligence tasks through REST APIs. This is the same technology stack used by Google applications such as Google Translate and Google Photos. Google Cloud AI services include:
- Cloud AutoML: train custom machine learning models leveraging transfer learning.
- Cloud Machine Learning Engine: for large-scale distributed training and deployment of machine learning models.
- Cloud Natural Language: extract and analyze entities, sentiment and syntax from text in documents.
- Cloud Speech API: transcribe audio to text.
- Cloud Vision API: image labeling, object detection and text extraction (OCR).
- Cloud Translation API: translate text from one language to another.
- Cloud Video Intelligence API: extract metadata from video files.
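Since these services are exposed as REST APIs, a call boils down to an authenticated HTTPS request with a JSON body. The sketch below builds a label-detection request for the Cloud Vision `images:annotate` endpoint using only the standard library; the API key, the image file and the `max_results` value are assumptions, and production code would more likely use OAuth credentials or the official client library.

```python
import base64
import json
import urllib.request

VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_label_request(image_bytes: bytes, max_results: int = 5) -> dict:
    """JSON body asking the Vision API for LABEL_DETECTION on one image."""
    return {
        "requests": [
            {
                # Inline image content is sent as base64-encoded text
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }


def annotate(image_bytes: bytes, api_key: str) -> dict:
    """POST the request and return the parsed JSON response (needs network and a key)."""
    body = json.dumps(build_label_request(image_bytes)).encode("utf-8")
    req = urllib.request.Request(
        f"{VISION_URL}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    payload = build_label_request(b"not really an image", max_results=3)
    print(json.dumps(payload, indent=2))
    # result = annotate(open("photo.jpg", "rb").read(), "YOUR_API_KEY")  # hypothetical
```

In the response, each entry under `labelAnnotations` carries a label `description` and a confidence `score`.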