A lap around Generative AI Studio of Google Cloud

7 min read · Aug 21, 2023

Generative AI is a segment of AI that builds on deep learning, which in turn has machine learning as its foundation. With a machine learning platform, supervised and unsupervised algorithms can be chosen before a model is built and trained. For complex input data, such as natural language, deep learning models can be built with frameworks like PyTorch, TensorFlow and Keras. AI augments human intelligence and, in some cases, learns through reinforcement; it provides capabilities like computer vision, speech, Q&A and chatbots, video analysis etc.

Generative AI goes a step further than these AI capabilities by producing new content, with models that can be refined through reinforcement learning. It helps with content generation, text summarization, sentiment analysis, image and video generation etc.

Google Cloud, as one of the pioneers of Generative AI platforms, provides a broad set of capabilities through its Vertex AI API, bringing the power of Gen AI to image, speech and text generation.

Codey: Google’s text-to-code foundation model, which streamlines code generation. It’s powered by the PaLM 2 model behind the Vertex AI API.
Imagen: A text-to-image foundation model built on a diffusion model. Depending on the input image and use case, it can be used for tasks such as changing an object’s color, classification and privacy-related edits.
Chirp: A speech-to-text foundation model that enables automatic speech recognition in 100+ languages, so customers can be engaged in their native language.

Vertex AI is Google Cloud’s unified ML platform. It enables data engineers, architects, data scientists and ML engineers to ingest and prepare data, train models and deploy them quickly at optimized cost on Google Cloud Platform.

1. To get started, in the Google Cloud console, under “Artificial Intelligence”, choose “Vertex AI”.

2. Next, on the Vertex AI dashboard, enable the recommended APIs. The Vertex AI API has to be enabled at the project level, and the other recommended APIs are required for data preparation, collection, design and model building in Gen AI Studio.

3. You can choose the Google Cloud region in which the Gen AI models are built and deployed. Besides Generative AI Studio, you can also start with Jupyter notebooks in Workbench or with Model Garden.

4. Vertex AI Workbench integrates with JupyterLab, providing a single platform for the data science model development workflow. The notebooks run on GCP VM instances prepackaged with JupyterLab.

Tags can be provided to generate metadata for the data science workflow.
User-managed or managed notebook instances can be provisioned with the respective environment, machine type, VM disks, networking, IAM and access control configuration.

5. Model Garden provides a centralized repository to browse, customize and build on pre-trained models and deploy them for specific use cases, using either Google or open source models. Model Garden provides foundation models such as:

PaLM 2 for text: It’s fine-tuned to follow natural language instructions and is suitable for a variety of language tasks, such as classification, extraction, summarization and content generation.
Embeddings for text: Text embedding is an important NLP technique that converts textual data into numerical vectors that can be processed by machine learning algorithms, especially large models.
Chirp: Chirp is a version of the Universal Speech Model that has over 2B parameters and can transcribe over 100 languages in a single model.
Label Detector: Label Detector performs zero-shot classification of images based on labels, represented as a list of text prompt strings provided by the user, and calculates the probability / confidence score of each label’s presence in the image.
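
To make the idea of text embeddings a little more concrete, here is a minimal sketch of how two embedding vectors can be compared with cosine similarity. The three-dimensional vectors are made-up toy values rather than output of the Vertex AI embeddings model (real embeddings have far more dimensions), and `cosine_similarity` is a helper written only for this illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings of three short texts (hypothetical values).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))   # closer to 1: similar direction
print(cosine_similarity(cat, invoice))  # closer to 0: nearly unrelated
```

Texts with similar meaning are expected to land close together in the vector space, which is what makes embeddings useful for search, clustering and classification.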

Apart from these, Model Garden also offers several fine-tunable models and task-specific solutions to get started with building on large language models (LLMs).

Model Garden fine-tunable models and task specific solutions on Generative AI

6. Clicking the “explore Generative AI” button beside Model Garden at the top left takes you to the Generative AI Studio dashboard, where you can start customizing and testing Gen AI models.

7. Generative AI Studio on Google Cloud currently offers three solutions.

a) Language: Test, tune, and deploy generative AI language models. You can access the PaLM API for chat, code generation, content generation, summarization, and more.

b) Speech: Convert speech into text or synthesize speech from text using Google’s Universal Speech Model (USM).

c) Vision: Write text prompts to generate new images or generate new areas of an existing image.

8. To get started with the Language models, you can begin with either a code prompt or a text prompt, depending on the use case.

Text prompts come in two types: freeform and structured.
Text prompts are based on the “text-bison@001” and “text-bison (latest)” versions of the PaLM 2 model.
The text prompt parameters control how the next token is chosen and how long the response can be. There are four of them:

a) Temperature

b) token limit

c) top-K

d) top-P
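
To give a feel for what these parameters do, here is a simplified sketch of temperature, top-K and top-P applied to a toy next-token distribution. It is not how Vertex AI implements sampling internally; the token scores are invented, the function names are made up for this illustration, and the token limit (which only caps the length of the response) is not modeled. The sketch also requires a positive temperature, whereas a temperature of 0 is commonly understood as always picking the most likely token.

```python
import math
import random


def next_token_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Turn raw token scores into the probabilities that are sampled from."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    # Temperature: divide scores before softmax; lower values sharpen the distribution.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    max_score = max(scaled.values())
    exp = {tok: math.exp(s - max_score) for tok, s in scaled.items()}
    total = sum(exp.values())
    ranked = sorted(
        ((tok, e / total) for tok, e in exp.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    # Top-K: keep only the K most probable tokens.
    if top_k is not None:
        ranked = ranked[:top_k]
    # Top-P: keep the smallest prefix whose cumulative probability reaches P.
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, p in ranked:
            kept.append((tok, p))
            cumulative += p
            if cumulative >= top_p:
                break
        ranked = kept
    # Renormalize so the surviving probabilities sum to 1.
    kept_total = sum(p for _, p in ranked)
    return {tok: p / kept_total for tok, p in ranked}


def sample_token(distribution, rng=random):
    """Draw one token according to the filtered probabilities."""
    tokens = list(distribution)
    weights = [distribution[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Invented scores for four candidate next tokens.
logits = {"cloud": 2.0, "model": 1.0, "data": 0.5, "banana": -1.0}
dist = next_token_distribution(logits, temperature=0.7, top_k=3, top_p=0.9)
print(dist)
print(sample_token(dist, random.Random(0)))
```

Lower temperature, smaller top-K and smaller top-P all narrow the set of likely tokens, which tends to make responses more predictable; higher values leave room for more varied output.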

The “safety filter threshold” parameter specifies how likely you are to see responses that could be harmful. The values can be set to “block few”, “block some” or “block most”.
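
Outside the Studio UI, the same sampling parameters (temperature, token limit, top-K, top-P) can be passed to the model from code. The following is a rough sketch assuming the Vertex AI SDK for Python (`google-cloud-aiplatform`) is installed and authenticated; the module path and model availability depend on the SDK version. The helper `build_generation_params`, its default values and its range checks are assumptions made for this illustration, and `project_id` is a placeholder you would supply.

```python
def build_generation_params(temperature=0.2, max_output_tokens=256, top_k=40, top_p=0.8):
    """Bundle the four text prompt parameters; the ranges checked here are assumptions."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature is expected to be between 0 and 1")
    if max_output_tokens < 1:
        raise ValueError("max_output_tokens must be at least 1")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p is expected to be between 0 and 1")
    return {
        "temperature": temperature,
        "max_output_tokens": max_output_tokens,  # the "token limit" in the UI
        "top_k": top_k,
        "top_p": top_p,
    }


def generate_text(prompt, project_id, location="us-central1", **overrides):
    """Send a freeform text prompt to the PaLM 2 text model and return the reply."""
    # Imported lazily so the helper above works without the SDK installed.
    import vertexai
    from vertexai.language_models import TextGenerationModel

    vertexai.init(project=project_id, location=location)
    model = TextGenerationModel.from_pretrained("text-bison@001")
    response = model.predict(prompt, **build_generation_params(**overrides))
    return response.text


print(build_generation_params(temperature=0.5))
```

A call such as `generate_text("Summarize what a VPC network is.", project_id="my-project", temperature=0.5)` would then return the generated text, where `my-project` stands in for your own project ID.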

The structured form lets you provide context for the inputs, example input questions with their output answers, and a test input.

Structured Text prompts in Gen AI Studio Vertex AI Language Models
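
Conceptually, a structured prompt is just context, a few input/output examples and a test input combined into one piece of text for the model. Here is a minimal sketch of that idea; the `build_structured_prompt` helper and the "input:" / "output:" layout are assumptions for illustration and not necessarily the exact format Gen AI Studio uses.

```python
def build_structured_prompt(context, examples, test_input):
    """Assemble context, input/output examples and a test input into one prompt.

    The "input:" / "output:" layout is an assumption made for this sketch.
    """
    lines = [context.strip(), ""]
    for example_input, example_output in examples:
        lines.append(f"input: {example_input}")
        lines.append(f"output: {example_output}")
        lines.append("")
    # The prompt ends with an open "output:" for the model to complete.
    lines.append(f"input: {test_input}")
    lines.append("output:")
    return "\n".join(lines)


prompt = build_structured_prompt(
    context="Classify the sentiment of each product review as positive or negative.",
    examples=[
        ("The setup took two minutes and it just works.", "positive"),
        ("It stopped charging after a week.", "negative"),
    ],
    test_input="Battery life is far better than I expected.",
)
print(prompt)
```

The examples show the model the expected shape of an answer, so the response to the test input tends to follow the same pattern.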

You can save the prompt; it is stored in the GCP region you are working in.

Chat prompts can help create personalized virtual assistants that enhance the user experience and facilitate reinforcement learning.

Chat prompts in Vertex AI Language models

Model tuning (in preview at the time of writing) lets you refine existing models by fine-tuning them for specific use cases. You can specify parameters like “train steps” (the number of steps to run for training) and “learning rate multiplier” (a multiplier applied to the recommended learning rate for the model being tuned). The model artifacts can be stored on Google Cloud Storage and deployed across regions.
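
As a small worked example of the learning rate multiplier: the rate used for tuning is the recommended rate scaled by the multiplier. The recommended rate below is a made-up number for illustration only (the actual value is chosen per model by the service), and `effective_learning_rate` is a hypothetical helper.

```python
def effective_learning_rate(recommended_rate, multiplier=1.0):
    """Scale the recommended learning rate by the tuning multiplier."""
    if recommended_rate <= 0 or multiplier <= 0:
        raise ValueError("rate and multiplier must be positive")
    return recommended_rate * multiplier


RECOMMENDED_RATE = 1e-3  # hypothetical value, not taken from Vertex AI

for multiplier in (0.5, 1.0, 2.0):
    rate = effective_learning_rate(RECOMMENDED_RATE, multiplier)
    print(f"multiplier={multiplier} -> learning rate={rate}")
```

A multiplier of 1 keeps the recommended rate, values below 1 make each training step more cautious, and values above 1 make it more aggressive.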

9. The Vision API of Vertex AI can generate image captions from input images. A maximum of 3 captions can be generated per input image. When you click “Generate Caption”, the captions for the image are shown.

You can also generate images from text input by clicking the “Generate” button, and create a visual Q&A chatbot from the image analysis captions.

Visual Q&A chatbot on Vertex AI Vision API model on Gen AI Studio

10. The Speech API of Vertex AI provides “text-to-speech” and “speech-to-text” offerings. Text-to-speech offers three voices: “English(female)”, “English(male)” and “Spanish(male)”. The speed value changes the speaking rate of the generated speech.

Here is an example of the Vertex AI text-to-speech capabilities in Gen AI Studio.

Speech to Text Capabilities on Google Cloud Vertex AI

Speech-to-text accepts input speech in various locales, such as “English US (en-US)”, “English UK (en-GB)”, “English AU (en-AU)”, “English IN (en-IN)”, “French (fr-FR)”, “Italian (it-IT)”, “Japanese (ja-JP)” etc.
Speech-to-text is powered by Google Cloud’s Chirp model, which has 2B+ parameters and covers 100+ languages.

Speech-to-text on Google Cloud Vertex AI with Gen AI Studio
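
For comparison with the Studio UI, here is a rough sketch of transcribing a short audio file from Python. Note the assumptions: it uses the general Cloud Speech-to-Text v1 client (`google-cloud-speech`) rather than a Chirp-specific setup, it expects a 16-bit PCM WAV file of roughly a minute or less, and the locale set and the `check_locale` helper are written for this illustration from the locales named above. The code Gen AI Studio generates under “View code” may look different.

```python
# Subset of locales named in this article; the service supports more.
SUPPORTED_LOCALES = {"en-US", "en-GB", "en-AU", "en-IN", "fr-FR", "it-IT", "ja-JP"}


def check_locale(language_code):
    """Return the locale unchanged if it is one of those listed above."""
    if language_code not in SUPPORTED_LOCALES:
        raise ValueError(f"unexpected locale for this sketch: {language_code}")
    return language_code


def transcribe_file(path, language_code="en-US"):
    """Transcribe a short WAV file and return one transcript per result."""
    # Imported lazily so check_locale works without the client library installed.
    from google.cloud import speech

    client = speech.SpeechClient()
    with open(path, "rb") as audio_file:
        audio = speech.RecognitionAudio(content=audio_file.read())
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        language_code=check_locale(language_code),
    )
    response = client.recognize(config=config, audio=audio)
    # Each result carries alternatives ordered by likelihood; take the first.
    return [result.alternatives[0].transcript for result in response.results]


print(check_locale("en-GB"))
```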

When you click “View code”, you can see the sample Python code generated for the “text-to-speech” and “speech-to-text” models.

"""Synthesizes speech from the input string of text."""
from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

input_text = texttospeech.SynthesisInput(text="You can think of a VPC network the same way you'd think of a physical network, except that it is virtualized within Google Cloud. A VPC network is a global resource that consists of a list of regions.")

# Note: the voice can also be specified by name.
# Names of voices can be retrieved with client.list_voices().
voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",
    name="en-US-Studio-O",
)

audio_config = texttospeech.AudioConfig(
    # LINEAR16 is uncompressed 16-bit PCM; the returned audio includes a WAV header.
    audio_encoding=texttospeech.AudioEncoding.LINEAR16,
    speaking_rate=1,
)

response = client.synthesize_speech(
    request={"input": input_text, "voice": voice, "audio_config": audio_config}
)

# The response's audio_content is binary. It is WAV data here, so the file
# gets a .wav extension to match the LINEAR16 encoding.
with open("output.wav", "wb") as out:
    out.write(response.audio_content)
    print('Audio content written to file "output.wav"')

Text-to-speech Vertex AI model Python script generated on Gen AI Studio

# Happy Transformation with GenAI

Written by Cloud Journeys with Anindita