24. Confidential RAG (and some other AI use cases) on GCP

I worked at Google when this post and video were published, but the views expressed here are my own and may not reflect those of my employer. Only publicly available material was used to put together this article and video. The website and the YouTube videos are NOT monetized.

💡 For most customers, Google’s Vertex AI capabilities are the recommended way to implement Retrieval-Augmented Generation (RAG) and other AI use cases. The approach in this post is intended for customers with very restrictive data protection mandates or compliance requirements. These customers generally fall into the category of sovereign customers (public sector, critical national infrastructure, customers restricted to their own models, etc.).

You can scroll straight down to the Demo Video. Since the video is extremely detailed, I am keeping the blog post very brief.

AI is no longer a novelty but a necessity, and every organization is harnessing its power. AI is transforming businesses, streamlining operations, and driving innovation.

Among the myriad AI applications, Natural Language Processing (NLP), and more specifically Retrieval-Augmented Generation (RAG, explained in the video below), is emerging as one of the dominant use cases.
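
As a rough illustration of the retrieve-then-generate pattern behind RAG, here is a minimal sketch in plain Python. It is not the setup from the video: the bag-of-words "embedding", the sample documents, and the function names (`embed`, `retrieve`, `build_prompt`) are all assumptions made for illustration, and a real deployment would use a proper embedding model, a vector store, and a self-hosted LLM.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 1) -> str:
    """Augment the question with retrieved context before calling a model."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical private documents that must stay inside the trusted boundary.
DOCS = [
    "Confidential VMs keep memory encrypted while data is in use.",
    "The cafeteria serves lunch from noon until two.",
]

if __name__ == "__main__":
    print(build_prompt("how is memory encrypted", DOCS))
```

The prompt produced by `build_prompt` is what would be sent to the model. In the confidential variant described in this post, both the documents and the model stay on the Confidential VM.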

However, for public sector entities and critical infrastructure providers, data protection concerns may arise when using public AI APIs. Some of them might also be restricted to using their own (or open source) AI models.

The solution? Running their own RAG and other AI use cases on Confidential Virtual Machines (VMs) powered by Google Cloud, combined with Google Cloud’s Ubiquitous Data Encryption (UDE).

In short, Confidential VMs encrypt data in use (their RAM is encrypted), and UDE ensures that your data is delivered to, and processed on, a Confidential VM only, encrypted “end to end”.

This approach offers the following assurances:

  1. Customer-controlled external encryption at rest (for training data and model weights).
  2. Encryption in transit (inference queries and responses).
  3. Encryption during processing (the model loaded in RAM and the inference operations).
  4. The workload runs on a securely booted Confidential VM with encrypted RAM, and the training data is delivered to and processed on a Confidential VM only.
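
The fourth assurance rests on the idea that key material is released only to a workload that can prove it runs on a Confidential VM. The following is a minimal conceptual sketch of such a gate. The claim names (`confidential_vm`, `secure_boot`), the `release_key` function, and the exception type are all hypothetical; this is not UDE’s or any Google Cloud API, and a real system would verify a signed attestation rather than trust a plain object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttestationClaims:
    """Hypothetical claims a VM presents about its own state."""
    confidential_vm: bool  # RAM encryption is active
    secure_boot: bool      # the VM booted a verified image


class KeyReleaseDenied(Exception):
    """Raised when the requesting workload fails the policy check."""


def release_key(claims: AttestationClaims, data_key: bytes) -> bytes:
    """Hand out the data key only if every required claim holds."""
    if not (claims.confidential_vm and claims.secure_boot):
        raise KeyReleaseDenied("workload is not an attested Confidential VM")
    return data_key


if __name__ == "__main__":
    key = b"example-data-key"  # placeholder, not a real key
    trusted = AttestationClaims(confidential_vm=True, secure_boot=True)
    untrusted = AttestationClaims(confidential_vm=False, secure_boot=True)

    print(release_key(trusted, key) == key)
    try:
        release_key(untrusted, key)
    except KeyReleaseDenied as err:
        print(f"denied: {err}")
```

The design point is that the decision sits with the key holder, outside the VM: data that lands on a non-confidential machine stays encrypted because the key is never released to it.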

This approach protects data privacy, allowing these organizations to reap the benefits of AI while still meeting data protection compliance requirements and regulations, even highly restrictive ones.

Such customers can also use this approach for other AI use cases, such as:

  1. Fine-tuning a model on proprietary or patented data to add behaviors.
  2. Setting up confidential AI/ML pipelines for both NLP and non-NLP tasks (one such use case is also demonstrated here).
  3. Fine-tuning their own model on proprietary data and running inference.

All right 🤟! The following video:

  1. Explains RAG.
  2. Demos confidential RAG.
  3. Explains and demos other confidential AI use case implementations (e.g. a confidential speech-to-text ML pipeline).

Video Demo

Please watch in full screen or directly on YouTube.

For a more detailed explanation of Google Cloud’s UDE capability, you can also refer to this post. And this post can help you with the setup.


While most customers can (and should) use Google’s Vertex AI capabilities, customers with elevated data privacy and confidentiality requirements can use Google’s confidential computing with UDE to implement AI use cases that otherwise cannot be implemented using API-based AI offerings.

Thank you for reading through. Please like 👍, share 🔗 and comment ✍ if you found it useful.


