18. Sovereignty Part 2/4 - Encryption to power data sovereignty on Google Cloud (Demo GCP EKM + KAJ and Confidential computing)

While I worked at Google during the publishing of this post / video, the views expressed here are my own and may not reflect those of my employer. Only publicly available material is used to put together the content of this article and video. The website and the YouTube videos are NOT monetized.


💡 This is part 2 of the 4-part sovereignty series. Please check out Part 1 - What is sovereignty? before you continue. Here are all the parts -

  1. What is sovereignty and why it can be the next big trend in cloud computing?
  2. ▶ Encryption to power data sovereignty on Google Cloud (Demo GCP EKM + KAJ and Confidential computing)
  3. Practical sovereignty - Sovereign solutions on Public Cloud
  4. Setting up sovereignty demo on GCP - Google cloud EKM + Confidential Computing + Ubiquitous Data Encryption/UDE + Thales CKM

I will try to keep this article interesting, but if it overwhelms you 🥱 scroll down and check out the video demo first!

Since everyone is doing that now 🤗, I asked ChatGPT: what is data sovereignty?

Here’s what it (?) said!

[Image: ChatGPT's answer]

As impressive as that answer is, it is fairly complex. So let’s simplify it.

Data sovereignty for an organization is the ability to control where its data resides and who accesses it, and to protect that data from access requests by local and foreign governments and legal entities. Data sovereignty controls are generally implemented in accordance with the organization's own security and privacy policies, those of its customers, and local laws.

Depending on your familiarity with the subject, even this definition might seem complex. The rest of this article, and especially the video demo, should help.

🙋‍♂️ I manage an application on cloud. What does it mean to implement data sovereignty for it?

Let’s say you manage the following application for your organization

[Diagram: the example application]

Whether this application runs on premises or in the cloud, you need to consider the following aspects

  1. Type and classification of data
    1. Type - Application data, Customer data, Employee data, Static files etc.
    2. Classification - Controlled Unclassified Information, Restricted, Controlled and Public etc.
  2. Location of the data
    1. Logical - Data in RDBMS, NOSQL DBs, Disks and Object storage
    2. Physical - Actual location of the data: rack, data center (DC), city or, if in cloud, a region and a zone etc.
  3. Points at which the data is vulnerable to external (forced) access and exfiltration based on third-party demands like government subpoenas
    1. Data at rest
    2. Data in transit
    3. Data in use

Providing / implementing data sovereignty means making sure that you implement, or are able to implement, controls that guarantee the following

  1. Data access sovereignty by various means such as data residency, encryption and controlling data provenance
  2. Data access transparency by means like Access Transparency and explicit data access approvals

These two aspects are explained really well in this YouTube video from Google Cloud.

So, basically, there are 4 pillars of data sovereignty on cloud

  1. Data residency
    1. You are able to control where the data processed by your application resides
    2. You are able to control where the data processed by cloud provider services used by your application resides
    3. For cloud this means a particular region/ regions within an allowed boundary.
    4. This is not as simple as just choosing the region for your resources but a guarantee from the cloud provider that the services used by your application do not use any other region for their internal (short or long term) data storage - viz. data residency compliant services.
  2. Encryption
    1. Data is encrypted
      1. At rest
      2. In transit
      3. And in use (in RAM)
    2. You control the access to the encryption keys
    3. Again, if the encryption keys reside on cloud or are controlled by the cloud provider, encryption provides only limited sovereignty. Solutions like Azure Managed HSM and, better yet, GCP External Key Manager - EKM (more on that below) can take that control away from the cloud provider and hand it over to the customer.
  3. Explicit approval mechanisms for data access and data access transparency (audit logs) upon such access
    1. Access to your data by the cloud provider should not be implicit, blanket or without your approval, even to work on support cases
    2. Any such access must be logged with the reason, personnel location etc., and you must have access to these logs reasonably quickly.
  4. Ability to exercise the least amount of trust in the cloud provider and use it just as a commodity.
    1. Maintaining data provenance, viz. you should need to bring only the required data to the cloud. For example, user identities residing with your current identity provider need not be synced to the cloud provider; you should be able to use them directly, with your identity provider remaining the single source of user data. A concrete implementation of this is Google Cloud Workforce Identity Federation.
    2. Traditionally, data in use (viz. data in RAM) is the hardest to protect. You should have the option to use trusted execution environments like confidential compute instances, where the data in RAM is encrypted and is beyond the reach of the cloud provider.
    3. You can read about aiming for the least amount of trust in the cloud provider in this blog - The cloud trust paradox: To trust cloud computing more, you need the ability to trust it less
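
To make the first pillar (data residency) a bit more concrete, here is a minimal Python sketch of a residency check over a resource inventory. It is only an illustration: the `Resource` class, the `ALLOWED_REGIONS` boundary and the inventory entries are all made up for this example and are not part of any cloud provider API.

```python
from dataclasses import dataclass

# Hypothetical allowed boundary: only these regions may hold data.
ALLOWED_REGIONS = {"europe-west3", "europe-west4"}


@dataclass(frozen=True)
class Resource:
    name: str    # illustrative resource name
    region: str  # region where the resource stores its data


def residency_violations(resources, allowed_regions=ALLOWED_REGIONS):
    """Return the resources whose region falls outside the allowed boundary."""
    return [r for r in resources if r.region not in allowed_regions]


# Made-up inventory for illustration.
inventory = [
    Resource("orders-db", "europe-west3"),
    Resource("static-assets", "us-central1"),
]

for r in residency_violations(inventory):
    print(f"{r.name} is stored in {r.region}, outside the allowed boundary")
```

Note that a check like this only covers the regions you picked yourself. It cannot see where a managed service keeps its internal copies, which is exactly why the provider-side residency guarantee described above matters.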

All right, enough theory 📚 Let’s get ready for a demo of a real-world example.

Demo Data sovereignty on GCP with EKM + KAJ + UDE

In the demo we are going to focus on pillars 2 (Encryption) and 4 (Trust less), and see how Google Cloud uses these to provide data sovereignty mechanisms to its customers. We will tackle pillars 1 and 3 in part 3.

[Diagram: setup layout of the demo]

The diagram shows the setup layout of the demo. It is explained in the video below. But before we go there, let’s quickly look at the goal of the demo.

💡 Part 4 of this sovereignty series is dedicated to setting up this demo.

Goal of the Demo

  1. Demonstrate a hybrid / multi-cloud computing setup in which the customer has their workloads (BigQuery datasets & tables, Compute Engine VMs) on GCP and their encryption keys and confidential data on premises (we have used AWS for the demo)
  2. BigQuery data is encrypted with an external key maintained in Thales Key Manager deployed in AWS
  3. Customer can demonstrate complete control of encryption and encryption keys
  4. Customer can restrict their workloads to run only on confidential computing instances, thereby achieving ubiquitous data encryption, viz.
    1. Encryption at rest (in DC and in GCP)
    2. Encryption in transit (SSL/TLS)
    3. Encryption in use (in RAM with confidential compute instances with remote attestation)
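
To give a feel for goal 2, here is a rough Python sketch of how a BigQuery table can be pointed at a customer-managed key. As far as I know, an EKM-backed key is referenced from BigQuery through its Cloud KMS key resource name, and the BigQuery REST table resource carries it in `encryptionConfiguration.kmsKeyName`. The helper functions and every project, dataset, key ring and key name below are placeholders I made up; they are not the values used in the demo, and the sketch only builds the request body without calling any API.

```python
import json


def kms_key_name(project, location, key_ring, key):
    """Build a Cloud KMS key resource name from its parts."""
    return (
        f"projects/{project}/locations/{location}"
        f"/keyRings/{key_ring}/cryptoKeys/{key}"
    )


def table_with_cmek(project, dataset, table, key_name):
    """Return a BigQuery table resource body that names the key to encrypt with."""
    return {
        "tableReference": {
            "projectId": project,
            "datasetId": dataset,
            "tableId": table,
        },
        "encryptionConfiguration": {"kmsKeyName": key_name},
    }


# All identifiers below are placeholders, not the values used in the demo.
key = kms_key_name("my-project", "europe-west3", "ekm-ring", "thales-backed-key")
body = table_with_cmek("my-project", "sovereign_ds", "customers", key)
print(json.dumps(body, indent=2))
```

The point to notice is that BigQuery only ever holds a reference to the key. With EKM the key material itself stays in the external key manager.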

With the goals out of the way, let’s quickly look at the GCP services we are going to use

GCP services used in the Demo

  1. Cloud External Key Manager (EKM) - A Google Cloud service that allows customers to keep their encryption key material in their own trusted environment (on premises or with another cloud provider). Customers can use these keys to encrypt/decrypt data in Google Cloud, thereby retaining complete control over encryption. Read more here
  2. Key Access Justifications (KAJ) - A service that sends a justification reason whenever your external keys are used to encrypt/decrypt your data in Google Cloud. Based on these justification reasons you can choose whether to allow usage of the keys (e.g. allow for customer workloads but reject for third-party data requests). Read more here
  3. Ubiquitous Data Encryption (UDE) - A Google Cloud capability that enables customers to restrict workloads using EKM-encrypted data to run only on confidential compute instances, thereby achieving a completely trusted execution environment. STET (Split Trust Encryption Tool) is the utility Google provides to achieve UDE via remote attestation. Read more here - UDE, STET
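
The allow/reject decision that KAJ makes possible can be sketched in a few lines of Python. This is a toy policy, not how any particular key manager implements it: the `allow_key_use` function is hypothetical, and the reason names are the justification codes as I understand them from Google's public documentation, so please verify them before relying on this.

```python
# Reason names as I understand them from Google's public KAJ documentation.
# Treat them as assumptions and verify against the current docs.
ALLOWED_REASONS = {
    "CUSTOMER_INITIATED_ACCESS",  # the customer's own workloads
}


def allow_key_use(justification_reason):
    """Deny by default: permit key use only for explicitly allowed reasons."""
    return justification_reason in ALLOWED_REASONS


for reason in (
    "CUSTOMER_INITIATED_ACCESS",
    "THIRD_PARTY_DATA_REQUEST",
    "REASON_UNSPECIFIED",
):
    verdict = "allow" if allow_key_use(reason) else "reject"
    print(f"{reason}: {verdict}")
```

I went with an allow-list here so that any reason the policy does not know about, including a missing one, is rejected rather than silently accepted.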

Video Demo

Please watch in full screen or on YouTube directly

Data sovereignty for productivity / collaboration cloud

On collaboration clouds like Google Workspace and Office 365, the customer has no control over the storage and compute backend, so data sovereignty has to be part of the offering itself.

The following video from the official Google Workspace YouTube channel provides a comprehensive demo of Client Side Encryption (CSE) in Google Workspace.



Client Side Encryption (CSE) is also available for Gmail now

Conclusion

Data sovereignty gives customers complete control over their data on cloud. Google Cloud provides various mechanisms that enable customers to implement data sovereignty for their own customers. Encryption, and more specifically customer-controlled encryption, forms the most important pillar of data sovereignty.

Thank you for reading through. Please like 👍, share 🔗 and comment ✍ if you found it useful.

-Nikhil
