Google’s BigQuery Introduces Column-Level Encryption Functions and Dynamic Data Masking

Google recently released new features for its BigQuery SaaS data warehouse that include column-level encryption functions and dynamic data masking. These features add a second layer of defense on top of access control to help secure and manage sensitive data.

Specifically, dynamic data masking can be used for real-time transactions, while column-level encryption provides additional security for data at rest or in motion when real-time usability is not required.

These features could be useful for companies that store personally identifiable information (PII) and other sensitive data such as credit card data and biometric information. Companies that store and analyze data in countries with evolving data regulations and privacy mandates face an ongoing risk of data breaches and leaks and need to control access to data; they can benefit from the new features as well.

Column-level encryption allows individual columns to be encrypted and decrypted, which means that the administrator can select which columns are encrypted and which are not. It supports the AES-GCM (non-deterministic) and AES-SIV (deterministic) encryption algorithms. Because AES-SIV produces the same ciphertext for the same input under a given key, it allows grouping, aggregation, and joins on encrypted data. The feature enables two new use cases: data that is natively encrypted in BigQuery and must be decrypted upon access, and data that is encrypted externally, stored in BigQuery, and then decrypted upon access.
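To see why the deterministic mode matters for grouping and joins, the following is a minimal Python sketch. It does not use AES-GCM or AES-SIV: the cipher is a toy built from SHA-256 and HMAC, the function names are invented for illustration, and nothing here should be used to protect real data. It only demonstrates the property at stake, which is that a random nonce hides equality between rows while a nonce derived from the plaintext preserves it.

```python
import hashlib
import hmac
import os
from collections import Counter


def _xor_keystream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || nonce || counter) blocks."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(4, "big")
        block = hashlib.sha256(key + nonce + counter).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def encrypt_randomized(key: bytes, plaintext: bytes) -> bytes:
    """Fresh random nonce per call, so equal plaintexts give different ciphertexts."""
    nonce = os.urandom(16)
    return nonce + _xor_keystream(key, nonce, plaintext)


def encrypt_deterministic(key: bytes, plaintext: bytes) -> bytes:
    """Nonce derived from the plaintext (a synthetic IV), so output is repeatable."""
    nonce = hmac.new(key, plaintext, hashlib.sha256).digest()[:16]
    return nonce + _xor_keystream(key, nonce, plaintext)


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Works for both modes: the nonce is stored in the first 16 bytes."""
    return _xor_keystream(key, ciphertext[:16], ciphertext[16:])


key = os.urandom(32)
postal_codes = [b"94105", b"10001", b"94105"]

deterministic = [encrypt_deterministic(key, p) for p in postal_codes]
randomized = [encrypt_randomized(key, p) for p in postal_codes]

# Grouping on ciphertext, as a GROUP BY would: only the deterministic
# column puts the two 94105 rows in the same group.
deterministic_groups = Counter(deterministic)
randomized_groups = Counter(randomized)
```

The trade-off is that deterministic ciphertext reveals which rows share a value, so it is usually reserved for columns that actually need to be grouped or joined.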

Column-level encryption is integrated with Cloud Key Management Service (Cloud KMS), which gives the administrator more control: encryption keys are managed within Cloud KMS, keys are retrieved securely at access time, and key use is logged in detail. Cloud KMS can be used to generate the KEK (key encryption key) that encrypts the DEK (data encryption key), which in turn encrypts the data in BigQuery columns. Cloud KMS uses IAM (Identity and Access Management) to define roles and permissions. The KEK is a symmetric encryption keyset stored in Cloud KMS, and referencing an encrypted keyset in BigQuery reduces the risk of key exposure.

The BigQuery documentation explains:

At query execution time, you provide the Cloud KMS resource path of the KEK and the ciphertext of the wrapped DEK. BigQuery calls Cloud KMS to unwrap the DEK, then uses that key to decrypt the data in your query. The unwrapped version of the DEK is only stored in memory for the duration of the query and then destroyed.
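The KEK/DEK flow described above is a form of envelope encryption, and its lifecycle can be sketched in a few lines of Python. Everything in this sketch is an assumption made for illustration: `ToyKms` is a hypothetical in-memory stand-in for Cloud KMS, the resource path is invented, and the cipher is a toy XOR keystream rather than the AES modes BigQuery uses. The point is only the shape of the flow: the KEK stays inside the key service, only the wrapped DEK is stored, and the unwrapped DEK exists just for the length of one query.

```python
import hashlib
import os


def _xor_keystream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || nonce || counter) blocks."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(4, "big")
        block = hashlib.sha256(key + nonce + counter).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(16)
    return nonce + _xor_keystream(key, nonce, plaintext)


def toy_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    return _xor_keystream(key, ciphertext[:16], ciphertext[16:])


class ToyKms:
    """Hypothetical stand-in for Cloud KMS. KEKs never leave this object."""

    def __init__(self) -> None:
        self._keks = {}

    def create_kek(self, resource_path: str) -> None:
        self._keks[resource_path] = os.urandom(32)

    def wrap(self, resource_path: str, dek: bytes) -> bytes:
        return toy_encrypt(self._keks[resource_path], dek)

    def unwrap(self, resource_path: str, wrapped_dek: bytes) -> bytes:
        return toy_decrypt(self._keks[resource_path], wrapped_dek)


def run_query(kms: ToyKms, kek_path: str, wrapped_dek: bytes, column: list) -> list:
    """Unwrap the DEK for this call only, then decrypt the column values."""
    dek = kms.unwrap(kek_path, wrapped_dek)
    # dek is a local variable: it is dropped when the function returns.
    # This mirrors the lifecycle only; it is not a secure memory wipe.
    return [toy_decrypt(dek, value) for value in column]


kms = ToyKms()
kek_path = "projects/example/locations/us/keyRings/demo/cryptoKeys/demo-kek"  # invented
kms.create_kek(kek_path)

dek = os.urandom(32)
encrypted_column = [toy_encrypt(dek, z) for z in (b"94105", b"10001")]
wrapped_dek = kms.wrap(kek_path, dek)
del dek  # from here on, only the wrapped form is kept

decrypted = run_query(kms, kek_path, wrapped_dek, encrypted_column)
```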

In an example use case, the postal code is the data to be encrypted, and a non-deterministic AEAD function decrypts it when a query runs against the table.

[Figure omitted. Source: BigQuery documentation]
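As a rough textual stand-in for the example, the Python helper below assembles the kind of query text such a use case might involve. The SQL function names (`KEYS.KEYSET_CHAIN`, `AEAD.DECRYPT_STRING`, `FROM_BASE64`) follow my reading of BigQuery's AEAD function reference and have not been run against a live project here; the table, column names, KMS path, and keyset value are all placeholders invented for this sketch, not taken from the original figure.

```python
# Query template. The identifiers customer_id and encrypted_zip are
# hypothetical column names chosen for this sketch.
QUERY_TEMPLATE = """\
DECLARE kms_resource_name STRING DEFAULT '{kms_resource_name}';
DECLARE first_level_keyset BYTES DEFAULT FROM_BASE64('{wrapped_keyset_b64}');

SELECT
  customer_id,
  AEAD.DECRYPT_STRING(
    KEYS.KEYSET_CHAIN(kms_resource_name, first_level_keyset),
    encrypted_zip,
    CAST(customer_id AS STRING)  -- additional data; must match what was used to encrypt
  ) AS zip
FROM `{table}`;
"""


def build_decrypt_query(table: str, kms_resource_name: str, wrapped_keyset_b64: str) -> str:
    """Fill the template. Inputs are assumed to be trusted constants, not user input."""
    return QUERY_TEMPLATE.format(
        table=table,
        kms_resource_name=kms_resource_name,
        wrapped_keyset_b64=wrapped_keyset_b64,
    )


query = build_decrypt_query(
    table="example-project.example_dataset.customers",
    kms_resource_name=(
        "gcp-kms://projects/example-project/locations/us/"
        "keyRings/example-ring/cryptoKeys/example-kek"
    ),
    wrapped_keyset_b64="BASE64_WRAPPED_KEYSET_PLACEHOLDER",
)
print(query)
```

The query carries only the KEK's resource path and the wrapped keyset, which matches the documentation's description: the caller never handles the unwrapped DEK.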

In a second example, a deterministic AEAD function decrypts the data when a query runs against the table, and it also supports aggregation and joins on the encrypted data.

[Figure omitted. Source: BigQuery documentation]

This way, even a user who is not authorized to decrypt the data can still perform a join on it.

Prior to the release of the column-level encryption feature, administrators had to make copies of datasets with obfuscated data to manage appropriate group access. This created an inconsistent approach to data protection that could be costly to manage. Column-level encryption increases the level of security because each column can have its own encryption key instead of a single key for the entire database. It can also allow faster access to data because less data is encrypted.

Dynamic data masking, released in preview, gives administrators more control: combined with column-level access control, they can choose to grant full access, masked access, or no access to data, extending security at the column level. The feature selectively masks data at the column level at query time based on masking rules, user roles, and defined privileges. It allows administrators to obfuscate sensitive data and control user access while mitigating the risk of data leakage.

With this new feature, data sharing is easier because administrators can selectively hide information, so tables can be shared with large groups of users. At the application level, developers do not need to modify queries to hide sensitive data once data masking is configured in BigQuery. At the query level, existing queries automatically hide data based on the roles granted to the user. Finally, applying security is easier because the administrator can write a security rule once and then apply it to any number of columns with tags.
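The idea of writing a rule once, attaching it to columns through tags, and applying it at query time based on the caller's role can be sketched in plain Python. This is an illustration under stated assumptions, not BigQuery's implementation or API: the tag names, role names, rules, and the `run_select` function are all hypothetical.

```python
import hashlib
from enum import Enum


class Access(Enum):
    FULL = "full"
    MASKED = "masked"
    NONE = "none"


def hash_rule(value):
    """Masking rule: replace the value with its SHA-256 hex digest."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


def nullify_rule(value):
    """Masking rule: hide the value entirely."""
    return None


# One masking rule per tag, written once.
RULE_FOR_TAG = {"contact": hash_rule, "government_id": nullify_rule}
# Any number of columns can carry the same tag.
TAG_FOR_COLUMN = {"email": "contact", "phone": "contact", "ssn": "government_id"}
# Access level per role and tag (all names hypothetical).
ACCESS = {
    "admin": {"contact": Access.FULL, "government_id": Access.FULL},
    "analyst": {"contact": Access.MASKED, "government_id": Access.MASKED},
    "intern": {"contact": Access.NONE, "government_id": Access.NONE},
}


def run_select(rows, columns, role):
    """Return the requested columns, masked according to the caller's role."""
    result = []
    for row in rows:
        out = {}
        for col in columns:
            tag = TAG_FOR_COLUMN.get(col)
            if tag is None:
                out[col] = row[col]  # untagged column: no policy applies
                continue
            level = ACCESS[role][tag]
            if level is Access.NONE:
                raise PermissionError(f"{role} may not read column {col}")
            out[col] = row[col] if level is Access.FULL else RULE_FOR_TAG[tag](row[col])
        result.append(out)
    return result


rows = [{"id": 1, "email": "ada@example.com", "phone": "555-0100", "ssn": "000-00-0000"}]
admin_view = run_select(rows, ["id", "email", "ssn"], "admin")
analyst_view = run_select(rows, ["id", "email", "ssn"], "analyst")
```

Both callers issue the same select; only the result differs, which is the property that lets existing queries keep working unchanged once masking is configured.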

Any masking policies or encryption applied on the underlying tables are carried over to authorized views and materialized views, and masking or encryption is compatible with other security features such as row-level security.

Both new features can be used to tighten security, manage access control, comply with privacy laws, and create secure testing environments. They also allow a more consistent way of managing tables with sensitive data: administrators no longer need to create multiple datasets with encrypted (or unencrypted) data and share those copies with the right users.