The Hidden Costs of Cloud Data Lakes

This blog series from Cazena's engineering team investigates the hidden costs of cloud data lakes.


The Hidden Costs of DIY Cloud Data Lakes #2: Identity Management and Authentication (Single Sign-On)

By John Piekos, Vice President of Engineering and Global Customer Support

This blog series, the Hidden Costs of Cloud Data Lakes, presents the tasks and challenges you may face if you decide to build a cloud data lake for analytics and machine learning yourself. Our first blog focused on Time to Analytics. Now we look at identity management, authentication and single sign-on (SSO).

Identity Management and Authentication (Single Sign-On) in Cloud Data Lakes

Our next topic for this engineering blog series is identity management and authentication for cloud data lakes and analytics/ML environments, specifically single sign-on (SSO). SSO is a user authentication service that lets a user log in once and then use those login credentials to access multiple applications. Establishing a single, consistent identity lets customers access multiple applications in their Cazena Instant Cloud Data Lake and sets the foundation for a consistent, unified authorization and governance model.

Within the Cloud Data Lake, many services require user authentication, including:    

  • Web user interfaces such as Hue  
  • SQL engine access via Impala or Trino  
  • Analytic engines such as Spark and Hive 
  • Any custom or third-party applications that you add   
  • Most importantly, the cloud object storage layer (S3 for AWS or ADLS for Azure) 

With SSO, users can access all of these services and data without being required to sign in to each one individually. (This assumes, of course, that a user was granted access to the service or data. Authorization management will be covered in a future post.) 
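To make the idea concrete, here is a minimal sketch of the "sign in once, use everywhere" pattern. It is an illustration only, not how Cazena or any particular SSO product works: the shared secret, the token format, and the function names `issue_token`, `verify_token`, and `access` are all assumptions made for this example. Real deployments typically rely on standards such as SAML, OpenID Connect, or Kerberos.

```python
import hashlib
import hmac
from typing import Dict, Optional, Set

# Hypothetical secret shared by the identity provider and the services.
SECRET = b"demo-only-secret"


def issue_token(username: str) -> str:
    """Sign the username once at login (stands in for the SSO sign-in)."""
    sig = hmac.new(SECRET, username.encode(), hashlib.sha256).hexdigest()
    return f"{username}:{sig}"


def verify_token(token: str) -> Optional[str]:
    """Return the username if the signature is valid, otherwise None."""
    username, _, sig = token.rpartition(":")
    expected = hmac.new(SECRET, username.encode(), hashlib.sha256).hexdigest()
    if hmac.compare_digest(sig.encode(), expected.encode()):
        return username
    return None


def access(service: str, token: str, grants: Dict[str, Set[str]]) -> bool:
    """Each service checks the same token; no second login is needed."""
    user = verify_token(token)
    return user is not None and service in grants.get(user, set())
```

In this sketch, one token from `issue_token("alice")` can be presented to Hue, Impala, or any other service, and each one calls `access` with the same token. Whether the request succeeds still depends on the grants, which mirrors the point above that authentication and authorization are separate concerns.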

Federated SSO establishes trust across multiple organizations. This trust relationship allows credentials from your enterprise Identity Management Provider (IDM), such as Active Directory or LDAP, to be used with the Cloud Data Lake.

Federated SSO offers many benefits, including: 

  • Users only need to sign in once to have access to both enterprise resources and their Cloud Data Lake. 
  • Groups created in the enterprise can be used to control access to Cloud Data Lake resources, without the need to re-create those groups.  
  • Accounts and credentials are centrally managed. For example, if a user changes their password, the change is applied to both the enterprise and Cloud Data Lake. When an employee leaves a company, their credentials only have to be deleted once. 
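The group and offboarding benefits above can be sketched in a few lines. This is a simplified model under stated assumptions: the dictionaries `enterprise_groups` and `resource_groups`, the user and resource names, and the functions `can_access` and `offboard` are hypothetical and stand in for a real directory (such as Active Directory) and a real data lake policy store.

```python
from typing import Dict, Set

# Hypothetical enterprise directory: user -> groups, defined once in the IDM.
enterprise_groups: Dict[str, Set[str]] = {
    "alice": {"analysts"},
    "bob": {"data-engineers"},
}

# Hypothetical data lake policy that reuses the enterprise group names
# instead of re-creating groups inside the data lake.
resource_groups: Dict[str, Set[str]] = {
    "sales_db": {"analysts", "data-engineers"},
    "raw_bucket": {"data-engineers"},
}


def can_access(user: str, resource: str) -> bool:
    """Allow access when the user shares at least one group with the resource."""
    user_groups = enterprise_groups.get(user, set())
    return bool(user_groups & resource_groups.get(resource, set()))


def offboard(user: str) -> None:
    """Remove the user from the directory once; every later check then fails."""
    enterprise_groups.pop(user, None)
```

Because every access check reads from the same directory, calling `offboard("bob")` once is enough to deny Bob access to both `sales_db` and `raw_bucket`, with no per-service cleanup.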

Should you not have a corporate IDM, the Cazena Cloud Data Lake can provide its own user management and still allow single sign-on to all the services and applications contained within.

Building your own SSO capabilities is one of the often-overlooked costs of deploying your own cloud data lake. It may seem tempting to use the IDM offered by the cloud provider (e.g., AWS, Azure). However, that IDM works only with the services offered by the cloud provider, so extending SSO to third-party applications is something you will have to build yourself. The SSO solution also needs to be able to federate corporate identity across the cloud data lake.

Cazena’s Instant Cloud Data Lake comes with SSO enabled automatically for all the data lake services and applications within your Cloud Data Lake. Once deployed for your enterprise, it is a simple matter for Cazena to federate your corporate identity to your Cloud Data Lake.  With this trust established, your users are immediately ready to access services within the data lake. 



Explore other posts in this series:

Introduction: The Hidden Costs of Cloud Data Lakes

Hidden Cost #1: Time to Analytics



