If you’re thinking of building your own custom access control solution, read this first.
For most large enterprises not in the business of information technology (IT), the question of whether to build or buy a piece of operational software is rarely a reasonable one. IT departments in large corporations are mostly service providers that receive requests from internal customers, usually through a system like ServiceNow that provides workflow automation. Specialists maintain the various systems, such as databases, storage systems, and web, application, or email servers. Developers in these enterprises are likewise focused on solving business problems by implementing internal or customer-facing services.
Looking back at the last few decades, you see an interesting trend – due to the commoditization of IT infrastructure, companies started to consider moving further down into their tech stacks. Instead of buying a piece of software that could do 70% of what they needed it to do, they formed a team of exceptional engineers to build the 95% solution (arguably, it’s impossible to write the perfect software), which would save enough money and/or generate enough new revenue to justify the cost of building it.
So, then, why wouldn’t you want to build your own software instead of buying it?
A simplistic theory would say that, assuming the technically savvy builders have an edge on the competition, that lead will only continue to grow as they replace more technology with their own creations. However, this is not how it usually works in practice.
Development resources are finite; you can’t hire an unlimited number of highly skilled engineers. Eventually, you spend more time maintaining the self-built systems than creating new ones. It’s the same reason that we buy cars from manufacturers with an established dealer network: for some extra cost, we can rely on the car’s quality and ongoing maintenance without having to provide either ourselves.
In other words, the technical debt is not negligible.
A few software systems manage to overcome this problem by being released as open source, so that other companies can benefit from them and share the burden of contribution in the process. One such example is Hadoop, which in 2008 generated enough momentum in the business world for commercial companies to pick up the development and nurturing of the included open-source projects. These platform vendors proactively filled in the gaps, either by creating new open-source systems or by acquiring them from customers that saw an opportunity to join the elusive club of builders.
Access Control Woes
This brings us back to the issue of access control for data lakes, and two Apache projects: Sentry and Ranger. Alas, both got their start before the end of the decade had enabled a more holistic approach to the problem. They did what they could to provide some form of manageable multi-tenant access control but, arguably, they failed.
Before we look at this in more detail, let’s first define the challenges that need to be addressed: expressiveness of policies and consistency of enforcement. If these two dimensions stay fixed, they are solvable in many ways. But they have a strong tendency to grow over time, and in a manner that is really hard to control as a central IT team, because:
- Sophistication of policies is driven by other parts of the business, such as line-of-business units (LOBs) procuring data from other vendors, tracking new information about users, or storing privacy-related data.
- The set of tools that you want to support is always changing, as users demand best-of-breed analytical and compute engines.
As the data complexity grows, so does the supported set of tools, which creates a combinatorial problem if your access control solution simply provides a central management tool to push tool-specific policies into the existing infrastructure. The result is that access policies multiply uncontrollably, making audits complex, cumbersome, and error-prone. The patchwork of permissions and policies, combined with context-unaware enforcement layers, wreaks havoc, and the fallout is undeniable. Almost every day, it seems, there comes a report of another company, whether large or small, that left its data open to unauthorized users; at times, this amounts to the entire Internet!
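To make the combinatorial growth concrete, here is a rough back-of-the-envelope sketch. The function names and the numbers are illustrative assumptions, not measurements of any real deployment. It simply counts how many policy artifacts have to be maintained when every logical rule must be re-expressed per tool, compared with defining each rule once.

```python
def tool_specific_policy_count(num_rules: int, num_tools: int) -> int:
    """Each logical rule is translated into every tool's own policy format."""
    return num_rules * num_tools


def unified_policy_count(num_rules: int, num_tools: int) -> int:
    """Each logical rule is defined once, regardless of how many tools exist."""
    return num_rules


if __name__ == "__main__":
    # Hypothetical growth of a data platform over time: (rules, tools).
    for rules, tools in [(10, 2), (50, 5), (200, 8)]:
        per_tool = tool_specific_policy_count(rules, tools)
        unified = unified_policy_count(rules, tools)
        print(f"rules={rules} tools={tools} per-tool={per_tool} unified={unified}")
```

Under these assumptions, the per-tool count grows with the product of both dimensions, while the unified count grows only with the number of rules. Every extra artifact is one more thing to audit.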
Dynamic Policy Enforcement
If you know who the user trying to access data is, no matter how or through what service, a small set of manageable policies is sufficient even for sophisticated access patterns implemented across all supported systems. For advanced security features like differential privacy, a disconnected enforcement layer is unable to correlate access, and therefore cannot keep out a malicious user who is trying to circumvent security by alternating between the various access points.
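The correlation problem can be illustrated with a small sketch. Everything here is hypothetical: the class names, the access point labels, and the per-user query budget, which is only a crude stand-in for a real privacy budget and not an implementation of differential privacy. The point is the difference between enforcement state keyed on the user and state kept separately per access point.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CentralEnforcer:
    """Hypothetical decision point shared by every access path."""
    budget_per_user: int
    used: dict = field(default_factory=lambda: defaultdict(int))

    def authorize(self, user: str, access_point: str) -> bool:
        # The budget is keyed on the user alone, so switching to a
        # different access point does not reset it.
        if self.used[user] >= self.budget_per_user:
            return False
        self.used[user] += 1
        return True


@dataclass
class DisconnectedEnforcer:
    """Each access point tracks usage on its own, with no shared state."""
    budget_per_user: int
    used: dict = field(default_factory=lambda: defaultdict(int))

    def authorize(self, user: str, access_point: str) -> bool:
        # The budget is keyed on (user, access point), so every access
        # point grants the full budget again.
        key = (user, access_point)
        if self.used[key] >= self.budget_per_user:
            return False
        self.used[key] += 1
        return True


def count_allowed(enforcer, user: str, access_points: list) -> int:
    """Count how many requests in the given sequence are authorized."""
    return sum(enforcer.authorize(user, ap) for ap in access_points)


if __name__ == "__main__":
    # A user alternating between three hypothetical access points.
    sequence = ["sql", "spark", "rest", "sql", "spark", "rest"]
    print(count_allowed(CentralEnforcer(budget_per_user=2), "mallory", sequence))
    print(count_allowed(DisconnectedEnforcer(budget_per_user=2), "mallory", sequence))
```

In this toy setup, the shared enforcer stops the user after two queries in total, while the disconnected one lets all six through because each access point only sees its own slice of the activity.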
In order for this to work, you need a variety of enforcement layers that can optimize every path of data access for both security and performance. Ask any vendor in this space what their fallback solution is (Ranger, for example, has none) and what the impact of that fallback will be. If queries that used to run in a few minutes now take hours, your end users will start complaining.
Build vs. Buy
A flexible, intelligent, and powerful access solution is not something to be trifled with. Yes, those tech-savvy companies willing to go the extra mile (names like Google, Facebook, Twitter, Lyft, and Tesla come to mind) might be able to solve this problem, but for others it would be all but impossible. Sentry and Ranger have proven that point beyond any doubt, failing to deliver scalable security that addresses the two dimensions of complexity: either the policies get out of control, or the sheer number of systems stifles innovation as an overwhelmed and poorly advised central IT department chases its tail.
When looking at the cloud vendors flooding the market with ready-made SaaS solutions, you might wonder: if you cannot easily build it, why not rent instead? This is also not an acceptable solution, and for essentially the same reasons: either the vendors believe what they have is enough, or they underestimate the scope of the task. These services also have a fatal flaw, as their offerings aim squarely at the platform and infrastructure teams, not the individual line-of-business units. The end result is that security is once more controlled not by the owners or users of the data, but by a group that is removed from them.
- Managing data and access control must happen in the line-of-business units. Central IT is poorly suited to that job, if it can do it at all.
- Only a unified policy and metadata management solution, covering all data sources and consuming applications, can provide scalable access control.
- Building a data access solution requires not only the above management capabilities, but also optimizing data access for performance. Slowing down queries eventually turns from a nuisance to a disturbance.
- While some cutting-edge tech companies may be able to build a custom solution, it is a nearly impossible task for the majority of enterprises.
- Carrying the technical debt of a bespoke system, that is, maintaining it over the long term, is rarely viable.