Restricting Libraries in JVM Compute Platforms

Learn how Databricks restricts third-party libraries in JVM compute platforms, including Scala, Java, and others.

Security challenges with Scala and Java libraries

Open source communities have built incredibly useful libraries that simplify many common development scenarios. Through open-source projects like Apache Spark, we have learned how challenging it is to build projects for everyone while ensuring they work securely. Databricks products benefit from third-party libraries and use them to extend existing functionality. This blog post explores the challenges of using such third-party libraries in Scala and Java and proposes solutions to isolate them when needed.

Third-party libraries often provide a wide variety of features. Developers might not be aware of the complexity behind a particular piece of functionality, or know how to disable a feature set easily. Attackers can leverage such unexpected features to gain access to a system or steal information from it. For example, a JSON library might support custom tags that allow inspecting the contents of local files. Along the same lines, an HTTP library might not consider the risk of local network access, or might only provide partial restrictions for certain cloud providers.

The security of a third-party package goes beyond its code. Open source projects rely on the security of their infrastructure and dependencies. For example, Python and PHP packages were recently compromised to steal AWS keys. The Log4j vulnerability also highlighted how a single flaw can spread through a web of dependencies.

Isolation is often a useful tool to mitigate attacks in this area. Note that isolation can help enhance security for defense-in-depth but it is not a replacement for security patching and open-source contributions.

Proposed solution

The Databricks security team aims to make secure development simple and straightforward by default. As part of this effort, the team built an isolation framework and integrated it with multiple third party packages. This section explains how it was designed and shares a small part of the implementation. Interested readers can find code samples in this notebook.

Per-thread Java SecurityManager

The Java SecurityManager allows an application to restrict access to resources or privileges through callbacks in the Java source code. It was originally designed to restrict Java applets in the Java 1.0 version. The open-source community uses it for security monitoring, isolation and diagnostics.

The SecurityManager policies apply globally for the entire application. For third party restrictions, we want security policies to apply only for specific code. Our proposed solution attaches a policy to a specific thread and manages the SecurityManager separately.


import java.util.concurrent.locks.ReentrantLock

/**
 * Main object for restricting code.
 *
 * Please refer to the blog post for more details.
 */
object SecurityRestriction {
  private val lock = new ReentrantLock
  private var curManager: Option[ThreadManager] = None

...

  /**
   * Apply security restrictions for the current thread.
   * Must be followed by [[SecurityRestriction.unrestrict]].
   *
...

   *
   * @param handler SecurityPolicy applied, default to block all.
   */
  def restrict(handler: SecurityPolicy = new SecurityPolicy(Action.Block)): Unit = {
    // Using a null handler here means no restrictions apply,
    // simplifying configuration opt-in / opt-out.
    if (handler == null) {
      return
    }

    lock.lock()
    try {
      // Check or create a thread manager.
      val manager = curManager.getOrElse(new ThreadManager)
      
      // If a security policy already exists, raise an exception.
      val thread = Thread.currentThread
      if (manager.threadMap.contains(thread)) {
        throw new ExistingSecurityManagerException
      }
      
      // Keep the security policy for this thread.
      manager.threadMap.put(thread, new ThreadContext(handler))
      
      // Set the SecurityManager if that's the first entry.
      if (curManager.isEmpty) {
        curManager = Some(manager)
        System.setSecurityManager(manager)
      }
    } finally {
      lock.unlock()
    }

  }

...

}

Figure 1. Per-thread SecurityManager implementation.
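
Because restrict must always be followed by unrestrict, it helps to enforce the pairing for the caller. The sketch below is a simplified stand-in rather than the implementation from this post: `DemoPolicy`, `DemoRestriction` and its methods are hypothetical names, and the registry only tracks a policy per thread without installing a SecurityManager. It shows how a loan-pattern helper, in the spirit of the `restrictBlock` calls used in the examples later in this post, could guarantee cleanup even when the restricted code throws.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for a security policy; it only carries a label.
final case class DemoPolicy(name: String)

// Hypothetical per-thread registry. It does not install a SecurityManager.
object DemoRestriction {
  private val policies = new ConcurrentHashMap[Thread, DemoPolicy]()

  // Attach a policy to the current thread; nesting is rejected.
  def restrict(policy: DemoPolicy): Unit = {
    val existing = policies.putIfAbsent(Thread.currentThread(), policy)
    if (existing != null) {
      throw new IllegalStateException("A policy is already attached to this thread")
    }
  }

  // Detach the policy from the current thread, if any.
  def unrestrict(): Unit = {
    policies.remove(Thread.currentThread())
    ()
  }

  // Policy currently attached to the calling thread.
  def current: Option[DemoPolicy] = Option(policies.get(Thread.currentThread()))

  // Loan pattern: run `body` restricted and always detach afterwards.
  def restrictBlock[T](policy: DemoPolicy)(body: => T): T = {
    restrict(policy)
    try body finally unrestrict()
  }
}
```

Unlike the listing above, this sketch keeps strong references to threads, which is acceptable only because every entry is removed in the `finally` clause.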

 

Constantly changing the SecurityManager can introduce race conditions. The proposed solution uses reentrant locks to manage setting and removing the SecurityManager. If multiple parts of the code need to change the SecurityManager, it is safer to set the SecurityManager once and never remove it.

The code also respects any pre-installed SecurityManager by forwarding calls that are allowed.


import scala.collection.mutable.WeakHashMap

/**
 * Extends the [[java.lang.SecurityManager]] to work only on designated threads.
 *
 * The Java SecurityManager allows defining a security policy for an application.
 * You can prevent access to the network, reading or writing files, executing processes
 * or more. The security policy applies throughout the application.
 *
 * This class attaches security policies to designated threads. Security policies can
 * be crafted for any specific part of the code.
 *
 * If the caller clears the security check, we forward the call to the existing SecurityManager.
 */
class ThreadManager extends SecurityManager {
  // Weak reference to thread and security manager.
  private[security] val threadMap = new WeakHashMap[Thread, ThreadContext]
  private[security] val subManager: SecurityManager = System.getSecurityManager

...

  private def forward[T](fun: (SecurityManager) => T, default: T = ()): T = {
    if (subManager != null) {
      return fun(subManager)
    }
    return default
  }

...

  // Identify the right restriction manager to delegate check and prevent reentrancy.
  // If no restriction applies, default to forwarding.
  private def delegate(fun: (SecurityManager) => Unit): Unit = {
    val ctx = threadMap.getOrElse(Thread.currentThread(), null)

    // No thread context means no restriction applies: only forward
    // to the existing SecurityManager.
    if (ctx == null) {
      forward(fun)
      return
    }

    // Skip if we are already processing a SecurityManager call.
    if (ctx.entered) {
      return
    }

    ctx.entered = true
    try {
      fun(ctx.restrictions)
    } finally {
      ctx.entered = false
    }

    // Forward to existing SecurityManager if available.
    forward(fun)
  }

...

  // SecurityManager calls this function on process execution.
  override def checkExec(cmd: String): Unit = delegate(_.checkExec(cmd))

...

}

Figure 2. Forwarding calls to existing SecurityManager.
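
The `entered` flag in `delegate` matters because evaluating a policy can itself trigger further security checks, for example if logging a decision touches the file system. The following is a minimal sketch of that guard in isolation. `ReentrancyGuard` and `runOnce` are hypothetical names that are not part of the code above, and the sketch uses a `ThreadLocal` flag instead of the per-thread context map.

```scala
// Hypothetical helper: runs a check at most once per thread at a time.
final class ReentrancyGuard {
  // Per-thread flag, false until a check starts on that thread.
  private val entered = new ThreadLocal[Boolean] {
    override def initialValue(): Boolean = false
  }

  // Runs `check` unless the current thread is already inside one.
  // Returns true if the check ran, false if it was skipped.
  def runOnce(check: => Unit): Boolean = {
    if (entered.get()) {
      false
    } else {
      entered.set(true)
      try {
        check
        true
      } finally {
        // Always reset the flag, even if the check throws.
        entered.set(false)
      }
    }
  }
}
```

A nested call made from inside `check` returns `false` immediately, so a policy that triggers its own checks does not recurse indefinitely.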

Security policy and rule system

The security policy engine decides if a specific security access is allowed. To ease usage of the engine, accesses are organized into different types. These types of accesses are called PolicyCheck and look like the following:


/**
 * Generic representation of security checkpoints.
 * Each rule defined as part of the [[SecurityPolicy]] and/or [[PolicyRuleSet]] is attached
 * to a policy check.
 */
object PolicyCheck extends Enumeration {
  type Check = Value

  val AccessThread, ExecuteProcess, LoadLibrary, ReadFile, WriteFile, DeleteFile = Value
}

Figure 3. Policy access types.

For brevity, network access, system properties, and other properties are elided from the example.
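
To make the mapping concrete, the sketch below shows one possible way to translate the permissions behind standard SecurityManager callbacks into these access types. It relies on documented JDK behavior: `checkRead`, `checkWrite`, `checkDelete` and `checkExec` test a `FilePermission`, while `checkLink` and thread access checks test a `RuntimePermission`. `CheckSketch` and `classify` are hypothetical names, and the mapping used in the original implementation may differ.

```scala
import java.io.FilePermission
import java.security.Permission

// Hypothetical mirror of the PolicyCheck enumeration, with a classifier.
object CheckSketch extends Enumeration {
  val AccessThread, ExecuteProcess, LoadLibrary, ReadFile, WriteFile, DeleteFile = Value

  // Map a JDK permission to an access type, or None if it is not covered.
  def classify(permission: Permission): Option[Value] = permission match {
    case file: FilePermission =>
      file.getActions match {
        case "read"    => Some(ReadFile)
        case "write"   => Some(WriteFile)
        case "delete"  => Some(DeleteFile)
        case "execute" => Some(ExecuteProcess)
        case _         => None // combined or unsupported actions
      }
    case runtime: RuntimePermission if runtime.getName.startsWith("loadLibrary.") =>
      Some(LoadLibrary)
    case runtime: RuntimePermission if runtime.getName == "modifyThread" =>
      Some(AccessThread)
    case _ => None
  }
}
```

Permissions that do not map to a known access type return `None`, which leaves the caller free to fall back to a default action.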

The security policy engine allows attaching a rule set to each access check. Each rule in the set carries an action, and rules are evaluated in order: the first rule that matches decides the action taken. The code uses three types of rules: caller, caller regex, and default. Caller rules look for a known function name in the thread call stack, and caller regex rules do the same with a regular expression. The default rule always matches. If no rule matches, the security policy engine falls back to a global action.


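import java.security.AccessControlException
import scala.collection.mutable.{HashMap, Queue}
import scala.util.matching.Regex

/**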
 * Action taken during a security check.
 * [[Action.Allow]] stops any check and just continues execution.
 * [[Action.Block]] throws an AccessControlException with details on the security check.
 * Log variants help debugging and testing rules.
 */
object Action extends Enumeration {
  type Action = Value

  val Allow, Block, BlockLog, BlockLogCallstack, Log, LogCallstack = Value
}

...

// List of rules applied in order to decide to allow or block a security check.
class PolicyRuleSet {
  private val queue = new Queue[Rule]()

  /**
   * Allow or block if a caller is in the security check call stack.
   *
   * @param action Allow or Block on match.
   * @param caller Fully qualified name for the function.
   */
  def addCaller(action: Action.Value, caller: String): Unit = {
    queue += PolicyRuleCaller(action, caller)
  }

  /**
   * Allow or block if a regex matches in the security check call stack.
   *
   * @param action Allow or Block on match.
   * @param caller Regular expression checked against each entry in the call stack.
   */
  def addCaller(action: Action.Value, caller: Regex): Unit = {
    queue += PolicyRuleCallerRegex(action, caller)
  }

  /**
   * Allow or block if a regex matches in the security check call stack.
   * Java version.
   *
   * @param action Allow or Block on match.
   * @param caller Regular expression checked against each entry in the call stack.

   */
  def addCaller(action: Action.Value, caller: java.util.regex.Pattern): Unit = {
    addCaller(action, caller.pattern().r)
  }

  /**
   * Add an action that always matches.
   *
   * @param action Allow or Block by default.
   */
  def addDefault(action: Action.Value): Unit = {
    queue += PolicyRuleDefault(action)
  }

  private[security] def validate(check: PolicyCheck.Value): Unit = queue.foreach(_.validate(check))

  private[security] def decide(currentStack: Seq[String], context: Any): Option[Action.Value] = {
    // Rules are evaluated in order; the first rule that matches decides.
    queue.iterator
      .map(_.decide(currentStack, context))
      .collectFirst { case Some(action) => action }
  }

  private[security] def isEmpty(): Boolean = queue.isEmpty
}

...

/**
 * SecurityPolicy describes the rules for security checks in a restricted context.
 */
class SecurityPolicy(val default: Action.Value) extends SecurityManager {
  val rules = new HashMap[PolicyCheck.Value, PolicyRuleSet]

...

  protected def decide(check: PolicyCheck.Value, details: String, context: Any = null) = {
    var selectedDefault = default
    
    // Fetch any rules attached for this specific check.
    val rulesEntry = rules.getOrElse(check, null)
    if (rulesEntry != null && !rulesEntry.isEmpty) {
      val currentStack = Thread.currentThread.getStackTrace().toSeq.map(
        s => s.getClassName + "." + s.getMethodName
      )
      
      // Delegate to the rule to decide the action to take.
      rulesEntry.decide(currentStack, context) match {
        case Some(action) => selectedDefault = action
        case None =>
      }
    }
    
    // Apply the action decided or the default.
    selectedDefault match {
      case Action.BlockLogCallstack =>
        val callStack = formatCallStack
        logDebug(s"SecurityManager(Block): $details -- callstack: $callStack")
        throw new AccessControlException(details)
      case Action.BlockLog =>
        logDebug(s"SecurityManager(Block): $details")
        throw new AccessControlException(details)
      case Action.Block => throw new AccessControlException(details)
      case Action.Log => logDebug(s"SecurityManager(Log): $details")
      case Action.LogCallstack =>
        val callStack = formatCallStack
        logDebug(s"SecurityManager(Log): $details -- callstack: $callStack")
      case Action.Allow => ()
    }
  }

...

}

Figure 4. Basics of the policy engine used to filter SecurityManager calls.
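
The rule classes themselves are elided from the listing above. As a rough illustration of first-match evaluation, here is a self-contained sketch under stated assumptions: `RuleSketch`, `Decision`, `CallerRule`, `CallerRegexRule`, `DefaultRule` and `evaluate` are hypothetical names, call stack entries are "Class.method" strings, and a regex must match a whole entry. The real rule implementations may behave differently.

```scala
import scala.util.matching.Regex

object RuleSketch {
  sealed trait Decision
  case object Allow extends Decision
  case object Block extends Decision

  // A rule either decides on an action or abstains with None.
  sealed trait Rule {
    def decide(stack: Seq[String]): Option[Decision]
  }

  // Matches when the exact function name appears in the call stack.
  final case class CallerRule(action: Decision, caller: String) extends Rule {
    def decide(stack: Seq[String]): Option[Decision] =
      if (stack.contains(caller)) Some(action) else None
  }

  // Matches when any call stack entry fully matches the regex.
  final case class CallerRegexRule(action: Decision, pattern: Regex) extends Rule {
    def decide(stack: Seq[String]): Option[Decision] =
      if (stack.exists(entry => pattern.pattern.matcher(entry).matches())) Some(action)
      else None
  }

  // Always matches.
  final case class DefaultRule(action: Decision) extends Rule {
    def decide(stack: Seq[String]): Option[Decision] = Some(action)
  }

  // The first matching rule wins; otherwise use the global action.
  def evaluate(rules: Seq[Rule], stack: Seq[String], global: Decision): Decision =
    rules.iterator
      .map(_.decide(stack))
      .collectFirst { case Some(decision) => decision }
      .getOrElse(global)
}
```

Order matters in this scheme: a default rule placed first hides every rule after it, so specific rules belong at the front of the list.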

This engine provides the basic building blocks for creating more complex policies suited to your usage. It can be extended with additional rules specific to a type of access check, for example to filter file paths or network IP addresses.
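
As an illustration of such an extension, here is a minimal sketch of a path-based check that could back a file read rule. `PathRuleSketch`, `isUnder` and `allowsRead` are hypothetical names and are not part of the engine above. The sketch normalizes paths lexically before comparing them so that `..` segments cannot escape an allowed directory. It does not resolve symbolic links, which a production rule would likely need to handle.

```scala
import java.nio.file.Paths

object PathRuleSketch {
  // True if `candidate` is `root` or lies below it after lexical normalization.
  def isUnder(root: String, candidate: String): Boolean = {
    val base = Paths.get(root).toAbsolutePath.normalize()
    val target = Paths.get(candidate).toAbsolutePath.normalize()
    // Path.startsWith compares whole path components, not raw string prefixes.
    target.startsWith(base)
  }

  // Allow a read only when the path falls under one of the allowed roots.
  def allowsRead(path: String, allowedRoots: Seq[String]): Boolean =
    allowedRoots.exists(root => isUnder(root, path))
}
```

Comparing path components rather than string prefixes avoids treating `/tmpfoo` as if it were inside `/tmp`.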

Example of restrictions

This is a simple security policy to block creation of processes and allow anything else.


import scala.sys.process._
import com.databricks.security._

def executeProcess() = {
  "ls /".!!
}

// Can create processes by default.
executeProcess

// Prevent process execution for specific code
val policy = new SecurityPolicy(Action.Allow)
policy.addRule(PolicyCheck.ExecuteProcess, Action.Block)

SecurityRestriction.restrictBlock(policy) {
  println("Blocked process creation:")
  
  // Exception raised on this call
  executeProcess
}

Figure 5. Example to block process creation.

Here we leverage the rule system to block file read access only for a specific function.


import com.databricks.security._
import scala.io.Source

def readFile(): String = {
  // Read the file line by line and always close the underlying handle.
  val source = Source.fromFile("/etc/hosts")
  try source.getLines().mkString("\n") finally source.close()
}

// Can read files by default.
readFile

// Blocked specifically for the readFile function based on regex.
val rules = new PolicyRuleSet
rules.addCaller(Action.Block, raw".*\.readFile".r)

// Prevent file reads for a specific function.
val policy = new SecurityPolicy(Action.Allow)
policy.addRule(PolicyCheck.ReadFile, rules)

SecurityRestriction.restrictBlock(policy) {  
  println("Blocked reading file:")
  readFile
}

Figure 6. Example to block access to a file based on regex.

Here we log the process created by the restricted code.


import scala.sys.process._
import com.databricks.security._

// Only log with call stack
val policy = new SecurityPolicy(Action.Allow)
policy.addRule(PolicyCheck.ExecuteProcess, Action.LogCallstack)

SecurityRestriction.restrictBlock(policy) {
  // Log creation of process with call stack
  println("whoami".!!)
}

Figure 7. Example to log process creation including callstack.

JDK 17 deprecation of the Java SecurityManager and future alternatives

The Java team deprecated the SecurityManager for removal in JDK 17 (JEP 411) and intends to remove it eventually. This change will affect the proposal in this blog post. The Java team has multiple projects to support previous usages of the SecurityManager, but none so far offers similar isolation primitives.

The most viable alternative approach is to inject code into Java core functions using a Java agent. The result is similar to the current SecurityManager. The challenge is ensuring accurate coverage for common primitives like file or network access. A first implementation can start from the existing SecurityManager callbacks but requires a significant testing investment to reduce the chance of regressions.
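
To show where such an agent would hook in, here is a minimal skeleton built only on the standard `java.lang.instrument` API. It is a sketch under stated assumptions and not a working replacement: `AgentSketch`, `TargetFinder` and the list of target classes are hypothetical, and the transformer only records which targets it sees without rewriting any bytecode. A real agent would rewrite the class bytes at this point, typically with a bytecode library.

```scala
import java.lang.instrument.{ClassFileTransformer, Instrumentation}
import java.security.ProtectionDomain
import java.util.concurrent.ConcurrentHashMap

// Hypothetical agent skeleton: it records which sensitive classes pass
// through the transformer and leaves their bytecode untouched.
object AgentSketch {
  // JVM internal names of classes an agent might instrument (assumed list).
  val targets: Set[String] = Set("java/lang/ProcessBuilder", "java/io/FileInputStream")

  final class TargetFinder extends ClassFileTransformer {
    val seen: java.util.Set[String] = ConcurrentHashMap.newKeySet[String]()

    override def transform(
        loader: ClassLoader,
        className: String,
        classBeingRedefined: Class[_],
        protectionDomain: ProtectionDomain,
        classfileBuffer: Array[Byte]): Array[Byte] = {
      // className can be null for some generated classes.
      if (className != null && targets.contains(className)) {
        seen.add(className)
      }
      null // null tells the JVM that no transformation was performed
    }
  }

  // Entry point the JVM calls when started with -javaagent:<agent jar>,
  // provided the jar manifest names this object as Premain-Class.
  def premain(agentArgs: String, instrumentation: Instrumentation): Unit = {
    instrumentation.addTransformer(new TargetFinder, true)
  }
}
```

Core classes are often loaded before the agent starts, so a real agent would also need to retransform already-loaded classes, which adds to the coverage and testing effort described above.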

Another alternative approach is to use operating system sandboxing primitives for similar results. For example, on Linux we can use namespaces and seccomp-bpf to limit resource access. However, this approach requires significant changes in existing applications and may impact performance.
