In Java 8, lambda expressions were introduced, marking Java’s first step towards supporting functional programming. Functional programming is declarative in style, compared to the imperative approach that procedural languages use. In declarative languages, tasks are executed using functions that do not depend on local or global state. This is unlike imperative languages, where the end result can be influenced by that state. These two approaches represent different philosophies of software engineering, each with its own benefits and drawbacks.
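To make the contrast concrete, here is a minimal sketch of the same small task written both ways. The class and method names are illustrative only and do not come from the Okera code base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative example only: squares of the even numbers in a list.
final class StyleComparison {

  // Imperative: spell out how to iterate and mutate an accumulator.
  static List<Integer> evenSquaresImperative(List<Integer> numbers) {
    List<Integer> result = new ArrayList<>();
    for (int n : numbers) {
      if (n % 2 == 0) {
        result.add(n * n);
      }
    }
    return result;
  }

  // Declarative: describe what is wanted by passing lambdas to the stream API.
  static List<Integer> evenSquaresDeclarative(List<Integer> numbers) {
    return numbers.stream()
        .filter(n -> n % 2 == 0)
        .map(n -> n * n)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Integer> numbers = Arrays.asList(1, 2, 3, 4);
    System.out.println(evenSquaresImperative(numbers));  // prints [4, 16]
    System.out.println(evenSquaresDeclarative(numbers)); // prints [4, 16]
  }
}
```

Both methods return the same result; the lambda version simply leaves the looping and accumulation to the library.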
Now that Java provides the flexibility, a good design will use either the declarative or the imperative approach, whichever is more appropriate. A common pattern in the Okera code base is a status check on a timer: looping continuously until the operation reports a successful status or the timeout is reached. This pattern appeared in multiple places, each with its own customized implementation, which led to repeated code. The introduction of lambda expressions enabled us to create a utility that takes in a lambda function and a timeout configuration. This utility can then be used by any developer who needs to make a status check.
Before getting started with the utility, we needed to create an interface similar to the Callable interface. The Callable interface “Computes a result, or throws an exception if unable to do so.” However, we want distinct exception types: one for when the result cannot be computed at all, and one for when the attempts or the timer run out. Having clearly defined exceptions allows the user of the utility to handle each case in the way that is easiest for them. So, we created a new interface called “RetrierCallable”, defined as follows:
    public static interface RetrierCallable<V> {
      V call() throws IOException, NonRetryableException;
    }
This RetrierCallable interface declares specific exceptions to make it easier for the developer using the utility to handle them. A task throws an IOException when an attempt fails in a way that is worth retrying, and the Retrier itself throws an IOException once the maximum timeout or number of attempts is reached. A NonRetryableException is thrown whenever there is an external issue that retrying will not fix. A use case that differentiates these two exceptions is a user checking whether a file has been uploaded to a bucket in AWS S3. An IOException can be thrown if the path is fully accessible and the file just hasn’t been uploaded yet. A NonRetryableException could be used if the network has been disconnected and the bucket is unreachable.
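The S3 scenario above could be sketched roughly as follows. This is only an illustration: BucketClient is a hypothetical stand-in interface (it is not the AWS SDK), UploadCheck is an invented helper, and NonRetryableException is given a simple message constructor here as an assumption.

```java
import java.io.IOException;

// Assumed shape of the exception; the real class may differ.
class NonRetryableException extends Exception {
  NonRetryableException(String message) {
    super(message);
  }
}

interface RetrierCallable<V> {
  V call() throws IOException, NonRetryableException;
}

// Hypothetical storage client used only for this sketch; not the AWS SDK.
interface BucketClient {
  boolean isReachable();

  boolean exists(String key);
}

final class UploadCheck {
  // Builds a task that reports whether `key` has been uploaded yet.
  static RetrierCallable<String> forKey(BucketClient client, String key) {
    return () -> {
      if (!client.isReachable()) {
        // Retrying will not help while the bucket cannot be reached.
        throw new NonRetryableException("bucket unreachable");
      }
      if (!client.exists(key)) {
        // The file may simply not be there yet, so this is worth retrying.
        throw new IOException("not uploaded yet: " + key);
      }
      return key;
    };
  }
}
```

The caller then sees from the exception type alone whether waiting longer could have helped.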
Our utility, which we appropriately named the “Retrier”, takes in a RetrierCallable task to execute. We also provide overloaded functions that accept a maxAttempts and a timeoutMs value. We run the given task up to maxAttempts times or for up to timeoutMs milliseconds, and stop retrying as soon as either limit is reached. Here is the original “run” function definition:
    public static <T> T run(RetrierCallable<T> task, int maxAttempts, long timeoutMs)
        throws IOException, NonRetryableException

We begin by doing some base-case sanity checks:

    public static <T> T run(RetrierCallable<T> task, int maxAttempts, long timeoutMs)
        throws IOException, NonRetryableException {
      if (timeoutMs < 0) {
        throw new IllegalArgumentException("invalid timeout value, must be nonnegative");
      }
      if (maxAttempts < 0) {
        throw new IllegalArgumentException("invalid maxAttempts value, "
            + "must be nonnegative");
      }
      int attempts = 0;
      long startTimeMs = System.currentTimeMillis();
      Throwable t = null;
      // The retry loop goes here.
    }
We then use a do-while clause for the core structure of this function.
    do {
      // logic will come in here
    } while (System.currentTimeMillis() - startTimeMs < timeoutMs);
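As a small illustration of why the loop is written as do-while, the sketch below counts how many times a body runs under each loop form. The class and method names are invented for this example and are not part of the Retrier.

```java
// Illustrative only: compares do-while and while under the same timeout check.
final class DoWhileTimeout {

  // Mirrors the loop shape above: the body runs before the timeout is checked.
  static int doWhileRuns(long timeoutMs) {
    long startTimeMs = System.currentTimeMillis();
    int runs = 0;
    do {
      runs++;
    } while (System.currentTimeMillis() - startTimeMs < timeoutMs);
    return runs;
  }

  // A plain while loop checks the timeout first and may never run the body.
  static int whileRuns(long timeoutMs) {
    long startTimeMs = System.currentTimeMillis();
    int runs = 0;
    while (System.currentTimeMillis() - startTimeMs < timeoutMs) {
      runs++;
    }
    return runs;
  }

  public static void main(String[] args) {
    System.out.println(doWhileRuns(0)); // prints 1
    System.out.println(whileRuns(0));   // prints 0
  }
}
```

With a timeout of zero the condition is false on its first check, so only the do-while form gives the task a chance to run.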
Using do-while rather than while lets us run the task at least once, even when the timeout is zero or very small. The magic of this utility is inside the try-catch block that is placed inside the do clause from earlier. Here is the function with the try-catch block:
    public static <T> T run(RetrierCallable<T> task, int maxAttempts, long timeoutMs)
        throws IOException, NonRetryableException {
      if (timeoutMs < 0) {
        throw new IllegalArgumentException("invalid timeout value, must be nonnegative");
      }
      if (maxAttempts < 0) {
        throw new IllegalArgumentException("invalid maxAttempts value, "
            + "must be nonnegative");
      }
      int attempts = 0;
      long startTimeMs = System.currentTimeMillis();
      Throwable t = null;
      do {
        try {
          // Success: return the value straight away.
          return task.call();
        } catch (IOException e) {
          // Remember only the first failure so it can be reported as the cause.
          if (t == null) {
            t = e;
          }
          attempts += 1;
          if (attempts > maxAttempts) {
            throw new IOException("Request failed after " + maxAttempts + " attempts. "
                + "The first error was " + t, t);
          }
          try {
            LOGGER.warn("Sleeping " + DEFAULT_BACKOFF_TIME_MS
                + "ms before trying attempt " + (attempts + 1));
            Thread.sleep(DEFAULT_BACKOFF_TIME_MS);
          } catch (InterruptedException threadException) {
            throw new NonRetryableException(threadException);
          }
        }
      } while (System.currentTimeMillis() - startTimeMs < timeoutMs);
      throw new IOException("Request failed after reaching the max timeout of "
          + timeoutMs + "ms. Total attempts took "
          + (System.currentTimeMillis() - startTimeMs)
          + ". The first error was " + t, t);
    }
We begin by executing “task.call()”. As described above, one of three things can happen. First, the task may throw a NonRetryableException. In this case, the exception propagates directly to the caller, since we do not catch it. Second, the call may succeed, in which case we simply return the value we expect to receive. Finally, if an IOException is thrown, the retry logic comes into action. We save the first exception and, once the maximum number of attempts or the timeout has been reached, attach it as the cause of the IOException we throw. Its stack trace allows the user of the utility to debug the underlying issue more easily. If neither limit has been reached, we sleep for DEFAULT_BACKOFF_TIME_MS before the next attempt so that we do not spin the CPU in a tight loop. If that sleep is interrupted, we stop and throw a NonRetryableException.
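The three outcomes can be exercised with a small, self-contained sketch. MiniRetrier is a condensed restatement of the loop above written for this demo, not the actual Retrier: it takes the backoff as a parameter (an assumption, so the demo finishes quickly) and omits logging and argument checks. RetrierDemo, flaky(), and the NonRetryableException constructors are likewise invented for illustration.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Assumed shape of the exception; the real class may differ.
class NonRetryableException extends Exception {
  NonRetryableException(String message) { super(message); }
  NonRetryableException(Throwable cause) { super(cause); }
}

interface RetrierCallable<V> {
  V call() throws IOException, NonRetryableException;
}

final class MiniRetrier {
  // Condensed copy of the retry loop; backoffMs is a parameter only for this demo.
  static <T> T run(RetrierCallable<T> task, int maxAttempts, long timeoutMs, long backoffMs)
      throws IOException, NonRetryableException {
    int attempts = 0;
    long startTimeMs = System.currentTimeMillis();
    Throwable first = null;
    do {
      try {
        return task.call();
      } catch (IOException e) {
        if (first == null) {
          first = e;
        }
        attempts += 1;
        if (attempts > maxAttempts) {
          throw new IOException("Request failed after " + maxAttempts + " attempts", first);
        }
        try {
          Thread.sleep(backoffMs);
        } catch (InterruptedException interrupted) {
          throw new NonRetryableException(interrupted);
        }
      }
    } while (System.currentTimeMillis() - startTimeMs < timeoutMs);
    throw new IOException("Request failed after " + timeoutMs + "ms", first);
  }
}

final class RetrierDemo {
  // Returns a task that throws IOException on its first `failures` calls, then succeeds.
  static RetrierCallable<String> flaky(int failures) {
    AtomicInteger calls = new AtomicInteger();
    return () -> {
      if (calls.incrementAndGet() <= failures) {
        throw new IOException("not ready, call " + calls.get());
      }
      return "ready after " + calls.get() + " calls";
    };
  }

  public static void main(String[] args) throws Exception {
    // Outcome 1: retried IOExceptions, then success.
    System.out.println(MiniRetrier.run(flaky(2), 5, 10_000, 1));
    // Outcome 2: attempts exhausted; the first error is kept as the cause.
    try {
      MiniRetrier.run(flaky(100), 3, 10_000, 1);
    } catch (IOException e) {
      System.out.println(e.getMessage() + "; cause: " + e.getCause().getMessage());
    }
    // Outcome 3: a NonRetryableException reaches the caller without any retry.
    try {
      MiniRetrier.<String>run(() -> {
        throw new NonRetryableException("bucket unreachable");
      }, 5, 10_000, 1);
    } catch (NonRetryableException e) {
      System.out.println("gave up immediately: " + e.getMessage());
    }
  }
}
```

One detail this sketch makes visible: because the check is “attempts > maxAttempts” after the increment, the task is called maxAttempts + 1 times before the loop gives up, so maxAttempts behaves like a retry count rather than a total call count.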
As you can see, this use of lambda expressions provides more clarity and structure for the utility and for its users. In the future, if we want to create a function that spawns a server, we place the status-check code inside the call method of a new RetrierCallable object. This call method checks whether the server has spawned and throws the appropriate exception otherwise. An example of this usage is provided below.
The original:
    /**
     * Returns the state of the server.
     */
    public String getState() throws IOException {
      Response response = client.target(this.url).path(STATE_PATH).request().get();
      checkError(response);
      return response.readEntity(String.class);
    }
Using Retrier.java:
    /**
     * Returns the state of the server.
     */
    public String getState() throws IOException, NonRetryableException {
      return Retrier.run(() -> {
        Response response = client.target(this.url).path(STATE_PATH).request().get();
        checkError(response);
        return response.readEntity(String.class);
      });
    }
The checkError() function for reference:
    // Error codes at or above this indicate failure.
    private static final int HTTP_ERROR_CODE_START = 400;

    private void checkError(Response resp) throws IOException {
      if (resp.getStatus() >= HTTP_ERROR_CODE_START) {
        String msg = resp.readEntity(String.class);
        throw new IOException("Error code: " + resp.getStatus() + " Detail: " + msg);
      }
    }
Turning the logic of this function into a lambda expression and passing it into the utility added our retrying functionality in a matter of two lines. Now, if the server returns an error code of 400 or above, we retry the getState() call, as is a general rule in distributed systems. If you have any feedback, please reach out to me at swapnil@okera.com!