Docker Photo by ValdasMiskinis

Introduction to Docker

Docker has taken the software development industry by storm in the last few years, and for good reason. This ever-popular technology provides a highly effective and simple way of deploying and sharing applications through Containerisation. The technology is now mature, used throughout both small and large organisations, and an integral part of build pipelines and DevOps engineering. Here I give a quick introduction, outline the advantages of Docker and explain why you may want to consider using it if you are not already.

What is Docker?

Docker removes concerns about environment, dependencies, configuration and runtime by packaging your application into a single deployable unit of software. This deployable unit can be run almost anywhere, whether that is your local machine, a QA or Production environment or your friendly co-worker’s machine. Portability is easily one of Docker’s biggest advantages, and the ability to run your application on virtually any machine is a very powerful use case for Docker.

Is this different from a Virtual Machine?

The traditional approach to hosting applications is to provision a Virtual Machine (VM), choose an OS, install the software and libraries required for the setup and then let your service run on top of your host. Historically this has worked fine, but it turns out VMs are not all that efficient. They take a long time to boot up, they consume a considerable amount of space once the Guest Operating System is accounted for, and many of them end up running duplicate OS kernels on the same machine. The large footprint of a Virtual Machine also makes sharing images difficult.

Containerisation by comparison, and more specifically Docker containers, provides an alternative to a VM by acting as an extra abstraction over the host’s Operating System. A typical VM requires a Hypervisor (either “bare metal”, running directly on hardware, or hosted on top of an OS) to segregate the filesystem and system resources, and then a Guest OS on top. Docker removes the need for a Hypervisor and a Guest OS; instead, Docker runs in the background as a Daemon process on your host OS. This process can then instantiate containers (Docker’s equivalent of Virtual Machines) by forking new processes that are booted up from pre-defined images.

So how does Docker do this? A Docker Image is Docker’s way of bundling together the necessary libraries/dependencies, runtime environment, any environmental configuration and application code into one unit. This is similar to a VM image, but images in Docker are much more lightweight as they don’t require an entire Operating System. A Docker Image provides a simple yet very portable way to package your software so that it can be deployed virtually anywhere: provided Docker is installed, that host can run your app. Separately to images, you can think of containers as processes, or images at runtime. You can run any number of containers with different images and each will have its own filesystem while sharing the same kernel. This only works, however, if all of the containers are based on the same kernel type; for example, Windows Docker containers cannot be run alongside Linux Docker containers. Compared to VMs this still provides self-containment while removing the overhead of a Hypervisor to manage VMs and the requirement of a large separate OS on each VM.

Key Concepts and Design

Docker has a few key concepts that are worth documenting for reference. You can expect to use the CLI as the primary method for interacting with Docker, as it adopts a client-server model with a REST API.

For more background reading into how Docker works, look up cgroups and containerd. cgroups, or Control Groups, are a means of organising and grouping processes in Linux, addressing the need to segregate resources and share kernel space. containerd is the open-source container runtime spun out of Docker Engine not too long after its inception.

  • Docker Image - A read-only snapshot of a system: its dependencies, libraries, software and configuration
  • Docker Container - A running instance of a Docker Image with a dedicated read-write layer
  • Docker Hub - A public cloud registry provided by Docker for publishing and sharing images
  • Docker Daemon - The background process used by the Docker server to create and maintain images and containers
  • Docker Client/CLI - The command line interface tool used by the client to interact with the Docker server
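
To make these concepts concrete, here is a minimal sketch of a typical CLI workflow, using the public nginx image from Docker Hub purely as an example:

# Pull an image from Docker Hub
docker pull nginx

# Start a container from the image in the background,
# mapping port 8080 on the host to port 80 in the container
docker run -d -p 8080:80 --name my-nginx nginx

# List running containers
docker ps

# Stop and remove the container
docker stop my-nginx
docker rm my-nginx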

Key Advantages

  • 1. Portability: This is a huge benefit. Your app can run independently of any environment, on your local machine or on any server
  • 2. Consistency: You can be more confident that if your application runs in one environment it should do so in another with little or no intervention. This supports the idea of automation in Continuous Integration development processes
  • 3. Sharing: No longer do you need to consider what version of the JDK or Python is being used, or whether the web server is installed. Just create a Docker Image and distribute it to other developers and environments. This is made even easier by Docker Hub, the cloud registry for pulling and pushing Docker Images.
  • 4. Efficiency and Performance: Containers share the same OS kernel from the host OS, reducing capacity requirements and the CPU cycles spent running a software hypervisor.
  • 5. Scalability: Thanks to portability, running exactly the same Docker Image across multiple servers is easy. Booting up and bringing down containers is fast, and many modern container services like AWS ECS support automatic scaling
  • 6. Separation: Much like VMs, containers still reserve separate filesystem spaces and run as separate processes
Java 8 Photo by Ponce Photography

Java 8

At the time of writing, version 13 of the Java JDK is available and, at the current release cadence, 14 won’t be far away. If like me you split your time across multiple languages, it can be hard to keep track of the useful (and not so useful) features of each one. Java 8 was perhaps the most profound of all these updates, offering many feature-rich enhancements that Java programmers now take for granted.

This article aims to pin down and consolidate the most important features from Java 8 as a basis for moving onto the more recent offerings in Java.

Functional Interfaces

Functional interfaces are interfaces that contain exactly one abstract method. They support the use of lambdas in Java, which in turn helps developers write code with more functional paradigms.

A functional interface can be enforced at compile time by adding the @FunctionalInterface annotation when declaring the interface, although the annotation is not essential to define one.

@FunctionalInterface
interface MyFirstFunctionalInterface {
    public void iHadJustOneJob();

    default void myDefaultFunc() {
        System.out.println("Default func");
    }
}

While I am not a fan of some of the extra features provided, Java 8 also brings the default keyword, which allows you to define a default implementation for a method, and we are now also allowed to create static methods as part of an interface. The first of these effectively enables multiple inheritance of behaviour through interfaces. In my view, default implementations should be left to abstract classes or regular base classes for cleaner intent, but it’s a useful feature to know of. Note that neither default nor static methods count towards the single abstract method limit of a Functional Interface.
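
As a small sketch of the static method support (the Calculator interface here is a hypothetical example), note how the interface remains a valid Functional Interface:

@FunctionalInterface
interface Calculator {
    int calculate(int a, int b);

    // Static methods do not count towards the one-abstract-method limit
    static Calculator addition() {
        return (a, b) -> a + b;
    }
}

// Usage: the interface itself supplies a ready-made implementation
int sum = Calculator.addition().calculate(2, 3); // 5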

Lambdas

Perhaps the most significant addition in Java 8 is the introduction of lambda functions. If you are unfamiliar with lambdas, put simply they are functions without a name. Similar to anonymous functions, they are defined inline and often passed directly into other functions as parameters. This feature gives way to functional paradigms and designs not previously available in native Java. Functions no longer need to be tied to objects, and many of the benefits of functional programming follow, including the ability to run functions lazily (on-demand) and to treat functions as first-class citizens rather than objects.

A function argument that takes a Functional Interface or a single function Interface can use lambdas in place of anonymous functions or object instances. The Java Runnable is a good example of this, as you can see a Runnable could be defined simply with an inline lambda:

// anonymous inner class implementation
new Thread(new Runnable() {
    @Override
    public void run() {
        System.out.println("My old anonymous function");
    }
}).start();
// lambda - Runnable has a single abstract method, so it counts as a functional interface
new Thread(() -> { System.out.println("My nicer looking lambda function."); }).start();

The empty brackets define the arguments to the function, which in this case are empty as the only method in Runnable takes none. The types of these arguments are inferred from the Functional Interface, so only the parameter names are required. As this is a single-line statement, you can also remove the curly braces entirely.

If I wanted to, in some cases I can make the method static so that I can re-use it elsewhere with method reference notation. Using the class name followed by two colons lets me pass the static create function as a reference. In this case it’s not all that useful, but you can imagine use cases where it could be.

public class DefaultRunnableFactory {
    public static void create() {
        System.out.println("My Default Runnable");
    }
}

new Thread(DefaultRunnableFactory::create).start();

Method references, or the double colon notation, are often used as a shorthand with streams, which are described in the next section.

Streams

A stream is a new type that enables developers to transform existing data structures into new projections while leaving the underlying data untouched. This is performed through a pipeline of data transformations on the original data source. These are called intermediate operations, and each transformation returns a new stream for any subsequent operation. Intermediate operations are lazily evaluated and will often reduce streams to smaller ones with filter, map and removal functions. This can be very handy for writing shorthands or when you need to break down large data sets. Essentially, the idea behind streams is to compute only the values we need through a series of functions. Once we are done with the stream we can return the result with a terminal operation such as findFirst, collect, toArray, match or many others available as part of the Stream API.

There are multiple ways to create streams. For collections, the easiest way is to simply call stream() on the collection. There are many different computations and terminations you can perform, which you will find in the Stream API documentation; this page is aimed simply as an overview of the feature. The Java API mentions many ways to create specific types of streams, but mostly you will be using Stream.of(…) or stream() on existing collections.

In this example, we take a list of available users as a stream by invoking stream(), before running a filter for all non-null users that contain the string “jack”. With this result set we then re-format the username as surname first, forename last using the map function. Lastly, we filter out any invalid (empty) usernames produced by the map step and use the sorted function to sort users alphabetically.

List<String> users = new ArrayList<>();
users.add("jack,smith");
users.add("jack,smith");
users.add("jake,smith");
List<String> formatted = users.stream()
    .filter(u -> u != null && u.contains("jack"))
    .map(u -> {
        String[] names = u.split(",");
        if (names.length < 2)
            return "";
        return String.format("%s, %s", names[1], names[0]);
    })
    .filter(u -> !u.isEmpty())
    .sorted()
    .collect(Collectors.toList());

There are ways to parallelise streams to make use of the default Fork-Join thread pool in Java. To do this, you can call parallelStream() on collections or parallel() on existing streams before executing other intermediary functions. This does, however, potentially clog the shared thread pool and keep it from doing more useful work elsewhere in your application. My recommendation is to avoid parallel streams and instead submit lambdas to separate custom thread pools to avoid unnecessary contention.
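
As a minimal sketch of that recommendation (re-using the users list from the earlier example, and assuming the java.util.concurrent imports), work can be submitted to a dedicated pool instead:

ExecutorService pool = Executors.newFixedThreadPool(4);
List<Future<String>> results = new ArrayList<>();
for (String user : users) {
    // each lambda runs on our own pool, not the shared Fork-Join pool
    results.add(pool.submit(() -> user.toUpperCase()));
}
pool.shutdown();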

forEach operator and new collection operators

The Iterable interface now has a forEach function which you can use in place of iterating over collections. All that is needed is to pass a single-argument Consumer. This can be very convenient and clean compared to writing multiple for statements.

List<String> names = new ArrayList<String>();
names.add("Mark");
names.add("Dan");
names.forEach(name -> System.out.println(name));

In addition to the forEach operator, some useful operators have been added to the Collection and Map interfaces. A few that could be of interest include removeIf on Collection, replaceAll on List, and compute on the Map interface.
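
Here is a brief sketch of two of these (the wordCounts map is just an illustrative example):

List<String> names = new ArrayList<>(Arrays.asList("Mark", "Dan", "Daniel"));
// removeIf deletes every element matching the predicate
names.removeIf(name -> name.startsWith("Dan")); // leaves ["Mark"]

Map<String, Integer> wordCounts = new HashMap<>();
// compute derives a new value from the existing mapping (null when absent)
wordCounts.compute("mark", (key, count) -> count == null ? 1 : count + 1);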

Optionals

An Optional is described in the Java Docs as “A container object which may or may not contain a non-null value”. Optionals act as wrappers around objects where we are unsure of the presence of a value. They can be useful for handling functions that may return null; a sample use-case could be fetching data through a REST call or an IO operation. Optionals provide an alternative to traditional exception handling and null checks, hoping to alleviate some of the user error in handling NullPointerExceptions.

An Optional can be created with Optional.of() or Optional.ofNullable(). Using the ifPresent function we can execute a function conditional on the presence of a value, or if we suspect there is none then the orElseGet/orElse functions can return a default value. ifPresent takes a Consumer as an argument, and if the Optional holds a non-null value then the Consumer’s code block is executed. Conversely, orElseGet takes a Supplier as an argument and will produce a default value only when the Optional is empty.

Yes, you still have to remember to use an Optional instead of an if-null check, and I would admit this is one of the less useful features in my view. There is still some power, though, in using Optionals in a similar fashion to streams and in the guarantees that null values will be handled appropriately.
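
As a quick sketch of ifPresent (findUsername here is a hypothetical lookup that may return null):

Optional<String> username = Optional.ofNullable(findUsername("jack"));
// The Consumer only runs when a value is present
username.ifPresent(name -> System.out.println("Found user: " + name));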

Here is one example of using Optionals to do some logging after a failed service call and return ‘Call failed’ where null was returned from the fetching call. As you can see this chaining of functions can be quite succinct and expressive compared to if/else statements:

// assume fetchData performs a remote call and may return null
abstract String fetchData(String id);

private String handleFailedServiceCall() {
    System.out.println("Failed to call service. No results returned.");
    return "Call failed.";
}

public String getCustomerDetails(String id) {
     return Optional.ofNullable(fetchData(id))
        .orElseGet(this::handleFailedServiceCall);
}

As mentioned, there are actually two functions for returning a default value from an Optional: orElse() and orElseGet(). There are a few differences to be aware of. The first is that orElse takes a value as an argument whereas orElseGet takes a Supplier. The second is that orElseGet() runs lazily: as you would expect, the Supplier is only invoked when there is no value present in the Optional. The argument to orElse(), by contrast, is evaluated up front on every call whether there is a value or not. This means that when calling getCustomerDetails in the above example, handleFailedServiceCall will always be invoked if orElse() is used, but with orElseGet() it will only be invoked when the original object is not present. I cannot see a legitimate reason to prefer orElse over orElseGet given the potential impact on performance, unless the default object has already been created up front.
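
A small sketch of the difference (createDefault is a hypothetical factory method):

Optional<String> value = Optional.of("cached result");

// createDefault() IS invoked here even though a value is present,
// because the argument to orElse is evaluated eagerly
String a = value.orElse(createDefault());

// createDefault() is NOT invoked here; the Supplier only runs
// when the Optional is empty
String b = value.orElseGet(() -> createDefault());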

Similar to streams we can use filter/map/flatMap and a vast array of other functions on Optionals. Here is a code example trying to fetch customer details by name and return “Empty Customer” where the customer id has no valid prefix or where the customer is null.

// assume fetchData performs a remote call and may return null
abstract Customer fetchData(String customerName);

private String handleNullCustomer() {
    return "Empty Customer.";
}

private boolean hasCustomerPrefix(String name) {
    return name.startsWith("CustID_");
}

public String getCustomerDetails(String name) {
    Optional<Customer> customerDetails = Optional.ofNullable(fetchData(name));

     return customerDetails
        .map(Customer::getID)
        .filter(this::hasCustomerPrefix)
        .orElseGet(this::handleNullCustomer);
}

LocalDate/LocalTime

Java 8 brought LocalDate and LocalTime as substitutes for java.util.Date. The old Date/Time classes were inherently problematic due to a lack of thread safety and some unintuitive conventions. Handling timezone differences was also more difficult in earlier versions of Java; this has improved with new additions to the language such as OffsetDateTime and ZonedDateTime. Java 8 also introduced Instant and Duration, which allow developers to capture points in time and durations easily with nanosecond accuracy.

LocalDate represents Date, without time or time-zone information in ISO-8601 format:

LocalDate local = LocalDate.now();
LocalDate localExplicit = LocalDate.parse("2019-12-31");

LocalTime represents Time without Date or time-zone information in ISO-8601 format:

LocalTime local = LocalTime.now();
LocalTime localExplicit = LocalTime.parse("12:00");

LocalDateTime represents both Date and Time, a combination of the two prior classes, still without time-zone information:

DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");
LocalDateTime localExplicit = LocalDateTime.parse("2019-12-31 12:00", formatter);

OffsetTime, OffsetDateTime and ZonedDateTime represent Dates and Times with zone information:

// OffsetDateTime stores time as an offset of +1:00 hours from UTC/GMT
OffsetDateTime utcPlusOneHour = OffsetDateTime.now(ZoneOffset.of("+01:00"));
// ZonedDateTime stores time with time zone information. This is the current time in Los Angeles
ZoneId zoneId = ZoneId.of("America/Los_Angeles");
ZonedDateTime timeInLA = ZonedDateTime.now(zoneId);

Dates and times can now easily be compared with the isBefore and isAfter functions on LocalDateTime, LocalTime, LocalDate and ZonedDateTime, making time and date comparisons much easier:

LocalDate currDate = LocalDate.now();
boolean haveIPassedYearEnd = currDate.isAfter(LocalDate.parse("2019-12-31"));
boolean amIStillIn2019 = currDate.isAfter(LocalDate.parse("2018-12-31")) && currDate.isBefore(LocalDate.parse("2020-01-01"));

Instant represents a timestamp in Java 8, and Instant.now() will capture the current UTC timestamp. Classes like Duration can be used to add to or subtract from Instants as required.
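
For example, a small sketch of timing a piece of work with Instant and Duration:

Instant start = Instant.now();
// ... some work ...
Instant end = Instant.now();

// Duration captures the elapsed time between two Instants
Duration elapsed = Duration.between(start, end);
long millis = elapsed.toMillis();

// Instants can also be shifted by a Duration
Instant oneHourLater = start.plus(Duration.ofHours(1));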

As ever there are varying levels of detail that can be delved into here. Look through the classes mentioned and the API to uncover details where you come across a need.

Legacy Code Photo by Pixabay Contributor

Legacy Code

As developers, we know all too well that code has a lifespan. Knowing how to avoid code re-writes, how to improve software longevity, readability and maintainability, and how to reduce the risk of bugs are all ongoing challenges. It’s clear that a good team will need to set some best practices and standards, either directly through documentation or indirectly through consistency in the code.

Some recent reading around legacy code (Working Effectively with Legacy Code by Michael Feathers) and technical debt inspired me to write this post to collect what I’ve learned. There are some very simple tweaks that everyone can make to ease the burden of technical debt. When done right and often, small changes hang together as part of a much bigger picture, or help the next person pull out some code, uplift a module or re-write some functionality from scratch. Sure enough, ensuring your code reads well and is free from bugs will mean a focus on writing tests. I find, though, that the cost of writing tests is not as high as people think; it tends to be more about letting go of some utopian best practices and being willing to make short-term compromises. Here are my tips and thoughts on improving legacy code.

1. Breaking Dependencies

When we talk about legacy code, by far the largest burden is making new changes without changing or losing any existing functionality. TDD tackles this, but some classes can grow so out of control that it may seem difficult to test them at all. Testable code in general leads to well designed and clean code. Here are some approaches to breaking out dependencies that lead to easier testing.

  • Interfaces and Abstractions - Depending on abstractions over implementations is a key OOP concept; if a class under inspection refers to concrete classes in its constructor's signature then this could be the first place to turn. Sure, you can test with some dummy object values, but at some point you are going to need to mock out this behaviour fully or perhaps swap in some new classes. If a particular dependency is causing you trouble, is used in more than one place and you haven't written an interface or abstract class for it already, then do this first.
  • Stub new methods - A technique many of us use subconsciously when writing tests. You add some new functionality to a method, the method becomes larger and you need a quick way to test the old method without what you added getting in the way. Perhaps you now have a file that you need to read, a web request to make or a long-running task to dispatch. Instead of writing the change inline inside an existing function, it would seem natural to at least create a new method to separate the new functionality, enabling you to mock the new method and test the old functionality. The downside is that in order to mock the function it instantly becomes public. This isn't a huge problem in my view if the function is consistent with our external view of the class's responsibilities (if it isn't, we should probably break out a new class anyway and use DI). While this is a very familiar technique to many, it is important to remember it is always a choice you can make over writing more mocking code. Some engineers jump the gun and start to mock out libraries or dependencies rather than stubbing functions.
  • Stub new classes - Similar to the above, getting code under test can mean shifting your logic into a new class to help modularise your code. Create a new parameter on the default constructor of the original class under test and inject the new object either through a Dependency Injection framework or by explicitly creating a new instance in the calling code, as in the snippet below. In the vast majority of cases breaking out functionality is a good thing, but be careful to think about re-usability and the overall coherence of your codebase. There are probably going to be a few dependents on the constructor, so a quicker and less intrusive change may be to introduce a setter rather than modifying the constructor. Some will argue this is no longer clean code, as we're exposing more properties than needed just for testing. However, the goal is much larger here: if you don't have time to make a change, then breaking out a new class and creating a setter called 'SetMockFieldValue' is not going to hurt. The intent of the setter is still very clear and it means you can get along with writing valuable tests.
  • public interface MyNewInterface {
        void functionIWouldLikeToTest();
    }
    
    public class MyOriginalClass {
        private MyNewInterface dependency;
    
        // Preferred: constructor injection
        public MyOriginalClass(MyNewInterface dependency) {
            this.dependency = dependency;
        }
    
        // Use this if it is not easy to change the constructor
        public void setDependencyThroughMethodDI(
            MyNewInterface dependency) {
            this.dependency = dependency;
        }
    }
  • Wrapper Classes/Decorators - A great use of the Decorator Pattern or wrapper classes is to add layers of new behaviour. You can build out new functionality into new wrappers so it is easily testable and isolated from your existing code, as in the snippet below. Decorators bring an additional benefit in that we can use them to supersede original methods and re-implement any problematic initialisation code. For example, we could write a Mock wrapper around an original class. This can be useful if the existing class is already polluted with many dependencies and using dependency injection would cause more of a mess. We could simply extend the original class, but by using a decorator we are favouring composition over inheritance, which will make adding new functionality to classes easier in the long run. This does however introduce a little more boilerplate code, and decorators provide a temptation to wrap new functionality in endless layers of complexity. I would save this method for occasions when more complex functionality needs to be added to a class and for when dependency injection pollutes a class.
  • public interface Actions {
        void existingMethod1();
        void existingMethod2();
    }
    
    public class OriginalClass implements Actions {
        public void existingMethod1() { /* ... */ }
        public void existingMethod2() { /* ... */ }
    }
    
    public class MyDecorator implements Actions {
        private Actions originalObj;
    
        public MyDecorator(Actions originalObject) {
            this.originalObj = originalObject;
        }
    
        // new behaviour lives in the decorator, isolated and testable
        public void newMethodToTest() {
            // code here
        }
    
        // existing behaviour is delegated to the wrapped object
        public void existingMethod1() {
            originalObj.existingMethod1();
        }
    
        public void existingMethod2() {
            originalObj.existingMethod2();
        }
    }
  • Subclasses - Similar to wrapper classes. If writing wrappers is too much to ask, maybe this requires writing more code than you would like right now, then a quick and dirty solution may be just to extend the class under test and override the necessary functions. As mentioned in the wrapper classes section we could introduce a new Mock class as an extension of an original class and override any problematic initialisation.
  • Null Object Pattern - The Null Object Pattern is useful for setting up test cases by passing in a dummy equivalent of a dependency with empty attributes and no-op behaviour. Passing a plain null may cause errors to be thrown; a null object avoids this and reduces the need to touch the existing code to handle NullPointerExceptions, as the code can treat the empty object just like any other (a minimal sketch follows this list).
  • Static Methods - While this is not a favourite choice of mine, consider turning a method into a static one if the method is hard to test as part of a class. If the method is a very isolated piece of functionality you could break it out into a static method until you find a better solution.
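
Here is a minimal sketch of the Null Object Pattern mentioned above (Notifier and OrderService are hypothetical names for illustration):

public interface Notifier {
    void send(String message);
}

// The "null object": safe to call, deliberately does nothing
public class NullNotifier implements Notifier {
    public void send(String message) { /* intentionally empty */ }
}

// In a test, inject the null object rather than passing null
OrderService service = new OrderService(new NullNotifier());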

2. Naming

The power of naming can often be underestimated. Every team will have its own standards, and you can normally gauge from the existing code what makes a consistent name to fit within the mix. You should choose names that make sense to your team and within your domain, but that are not obscure to newcomers. Failing that, my personal view is that a good name is as short as possible and as long as it needs to be. As a rule of thumb, names should be more than one letter unless there is a good reason, such as in a lambda, algebra, formulas, for loops or other contexts where they make sense; conversely, they should be shorter than around 30-35 characters. A name should be descriptive and should not use undocumented or uncommon acronyms. It is amazing how many shorthands can be used throughout a codebase on the assumption that everyone is familiar with their meaning.

If it’s a function, include a verb to indicate it actually does something, and try not to use ‘and’; come up with a different name that represents the function as a whole, or break the method down further. I am guilty of this myself, but try not to embed a variable’s type into its name: if anyone changes the type at any point this can be quite confusing. Think twice before including ‘Interface’ or ‘Abstract’ in your abstractions; these are a waste of characters and not much more expressive. The same goes, to a lesser extent, for Hungarian notation. For example, an IVehicle or an AbstractCar could simply be Vehicle or Car, and it’s normally not difficult in modern IDEs to see that a type is abstract or an interface. Swap out comments for well-named variables, as your code should be self-descriptive; comments should really be used sparingly.
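
A couple of hypothetical before/after examples to illustrate:

// Before: cryptic acronym, type embedded in the name, 'and' in a function name
int usrCntInt;
void validateAndSaveAndNotify();

// After: descriptive, as short as possible, one responsibility per method
int activeUserCount;
void registerUser();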

3. Design Roadmaps

There can be particularly confusing parts to an application, or there may be a lack of structure altogether. It pays to have discussions with team members about the overall architecture to reinforce understanding. Draw out some diagrams, and document features and the overall design separately from the codebase. An up-to-date Confluence page is going to reinforce everyone’s understanding and ensure everyone is on the same page. Moreover, this encourages knowledge sharing and prompts others in the team to do the same.

4. Delete code and comments

There are countless times where unused code and stale comments have caused me a great deal of confusion. Sometimes an old way of doing something is left around, or the comments surrounding a method no longer make sense. These are small things but they can introduce a lot of confusion and misdirection. If you know a line or two that no longer belongs, spend a couple of minutes to clean it up or delete it in a separate commit.

5. Logging

Without logs, debugging production issues is pure guesswork. I personally advocate more logging, at the risk of storing too much and over-polluting, so that I can at least capture issues; you may choose a more strategic logging approach. Either way, define a standard and stick with it so that logs are easily searchable. Logging is going to aid your understanding of legacy code.

6. Error Handling

Putting best practices to good use when handling exceptions will pay dividends for debugging production issues. Use the right exception for an error: if a value is unexpectedly null then throw a NullPointerException; if an argument is incorrect throw an IllegalArgumentException. Try-catches should start with the most specific exception and cascade down to more generic ones so that errors are logged with greater accuracy. Re-throwing exceptions in many languages loses information about the original stack trace, such as the original source and line number of the error. Personally I try to deal with an exception within the original try-catch block, log the exception and then use something similar to the Null Object pattern to notify the caller that something went wrong. If this is not possible, it is better to create a new exception type that wraps the original exception and better describes the error. Last of all, I would try to standardise exceptions as much as possible, as well as the messages that are thrown, to improve readability.

Bad Error Handling

public String getCustomerName(int id) {
    try {
        // stream will not close automatically
        // unless handled in a finally statement
        FileInputStream fio = new FileInputStream(filename);
        Customer c = getCustomer(fio, id);
        return c.toString();
    }
    // no specific error handling
    catch (Exception e) {
        // re-throwing the error loses the stack trace
        // and can cause confusion if handled twice
        throw e;
    }
}

Better Error Handling

public String getCustomerName(int id) {
    // using a try-with-resources statement so that the
    // stream is automatically closed
    try (FileInputStream fio = new FileInputStream(filename)) {
        Customer c = getCustomer(fio, id);
        return c.toString();
    }
    catch (FileNotFoundException fnfe) {
        // the most specific error is handled first
        logError(fnfe);
    }
    catch (NullPointerException npe) {
        logError(npe);
        // if we really need to re-throw an exception then
        // we wrap it in a new exception to be handled
        // by the caller
        throw new CustomerNotFoundException("Customer not found.", npe);
    }
    catch (Exception e) {
        logError(e);
    }
    return "";
}

These are just a few points I would consider important to moving legacy code forward with a particular emphasis on tests and breaking dependencies to facilitate change and refactoring without breaking functionality. Hopefully some of the techniques here reinforce what you may already know and help bring your code to a cleaner state.

Java Memory Model Photo by Alexas_Fotos

The Java Memory Model

The Java Memory Model is often a talking point in technical interviews. Knowing how Garbage Collection works within Java is important for understanding the impact of your code, improving the efficiency of your applications at scale and choosing the right garbage collector for the job. Here are some key points and considerations related to the JVM and Garbage Collection.

Stack vs Heap Memory

Local primitive values in the JVM are stored on the thread’s stack. For each method called, a new block of memory is allocated for the primitive values within that scope and then de-allocated immediately after the call returns. Compared to heap memory, the stack is for short-lived values. Reference types, on the other hand, are stored on the Java Heap for longer use and are de-allocated through Garbage Collection. When the collector detects that an object is no longer referenced, the corresponding block of memory is freed for re-use. A common point of confusion: for any allocated object, the reference itself is stored on the stack, whereas the actual object it points to lives on the heap.
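
A short sketch of where things live (Customer is a hypothetical class):

public void example() {
    int count = 42;                    // primitive local: stored on the stack
    Customer c = new Customer("Jane"); // the reference 'c' lives on the stack,
                                       // the Customer object it points to is
                                       // allocated on the heap
}   // when example() returns, the stack frame (count and c) is freed and the
    // Customer object becomes eligible for garbage collection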

JVM Heap

We are often concerned with the JVM’s Heap memory because running the garbage collector is an overhead. Running the garbage collector frequently, or for long periods, will have an impact on overall performance, as collection is traditionally a ‘Stop the World’ operation: the execution of all of your running threads is paused until Garbage Collection has finished. For this reason there are a few strategies for dealing with garbage collection more efficiently.

Garbage Collection Strategies

Each GC strategy at the JVM’s disposal has its own advantages and disadvantages. A classic but fundamental strategy that you should know is the Mark and Sweep method. This algorithm keeps track of all of the GC roots in the program. When the Garbage Collector runs, it traverses the object graph from these roots and ‘marks’ any objects that are still in use. The ‘sweeping’ part of the algorithm then takes place, and the memory of unmarked objects is reclaimed for future use. Finally, there is a compaction step to keep memory de-fragmented so that contiguous blocks can be re-used. Some other GC strategies include the following:

  • Reference Counting - Store a count of the number of references to each object; when the count reaches zero, free the object from memory. This is difficult to get right for circular references.
  • Copying - Similar to the Mark and Sweep method, but the sweep stage copies objects still in use to a different partition and then compacts memory.
  • Generational - Similar to Copying. Separate memory into two sections: a young and an old generation. Keep new objects in the young generation before phasing them into an older generation dedicated to taking the burden of more intensive but less frequent GCs.

Java uses a combination of these GC methods but the concept of splitting memory into generations is key.

JVM Model

As seen below, the JVM heap is split into different sections serving different purposes. The diagram does not represent the real ratio between the different generations; the sizes of both the young and old generation partitions can be configured via JVM arguments.

[Diagram: the JVM heap, divided into the Young Generation (Eden plus two Survivor spaces) and the Old Generation]

The Young Generation

At first, all objects are created in the Eden space of the young generation. When Eden fills, a Minor Garbage Collection takes place and moves surviving objects to one of the two survivor spaces. The JVM alternates between these two spaces on each minor GC and keeps track of how many times an object has survived a collection.

The Old Generation

Major GCs are usually performed when the survivor space is full or when an object has survived a large number of GCs. They move objects from the young generation into the old generation where, as the name suggests, long-lived objects reside. Running the garbage collector on the whole of the heap is an inefficient process, hence the design decision to separate parts of the JVM memory into generations. You may start to think of some edge cases here: if an object in the old generation refers to another in the young generation, for example, there is a chance of collecting the young object during a minor GC unless the entire old generation is scanned. If we had to scan everything this way, it would of course negate the design choice of having two generations to begin with. The JVM handles this through a card table: a set of buckets covering ranges of old-generation memory, with bits marking where memory is ‘dirty’. When a reference to a young-generation object is written into an old-generation object, the respective card entry is marked. Later, during a minor GC, the card table is used to scan only the dirty parts of the old generation rather than the whole generation.

JVM Garbage Collector Settings

As mentioned, the JVM offers a number of different collectors which a developer is free to choose from, depending on which best fits the purposes of their application. Before choosing any of these, you should thoroughly research your choice and any alternatives; if in doubt, always use the default:

  • Serial Garbage Collector: The most basic of all the types listed here. This collector freezes all threads while running; it is most suited to single-threaded, 32-bit systems and most likely only a good choice for client machines.
  • Parallel Garbage/Throughput Collector: Uses multiple threads for scanning and compacting. This collector is an improvement over Serial but will still pause all other threads. It is much more customisable: we can choose the number of GC threads, the maximum pause between GCs, the maximum heap size and other throughput options. This is suitable for many applications with some level of tuning, but can still permit small pauses within the app.
  • CMS Collector (Concurrent Mark and Sweep): Like the Parallel Collector this uses multiple threads, but it tries to limit the amount of time spent on stop-the-world operations at the cost of running slower on average. Usually this collector requires both more CPU and more heap memory in order to run steadily. It can be suitable for an always-on server-side environment where avoiding pauses is important to perceived performance.
  • The G1 Collector (Garbage First): A more recent collector as of JDK 7 update 4, designed to replace the CMS collector on machines with higher memory specifications, and thus the preferred collector going forward for server-side environments. It comes with a high probability of meeting expected pause times at the expense of some of the throughput a parallel GC may provide. Unlike CMS, this collector partitions the heap into contiguous, equally sized blocks in virtual memory, each assigned Eden, Survivor or Old Generation duties. The main difference is that the heap is made up of many smaller regions rather than a few large ones, providing more flexibility to allocate and reclaim objects. First a concurrent marking phase takes place, after which the collector knows which regions are mostly empty. These regions are collected first, as they yield the most reclaimed memory and best help meet target pause times (hence “garbage first”). This differs from other collectors, where garbage is collected on a generational basis. As an intended improvement on the CMS Collector, its use-case is similar.
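
For reference, here is a sketch of how a collector is selected with JDK 8 era command-line flags (MyApp is a placeholder main class; defaults vary by JDK version):

java -XX:+UseSerialGC MyApp          # Serial collector
java -XX:+UseParallelGC MyApp        # Parallel/Throughput collector
java -XX:+UseConcMarkSweepGC MyApp   # CMS collector
java -XX:+UseG1GC MyApp              # G1 collector

# Common companion options: fixed heap size and a pause-time target
java -XX:+UseG1GC -Xms2g -Xmx2g -XX:MaxGCPauseMillis=200 MyApp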

Other Resources