Monoliths cover multiple aspects of an organization and often show unregulated growth, commonly known as a big ball of mud. They are usually too big to benefit from a cloud deployment and certainly do not separate concerns. Scaling individual parts as needed is difficult, and developers face a high cognitive load when looking at a huge code base. Remote teams working in the same code base usually struggle with conflicts.

Breaking up such a monolith can be a real challenge, since it is often the central application for data entry. I have worked for an organization running a web-based ERP that handled every aspect of the business: various EDI synchronizations, users, orders, production planning, inventory and many more domains. At that time I would have been happy about any guide making the migration to independent components more predictable.

Replacing such a giant requires a strategy and some tactical patterns. Otherwise the risk of getting lost becomes too high. For a solution architect, a big part of the migration work is forming a domain structure with different stakeholders and communicating the migration work to non-IT members of the project.

Analyzing The Existing Structure

Before analyzing the application itself, having a good understanding of the context is key:

  • What are the company’s business domains and how are they structured?
  • What are the services the organization provides to its customers?
  • Who are the users (external suppliers, customers, departments)?
  • Has there been a shift between what the application was originally designed for and what it is used for today?

Domain-driven design principles can help with this.

Motivation To Split A Monolith Into Components

Some common ones:

  • gain flexibility
  • shorter lead time for feature development and deployment
  • scalability/elasticity to serve customers reliably without running more instances than required
  • security
  • remote teams or outsourcing of software development projects

Keep in mind that the existing application has served the organization very well for a long time. Decomposing it doesn’t necessarily add functionality; however, the result should be as functional and reliable as the previous application. In some cases creating independent components is simply a waste of energy and resources.

Often a partial migration can be a good start.

Example: If an ERP (Enterprise Resource Planning) system has previously served as the “application for all needs”, offering inventory, billing and an online shop, the need might be to have a shop able to serve more customers during peak hours. In that case only the shop is extracted and the monolith can stay in use for longer.

Applying domain-driven design principles can help to untangle the different subdomains. Domain-driven design differentiates three types of subdomains: the core, the generic and the supporting subdomain. The core subdomain is what distinguishes one organization from another. It should be implemented in-house and not outsourced, since it is highly volatile and complex. If a monolith covers generic and supporting subdomains, it could make sense to evaluate off-the-shelf products instead of creating your own version of something that already exists.

Decomposition or Tactical Forking

There are basically two different choices available.

The first one, Decomposition, involves splitting up the monolith along its module boundaries (i.e. namespaces/packages). It assumes the monolith code has clear boundaries. Decomposition can carve out a service-based architecture before splitting up the database. For independent services it is eventually necessary to split the database as well; otherwise the architecture quantum remains the same as for the monolith (all services depend on each other through the shared database). Being able to execute it step by step is a clear benefit of this approach.

[Figure: decomposition]
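The point about the shared database can be illustrated with a small sketch. It groups services into quanta by treating every shared database as a coupling. The service and database names are made up for illustration, and a real quantum analysis would also consider other static couplings besides the database.

```python
def quanta(service_databases):
    """Group services that are transitively coupled through a shared database."""
    groups = []  # list of (services, databases) pairs with disjoint databases
    for service, databases in service_databases.items():
        services, dbs = {service}, set(databases)
        remaining = []
        for other_services, other_dbs in groups:
            if dbs & other_dbs:
                # Shared database: both groups belong to the same quantum.
                services |= other_services
                dbs |= other_dbs
            else:
                remaining.append((other_services, other_dbs))
        remaining.append((services, dbs))
        groups = remaining
    return sorted(sorted(services) for services, _ in groups)


# Hypothetical state right after carving out services: one shared database.
SHARED_DATABASE = {
    "orders": {"erp_db"},
    "billing": {"erp_db"},
    "shop": {"erp_db"},
}

# Hypothetical state after the shop received its own database.
SHOP_SPLIT_OUT = {
    "orders": {"erp_db"},
    "billing": {"erp_db"},
    "shop": {"shop_db"},
}

if __name__ == "__main__":
    print(quanta(SHARED_DATABASE))
    print(quanta(SHOP_SPLIT_OUT))
```

With the shared database all three services end up in a single group. Only after the shop gets its own database does it form a separate quantum.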

The fallback option is Tactical Forking. It forks the whole codebase for each component and deletes the unnecessary parts. It is applicable in any case and teams can get started without much preparation; however, it has several trade-offs. If the monolith is huge, replicating it multiple times eats up lots of resources. Additionally, code that was previously shared throughout the codebase might diverge and cause inconsistencies in the forks. Teams need to be extra careful to identify shared code, agree on naming and split shared parts out as a service or shared library as soon as feasible. Last but not least, without a large-scale refactoring of the legacy codebase, the forks won’t be less chaotic than before.

[Figure: tactical forking]

Decomposition Patterns

The patterns described below are certainly not new and I did not invent them. However, I seldom see a migration that is well planned and works out well. I simply describe and summarize them and invite you to learn more about them through other sources. I will also extend them in the future as I see fit.

Most of the patterns are described in more detail in the book: N. Ford, M. Richards, P. Sadalage, Z. Dehghani (2021-09-23), Software Architecture: The Hard Parts: Modern Trade-Off Analyses for Distributed Architectures, O’Reilly.

Identify Architecture Problems Through Metrics

Key metrics are Coupling, Instability and Abstractness. Calculating those helps to surface architectural problems. They can be an easy first step to understand the structure of the source code.

There are a bunch of tools available. For Java, a prominent tool is JDepend.

Coupling

Describes how many incoming and outgoing connections a component has, also called afferent and efferent coupling. I personally dislike those names; however, they describe a simple concept:

Afferent (Ca) being the number of artifacts* that depend on the artifact in scope

Efferent (Ce) being the number of artifacts* the artifact in scope depends upon

*An artifact here is a class, package or module. For decomposition, the package/module would usually be the artifact.

Instability

I = Ce / (Ce + Ca)

The combination of afferent and efferent coupling describes the resilience to change. Values range from 0 (stable) to 1 (unstable). Unstable or stable does not equal good or bad. If a persistence layer is very stable, many other parts depend on it, which could indicate outward-pointing dependencies that are usually unwanted. It would mean the inner layers (business, domain, …) need to be changed if the infrastructure changes.

The example below shows a simplified version of the hexagonal architecture. Database and Application are decoupled by the bold highlighted “Ports”, which are stable packages containing interfaces. Applying changes to “Ports” is difficult because it would break the code depending on them. Application, UI and Database are unstable, which allows changes to be applied without breaking other parts of the application.

[Figure: example architecture stability class diagram]
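The stability of the packages in the example above can be checked with a short calculation. The sketch below derives Ca, Ce and I from a package dependency map. The map itself is my assumption of how the simplified hexagonal example is wired (UI, Application and Database all depend on Ports); a tool like JDepend would derive such data from the real code.

```python
from collections import defaultdict

# Assumed package-level dependencies: package -> packages it depends on.
DEPENDENCIES = {
    "ui": {"ports"},
    "application": {"ports"},
    "database": {"ports"},
    "ports": set(),
}


def coupling(dependencies):
    """Return {package: (Ca, Ce)} for a package -> dependencies mapping."""
    afferent = defaultdict(set)  # package -> packages that depend on it
    for source, targets in dependencies.items():
        for target in targets:
            if target != source:
                afferent[target].add(source)
    return {
        package: (len(afferent[package]), len(targets - {package}))
        for package, targets in dependencies.items()
    }


def instability(ca, ce):
    """I = Ce / (Ce + Ca); a package without any coupling is reported as 0.0."""
    total = ca + ce
    return ce / total if total else 0.0


if __name__ == "__main__":
    for package, (ca, ce) in coupling(DEPENDENCIES).items():
        print(f"{package:12} Ca={ca} Ce={ce} I={instability(ca, ce):.2f}")
```

With this wiring, Ports has three incoming and no outgoing dependencies (I = 0), while the other three packages only have one outgoing dependency each (I = 1).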

Abstractness

Describes the ratio of abstract classes (and interfaces) in a component to the total number of classes in that component. Values range from 0 (no abstract classes) to 1 (only abstract classes). This metric gives a hint about the understandability of a component.

If Component Sizing identifies a huge class and the abstractness is low, it is very likely a complex piece of code.

Combined with the instability, it provides the distance D = |A + I - 1|. If the distance is large, the artifact might be either too abstract to be useful or not abstract enough, making it hard to understand.
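As a rough sketch, abstractness and distance can be computed per package once the type counts and couplings are known. The class name `PackageStats` and all numbers are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass
class PackageStats:
    name: str
    abstract_types: int  # abstract classes and interfaces
    total_types: int     # all classes and interfaces
    ca: int              # afferent coupling
    ce: int              # efferent coupling

    @property
    def abstractness(self):
        return self.abstract_types / self.total_types if self.total_types else 0.0

    @property
    def instability(self):
        total = self.ca + self.ce
        return self.ce / total if total else 0.0

    @property
    def distance(self):
        # D = |A + I - 1|
        return abs(self.abstractness + self.instability - 1)


# Hypothetical packages: (name, abstract types, total types, Ca, Ce).
EXAMPLES = [
    PackageStats("ports", 4, 4, 3, 0),         # abstract and stable
    PackageStats("legacy_util", 0, 40, 25, 0), # concrete and stable
    PackageStats("unused_api", 5, 5, 0, 5),    # abstract and unstable
]

if __name__ == "__main__":
    for stats in EXAMPLES:
        print(f"{stats.name:12} A={stats.abstractness:.2f} "
              f"I={stats.instability:.2f} D={stats.distance:.2f}")
```

In this made-up data, `ports` has a distance of 0, while `legacy_util` and `unused_api` both reach the maximum distance of 1 for opposite reasons.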

Component Sizing

It is helpful to identify the size of each component. Outliers can be recognized and either combined with other components or decomposed further. A table could list packages with class count, instruction count or any other helpful sizing metric. Based on those counts, each package’s percentage of the overall size can be calculated, which reveals outliers quickly.
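Such a sizing table could be produced along these lines. The class counts are invented, and flagging everything that is more than 1.5 standard deviations away from the mean is just one possible outlier rule that I assume here.

```python
from statistics import mean, stdev

# Hypothetical class counts per component.
CLASS_COUNTS = {
    "billing": 120,
    "inventory": 95,
    "orders": 110,
    "shop": 640,
    "users": 85,
    "edi": 100,
}


def size_report(counts, threshold=1.5):
    """Return (name, count, percent, is_outlier) rows, largest component first.

    Needs at least two components, because the sample standard deviation
    is undefined for a single value.
    """
    total = sum(counts.values())
    average = mean(counts.values())
    spread = stdev(counts.values())
    rows = []
    for name, count in sorted(counts.items(), key=lambda item: -item[1]):
        percent = 100 * count / total
        deviation = (count - average) / spread if spread else 0.0
        rows.append((name, count, percent, abs(deviation) > threshold))
    return rows


if __name__ == "__main__":
    for name, count, percent, outlier in size_report(CLASS_COUNTS):
        flag = "  <- outlier" if outlier else ""
        print(f"{name:10} {count:5d} {percent:5.1f}%{flag}")
```

In this sample only `shop` is flagged, since it holds more than half of all classes.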

Identify Common Domain Functionality

Consolidate duplicate and shared functionality. If module names are consistent, names that recur in several package names can additionally help to identify such functionality. Move it into a shared library or a shared service.
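A simple way to find candidates is to look for leaf names that occur in more than one package. The following sketch assumes dot-separated package names; the packages listed are hypothetical. A match is only a hint and still needs a manual check whether the functionality is really the same.

```python
from collections import defaultdict

# Hypothetical package names of the monolith.
PACKAGES = [
    "erp.orders.notification",
    "erp.billing.notification",
    "erp.inventory.audit",
    "erp.billing.audit",
    "erp.orders.checkout",
]


def shared_candidates(packages):
    """Map each leaf name that occurs more than once to its packages."""
    by_leaf = defaultdict(list)
    for package in packages:
        leaf = package.rsplit(".", 1)[-1]
        by_leaf[leaf].append(package)
    return {leaf: sorted(names) for leaf, names in by_leaf.items() if len(names) > 1}


if __name__ == "__main__":
    for leaf, names in shared_candidates(PACKAGES).items():
        print(f"{leaf}: {', '.join(names)}")
```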

Flatten And Unify Component Structure

The module structure is often inconsistent, either because there was no clear rule or because it simply was not followed. Considering domain-driven design techniques, I recommend using the following structure:

Application

Domain

Subdomain

Component

Class

Only leaf nodes should contain classes. To ensure compliance, set up a fitness function that verifies this structure during the build.
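A fitness function for the leaf rule could look roughly like the sketch below. It works on a list of fully qualified class names, which I assume the build can provide, and the names used are hypothetical. In a real project an architecture testing library for the respective language would be the more natural choice.

```python
def packages_violating_leaf_rule(class_names):
    """Return packages that hold classes directly and also have sub-packages."""
    packages_with_classes = {
        name.rsplit(".", 1)[0] for name in class_names if "." in name
    }
    violations = set()
    for package in packages_with_classes:
        prefix = package + "."
        if any(other.startswith(prefix) for other in packages_with_classes):
            violations.add(package)
    return sorted(violations)


def check_leaf_rule(class_names):
    """Fail (for example as a build step) if the structure rule is broken."""
    violations = packages_violating_leaf_rule(class_names)
    if violations:
        raise AssertionError(f"classes outside leaf packages: {violations}")


# Hypothetical class list: OrderService sits next to the sub-package "checkout".
CLASSES = [
    "erp.orders.OrderService",
    "erp.orders.checkout.Cart",
    "erp.billing.invoice.Invoice",
]

if __name__ == "__main__":
    print(packages_violating_leaf_rule(CLASSES))
```

For the sample list, `erp.orders` is reported because it contains a class and a sub-package at the same time.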

Determine Component Dependencies

Components have incoming and outgoing dependencies. Visualizing the dependencies helps to estimate the effort of a monolith migration. It also helps when explaining it to non-IT folks. If the result looks like a map of international flight connections, migrating the monolith could be worse than starting from scratch. If there are only a few dependencies, or if the dependencies have a clear structure (all pointing to one or two components), a migration is far more feasible.
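To get a first picture, class-level dependencies can be rolled up to component level and exported as a graph. The sketch below treats the package of a class as its component and emits Graphviz DOT text. The dependency pairs are invented; in practice they would come from a static analysis tool.

```python
from collections import Counter

# Hypothetical (source class, target class) dependency pairs.
CLASS_DEPENDENCIES = [
    ("erp.orders.OrderService", "erp.inventory.StockService"),
    ("erp.orders.OrderService", "erp.billing.InvoiceService"),
    ("erp.orders.Checkout", "erp.inventory.StockService"),
    ("erp.billing.InvoiceService", "erp.users.UserDirectory"),
    ("erp.orders.Checkout", "erp.orders.OrderService"),
]


def component_of(class_name):
    """Assumption: the package of a class is its component."""
    return class_name.rsplit(".", 1)[0]


def component_edges(class_dependencies):
    """Count class-level dependencies per (source, target) component pair."""
    edges = Counter()
    for source, target in class_dependencies:
        src, dst = component_of(source), component_of(target)
        if src != dst:  # dependencies inside a component are not of interest
            edges[(src, dst)] += 1
    return edges


def to_dot(edges):
    """Render the component graph as DOT, labelled with dependency counts."""
    lines = ["digraph components {"]
    for (src, dst), weight in sorted(edges.items()):
        lines.append(f'  "{src}" -> "{dst}" [label="{weight}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(component_edges(CLASS_DEPENDENCIES)))
```

The number of edges and their weights already give a first impression of how tangled the components are before any diagram is rendered.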

Identify Subdomains

Some components might not count as part of the organization’s core subdomain. That means they help the business but do not represent its specialties. For example, a software consulting organization might use a bug tracking system for customer issue tracking. This, however, is a supporting service. Maintaining your own bug tracking system is not justifiable. In such a case, replacing the code with an existing solution might be faster and better. Creating a trade-off analysis helps with the decision.

Document Your Decisions

In a previous post I described why documenting architectural decisions is important. If such documentation already exists, it is the best source for the source code structure and the architectural details that have to be considered while decomposing into separate components. In any case, the decomposition itself deserves an architecture decision record (ADR).

Summary

  • Before splitting up a monolith, be clear about motivations & outcomes. Create a trade-off analysis upfront.
  • Be clear about the organization’s core subdomain. If existing solutions are available, re-use them instead of redeveloping. Focus on migrating those components that come with the greatest benefit for the business.
  • The two common approaches are Decomposition (if the monolith has a reusable modular structure) or Tactical Forking (extract entire source for each component and delete unnecessary parts).
  • Obtaining metrics about the code (coupling, instability, …) helps to identify architectural flaws.
  • Create fitness functions to automatically check compliance with new architectural guidelines on every code change.

Links