Friday, September 02, 2016

Microservices and distribution

OK, so following on from my previous article on inferring the presence of microservices within an architecture, one possibility would be to look at the network traffic. Of course it's no guarantee, but if you follow good microservices principles as they're defined today, your services are typically distributed and communicating via HTTP (OK, some people say REST, but as usual they tend to mean HTTP). Therefore, if you were to look at the network traffic of an "old style" application (let's not assume it has to be a monolith) and compare it with one that has been re-architected around microservices, it wouldn't be unreasonable to assume that a lot more HTTP requests flowing means microservices are in use. If the microservices are using some other form of communication, such as JMS, then you'd see something equivalent but with a binary protocol.
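To make that concrete, here's a rough sketch of the kind of traffic comparison I have in mind: count payloads that begin with an HTTP method in a packet capture taken from each version of the application. The choice of scapy and the capture filename are mine, purely for illustration; this isn't part of any microservices tooling.

```python
# Rough sketch: count HTTP-looking requests in a packet capture.
# Assumes scapy is installed and "capture.pcap" is a capture you've
# taken yourself (both are illustrative assumptions).
from scapy.all import rdpcap, TCP

HTTP_METHODS = (b"GET ", b"POST ", b"PUT ", b"DELETE ", b"HEAD ", b"PATCH ")

def count_http_requests(pcap_path):
    packets = rdpcap(pcap_path)
    count = 0
    for pkt in packets:
        if pkt.haslayer(TCP):
            payload = bytes(pkt[TCP].payload)
            # A payload starting with an HTTP method is a crude but
            # serviceable signal of a request/response service call.
            if payload.startswith(HTTP_METHODS):
                count += 1
    return count

if __name__ == "__main__":
    print(count_http_requests("capture.pcap"))
```

Run it against captures of the old and new deployments; a big jump in the count between the two is exactly the signal we're theorising about.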

We have to recognise that there are a number of reasons why the amount of network traffic may increase from one version of an application to another, so it could be the case that microservices are not being used. However, just as Rutherford did when searching for the atomic nucleus, and as all good scientists do, you come up with a theory that fits the facts and revise it when the facts change. Therefore, for simplicity's sake, we'll assume that this is a good way to infer microservices are in place if all other things remain the same, e.g., the application is released frequently, doesn't require a complete re-install/re-build of user code, etc.

Now this leads me to my next question: have you, dear reader, ever bothered to benchmark HTTP, or any distributed interaction, against a purely local (IPC) interaction? I suspect the majority will say yes, and most of those who haven't will have a gut instinct for the results anyway. Remote invocations are slower, sometimes by several orders of magnitude. Therefore, even ignoring the fault tolerance aspects, remote invocations between microservices are going to have a performance impact on your application. So you've got to ask: why am I doing this? Or maybe: at what point should I stop?
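If you've never measured it, here's a minimal, self-contained sketch (Python standard library only) that compares an in-process call with the same trivial operation invoked over loopback HTTP. The echo handler and iteration count are arbitrary choices of mine, and loopback actually flatters HTTP, since there's no real network in the way.

```python
# Micro-benchmark: in-process call vs the same "operation" over
# loopback HTTP. Purely illustrative; real services will differ.
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def local_echo(data):
    return data

class EchoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")
    def log_message(self, *args):  # keep the benchmark output clean
        pass

server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

N = 1000

start = time.perf_counter()
for _ in range(N):
    local_echo(b"ok")
local_time = time.perf_counter() - start

start = time.perf_counter()
for _ in range(N):
    with urllib.request.urlopen(url) as resp:
        resp.read()
http_time = time.perf_counter() - start

print("local: %.6fs, http: %.6fs, ratio: %.0fx"
      % (local_time, http_time, http_time / local_time))
server.shutdown()
```

Even on loopback the HTTP path comes out orders of magnitude slower than the local call; add a real network, serialisation and failure handling, and the gap only widens.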

Let's pause for a second and look back through the dark depths of history. Back before the late 19th/early 20th century, when electrification of factories really took off, assembling a product from multiple components typically required having those components shipped in from different parts of the country or the world. It was a slow process. If something went wrong and you got a badly built component, it might prevent assembly of the entire product until a new version had been sourced.

In the intervening years some factories stayed with this model (to this day), whereas others moved to a production process whereby all of the pieces were built on site. Some factories became so large, with their constituent pieces built in their own neighbouring factories, that cities grew up around them. However, the aim was that everything was built in one place so that mistakes could be rectified much more quickly. But are these factories monoliths? I'm not so sure it's that clear cut, simply because some of the examples I know of are in the Japanese car industry, which has adapted to change and innovation extremely well over the years. I'd say these factories matured.

Anyway, let's jump back to the present day, remembering the factory example. You could imagine that factories of the type I mentioned evolved towards their co-located strategy over the years from the distributed interaction approach (manufacturers of components at different ends of the planet). They managed to evolve because at some point they had all of the right components being built, but the impediment to their sales was time to market or time to react. So bringing everything closer together made sense. Once they'd co-located, they might every now and then need to interact with new providers in other locations, and if those became long-term dependencies they probably brought them "in house" (or "in factory").

How does this relate to microservices and the initial discussion on distributed invocations? Well, whilst re-architecting around microservices might help your application evolve and be released more frequently, at some point you'll need to rev the components and the application less and less. It becomes more mature and the requirements for change drop off. At that stage you'd better be asking yourself whether the overhead of separate microservices communicating via HTTP, or even some binary protocol, is worth it. You'd better be asking yourself whether it wouldn't be better to just bring them all "in house" (or in process) to improve performance (and probably reliability and fault tolerance). If you get it wrong then of course you're back to square one. But if you get it right, that shouldn't mean you've built a monolith! You've just built an application which does its job really well and doesn't need to evolve much more.
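If you want to keep that "bring it in house" option open, one design approach (a sketch under my own assumptions, with hypothetical names throughout) is to put the remote call behind the same interface as an in-process implementation, so co-locating a service later is a wiring change rather than a rewrite.

```python
# Sketch: hide a service behind an interface so the remote HTTP
# implementation and the in-process one are interchangeable. The names
# (StockService, RemoteStock, LocalStock, the /stock endpoint) are all
# hypothetical, just to illustrate the shape.
import json
import urllib.request
from abc import ABC, abstractmethod

class StockService(ABC):
    @abstractmethod
    def level(self, item_id: str) -> int: ...

class RemoteStock(StockService):
    """Calls a separate microservice over HTTP."""
    def __init__(self, base_url: str):
        self.base_url = base_url
    def level(self, item_id: str) -> int:
        with urllib.request.urlopen(self.base_url + "/stock/" + item_id) as resp:
            return json.load(resp)["level"]

class LocalStock(StockService):
    """Same contract, co-located in process: no network hop at all."""
    def __init__(self, table: dict):
        self.table = table
    def level(self, item_id: str) -> int:
        return self.table.get(item_id, 0)

# Callers depend only on StockService, so swapping the remote version
# for the local one (or back again) is a configuration decision.
```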