By Katja Weisheit | 01.10.2017
After two years of architectural consulting on highly available architectures with fast-changing business requirements, we have seen a lot of different approaches, how-tos, dos and don'ts. Microservice architectures, agile projects, APIs and DevOps were concepts we worked with every day. But it is one thing to give advice – and another to live these concepts every day. Given the chance to work as an "incubator" team and quickly set up – and run – many microservices, we jumped straight into "living DevOps". And what can we say – it worked: thinking big, but starting small. Within months we had set up a team with the right mindset, a testlab with the right tooling, and continuous integration and delivery – and we learned that "you build it, you run it" is more than a slogan.
In this post, we want to share our approach and tooling for building and deploying our services directly to a Container as a Service (CaaS) environment on the customer side – and, as you might guess from the blog title, our experiences with DevOps and beyond.
Our testlab – like the customer environment – provides a CaaS platform using Kubernetes as the container orchestration tool.
Our CI (Continuous Integration)
Continuous Integration runs exclusively in our pentacor testlab, where we perform all the required quality assurance steps and continuously integrate every microservice implementation before delivering it to the customer.
Our check-in-triggered build pipelines create Docker images that are pushed to our pentacor Docker registry.
The build pipelines use Apache Maven and Apache Groovy scripts that are checked out from Bitbucket as well – so "everything is code". At this stage, the Docker containers are deployed to a dedicated Kubernetes cluster. Once the deployment succeeds (all pods are up and running), the full set of automated integration and resilience tests is executed.
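A check-in-triggered pipeline of this kind could be sketched as a declarative Jenkinsfile roughly like the following. All names – registry URL, image name, manifest path, Maven profile – are placeholders for illustration, not our actual setup:

```groovy
// Hypothetical Jenkinsfile sketch -- registry, image name and manifest
// paths are placeholders, not the actual pentacor configuration.
pipeline {
    agent any
    environment {
        REGISTRY = 'registry.example.com'          // placeholder registry
        IMAGE    = "${REGISTRY}/team/my-service"   // hypothetical image name
    }
    stages {
        stage('Build & Unit Tests') {
            steps { sh 'mvn -B clean verify' }
        }
        stage('Docker Image') {
            steps {
                sh "docker build -t ${IMAGE}:${env.BUILD_NUMBER} ."
                sh "docker push ${IMAGE}:${env.BUILD_NUMBER}"
            }
        }
        stage('Deploy to Test Cluster') {
            steps { sh 'kubectl apply -f k8s/deployment.yaml' }
        }
        stage('Integration & Resilience Tests') {
            steps { sh 'mvn -B verify -Pintegration-tests' }
        }
    }
}
```

Because the pipeline itself lives in the repository next to the sources, a change to the build process is reviewed and versioned like any other change – that is what "everything is code" means for us.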
Any failure is pushed as a Slack message to alert the relevant team that something went wrong. Here we learned: sometimes finger-pointing helps :-). In the beginning there seemed to be only two cases: either several teammates started verifying the problem in parallel, or the whole team ignored the alert. As a lesson learned, we added an explicit reference to the check-in and the author that triggered the build. Our strict rule: this developer takes care of the broken pipeline. Any "green" sources can be used as release candidates and forked into branches based on the agreed sprint releases.
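One way to attach the triggering author to the alert is a post-build section in the pipeline that collects the authors of the commits in the current build's changesets and mentions them in the Slack message. This is an illustrative sketch (channel name and message format are hypothetical; `slackSend` comes from the Jenkins Slack plugin):

```groovy
// Hypothetical post section: on failure, name the authors whose
// check-ins triggered this build in the Slack alert.
post {
    failure {
        script {
            def authors = currentBuild.changeSets
                .collectMany { cs -> cs.items.collect { it.author.fullName } }
                .unique()
            slackSend(
                channel: '#ci-alerts',   // placeholder channel
                color: 'danger',
                message: "${env.JOB_NAME} #${env.BUILD_NUMBER} failed – " +
                         "triggered by: ${authors.join(', ')}"
            )
        }
    }
}
```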
Our CD (Continuous Delivery)
Continuous Delivery includes all the Continuous Integration steps plus a fully automated delivery of any microservice to a configurable customer environment. This makes it a cakewalk to deploy new releases within just a few minutes.
Analogous to the CI pipelines, the CD build pipelines create Docker images that are never changed again – all the way up to the deployment to production. All environment-specific values (such as differing endpoint URLs) are provided in the Kubernetes deployment files as environment variables or config maps. This keeps the differences between environments explicit instead of hidden inside the images.
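As a minimal sketch of this pattern, an environment-specific endpoint can live in a ConfigMap that the deployment injects into the container, while the image reference stays identical on every stage. All names and URLs below are made up for illustration:

```yaml
# Hypothetical example: environment-specific value kept out of the image.
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-service-config            # placeholder name
data:
  PARTNER_API_URL: "https://api.staging.example.com"   # differs per environment
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: my-service
          # the very same image is promoted through all stages
          image: registry.example.com/team/my-service:1.0.0
          envFrom:
            - configMapRef:
                name: my-service-config
```

Only the ConfigMap differs between staging, integration and production; the image – and therefore the tested artifact – is byte-identical everywhere.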
CD builds are tied to dedicated Bitbucket release branches. Depending on the agreed sprint goals, we sometimes deploy our services to production several times a week. The images are first deployed to a second Kubernetes cluster in our pentacor testlab (the so-called staging environment). At this stage we run a set of API smoke tests to make sure that all API endpoints are available and responsive. After all tests pass, the service is deployed to the customer's integration environment. The same set of API tests must pass there before the service is deployed to the production environment. We agreed to add a manual approval step after each stage as an additional quality gate. A sample Jenkins pipeline including all these steps can be seen in the picture below.
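The idea behind such an API smoke test can be sketched in a few lines of Python. The base URL and endpoint paths below are hypothetical, and the fetch function is injectable so the logic can be exercised without a network:

```python
"""Minimal sketch of an API smoke test. Base URL and endpoint paths are
hypothetical -- the real services are not shown in this post."""
from urllib.error import URLError
from urllib.request import urlopen


def smoke_test(base_url, paths, fetch=None):
    """Return a list of (path, problem) tuples; an empty list means all good."""
    def default_fetch(url):
        # real probe: an HTTP GET against the endpoint
        with urlopen(url, timeout=5) as resp:
            return resp.status

    fetch = fetch or default_fetch
    failures = []
    for path in paths:
        url = base_url.rstrip("/") + path
        try:
            status = fetch(url)
            if status != 200:
                failures.append((path, f"HTTP {status}"))
        except URLError as exc:
            failures.append((path, str(exc.reason)))
    return failures


# Dry run with a stubbed fetch instead of real HTTP:
stub = lambda url: 200 if url.endswith("/health") else 503
print(smoke_test("https://svc.example.com", ["/health", "/orders"], fetch=stub))
# -> [('/orders', 'HTTP 503')]
```

In the pipeline, a non-empty failure list simply fails the stage, which blocks the promotion to the next environment.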
In the traditional approach, the development team typically sent a ZIP archive containing the deployment package plus an operations handbook to the operations team – and then their work was done! Those days are gone, and we are happy about that!
Of course, we sometimes have to analyse problems that we wouldn't even have noticed in the good old days – such as unreachable dependent services or infrastructure problems. But on the other hand, you get direct feedback on what you have designed and implemented. And you can decide to improve it whenever you like – without waiting for free slots of the operations team or for "platform release cycles".
What about transparency in production?
So how do we get notified when a service is not working as expected? Traditionally, the notification arrives once the customer realizes there is an outage. In our approach, however, we try to be the first to spot a problem – before the customer even notices it.
So, shortly after the first deployment to the customer environment, we set up a simple but reliable monitoring of our service endpoints and their health – including the reachability of all related services. And again: for any problem, an alert is raised to the responsible team via Slack. As a result, throughout the lifetime of our services we usually start analysing a problem before any consumer even notices that there is one.
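A monitor like this can be as simple as a periodic loop over the health endpoints that raises an alert for every failed probe. The following Python sketch uses hypothetical URLs, and both the probe and the alert channel (e.g. a Slack incoming webhook) are injectable, so the core logic is testable without network access:

```python
"""Sketch of a simple endpoint monitor. URLs, channel and webhook are
hypothetical -- this is not the actual pentacor monitoring code."""


def monitor(health_urls, probe, alert):
    """Probe each health URL; call alert(url, reason) for every failure.

    probe(url) should return an HTTP status code; alert(url, reason) is
    whatever notification channel is in use (e.g. a Slack webhook post).
    Returns the number of alerts raised.
    """
    alerts = 0
    for url in health_urls:
        try:
            if probe(url) != 200:
                alert(url, "unhealthy response")
                alerts += 1
        except Exception as exc:  # endpoint not reachable at all
            alert(url, f"unreachable: {exc}")
            alerts += 1
    return alerts


# Dry run with stubs instead of real HTTP and Slack:
sent = []
probe = lambda url: 500 if "billing" in url else 200
monitor(
    ["https://svc.example.com/billing/health",
     "https://svc.example.com/orders/health"],
    probe,
    alert=lambda url, why: sent.append(f"{url}: {why}"),
)
print(sent)
# -> ['https://svc.example.com/billing/health: unhealthy response']
```

Scheduled every minute or so (a Kubernetes CronJob would do), this is enough to get the Slack alert out before a consumer runs into the broken endpoint.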