How We Gave Superpowers to Our macOS CI
Discover how we shortened the feedback loop of pre-merge verification checks for our iOS applications by up to 75%.
Who doesn’t love a continuous integration system that is stable, fast, reliable and feels transparent to its users? One of the missions of the Client Platform tribe at Spotify is to improve the continuous integration and delivery experience of our mobile apps. We are constantly trying to reduce flakiness and make builds as fast as possible to prevent the slow-down of daily development and releases.
In this post, I would like to explain how we are specifically improving the continuous integration experience by upgrading the macOS machines that we use to run builds. In the last few months, this allowed us to shorten the feedback loop of the pre-merge verification checks for our iOS applications by up to 75%.
Up until 2019, we ran all our continuous integration builds for iOS apps on macOS virtual machines (on both 2014 Mac Pros and 2014 Mac Minis). This was really convenient since cloning the state from one machine to another was completely automated, i.e. during Xcode and major macOS upgrades. Multiple VMs usually run on the same machine and they also share a storage area network (SAN). Luckily, we don’t host the machines ourselves, but let experts do it for us. We use the macOS cloud provider, Flow Swiss, and have developed a good relationship with the team following years of partnership.
We slowly started to notice a steady increase in our build times and decrease in stability of our build machines as our codebases grew. By talking with other teams in the industry, we noticed that more and more teams were moving to a bare metal solution, which sparked our interest.
Going Bare Metal
In January 2019, we noticed that Flow added a mention of a new bare metal solution that they were working on to their website. The vision of this new product is the following: “Bare Metal Macs should be controllable and manageable like traditional virtual machines”. We didn’t want to completely lose the benefits of virtual machines, but we didn’t want to give up on our performance either. To better understand if switching to bare metal could help us shorten our build times and improve the performance and stability of our continuous integration system, we decided to run some benchmarks. Flow promptly provided us with two 2018 Mac Minis (3.2GHz 6‑core i7, 32GB RAM) for benchmarking purposes. We used a iOS Music client release build to benchmark the machines with all caching disabled (which we rarely do during normal development), to really test the worst possible scenario and put the machines under heavy load. We ran such a build every 2 hours for a couple of days to get some useful data and we quickly noticed an interesting trend. Builds on a 2018 Mac Mini were twice as fast, and a lot more stable.
Our main issue with the virtual machine agents was the variance of build times. Sometimes, the same exact build could take 50% less or more time, and it was unclear why. As previously mentioned, more than one virtual agent can run on the same physical machine, so our assumption was that this, along with the fact that all machines were reading and writing to the same SAN, was the bottleneck. The typical enterprise SAN has a I/O pattern that consists of 40% writes and 60% reads. Instead, our usage during peak hours was 80% writes and 20% reads. So we knew that there wasn’t much room for improvement by continuing to invest in virtual machines for our builds. We had to go bare metal.
As you can see from the graph, the build times on the bare metal machine look perfectly constant, which is exactly what we were expecting. If you compile the same codebase over and over again in a sandboxed environment, the build times should be consistent since nothing can affect the performance of the OS outside the compilation tasks of the build itself. Some engineers even looked at the graph and said: “The run times look suspiciously consistent on bare metal, it just looks almost too good to be true!” We were so used to the fluctuating build times of our virtual machines, but we knew this was the right choice to make if we wanted to drastically improve our macOS CI experience.
Over the following months, we kept in touch with Flow to be early adopters of the new offering given the results of our benchmarks. We started working on updating our tooling to work on the new hardware shortly after placing our first order for some new machines. We wanted to focus on improving builds that engineers care the most about: pre-merge checks. In the Spotify Music app repository, there are several checks that have to pass before changes can be merged to master, which in turn causes thousands of builds to be triggered every day. All these builds need to be completed as fast as possible in order to get new features and bug fixes in the hands of our customers quickly. We, therefore, slowly started to move some pre-merge configurations to bare metal, i.e. linting and static analysis, to test that our infrastructure was behaving as expected. Even though these configurations don’t perform any compilations, we still saw an improvement of over 30% in their duration simply due to the fact that they could make use of the full performance of these new, more powerful machines.
We estimated that 45-50 agents were needed to run all pre-merge configurations on bare metal, so for a few weeks, only some specific checks would run on bare metal, and thus, finish faster.
Even without fully running all our pre-merge checks on bare metal, we started to see some improvements. You can see from the above graph that the percentages for our configuration that builds an App Store version of the app all decreased. P50 decreased from over 16 minutes to less than 8 minutes. During this transition period, we used a data-driven approach with dashboards and alerting systems to make sure the quality of our service was not deteriorating.
The table above compares the duration of the same checks between October 1st and December 31st on both types of build agents. During this time, the same configurations could run on a virtual or bare metal machine, depending on availability. This allowed us to really see the improvement when running the same exact type of check.
As you can see, the biggest wins are in the unit testing configurations. By using bare metal, the iOS simulator is a lot more stable and we can run up to 6 simulators simultaneously (up from only 3 on VMs), along with a lot less flaky tests.
We couldn’t be happier with the results of our transition to bare metal machines, and the results really speak for themselves. We started with just a small benchmark that validated our hypothesis and we continuously gathered data and monitored the progress to help us make the right decisions as we scaled up the number of bare metal machines in our infrastructure. We are definitely looking to increase the number of bare metal machines we employ in our continuous integration environment and to completely sunset our virtual machines at some point in the future. Of course, our big gains came from moving to a much newer and more modern hardware, as well as getting rid of the virtual machine overhead in one go.
This effort has been possible in thanks to many squads in the Client Platform. We would also like to thank the Flow Swiss folks for their great help in this project.
If you think any of this work is interesting to you and you would be up to solving some interesting problems, check out our Join the Band page.