The Developer’s DevOps Mountain

Last week I made my way to Velocity Europe 2014 in Barcelona. Right after Velocity, I had the opportunity to talk about our current research in the CloudWave EU project at WebPerfDays. I was very excited to give this talk, as it was the first time our ideas and concepts would be presented to developers and performance engineers in industry rather than academia, and I was curious to find out how they would resonate. To my satisfaction, the talk was received quite well and we had an interesting discussion afterwards.

The slides are online on Speakerdeck – but let me provide some context.

Premise: Cloud Development Study

Earlier in 2014, we conducted a study on how applications are actually developed for the cloud. In the study, we interviewed cloud developers from companies ranging from large enterprises to small businesses and startups, to cover a broad range of industry development experience. We followed up with a quantitative survey that gathered approximately 300 responses on our initial findings.

Tools for Cloud Software Development

One of the questions we asked in our survey was “Which tools do you use specifically for development for the cloud that you did not use before?”, where participants could list one or multiple answers. We went over all answers and placed them into five categories (the full table is in the paper preprint linked below). Performance management tools were the most frequently mentioned category, leading us to conclude that performance considerations have indeed grown in importance for developers.

56% of cloud developers use performance tooling that they haven't used before

In addition, we asked the survey participants about a potential increase in metric availability in cloud systems: 62% agreed or strongly agreed that more information (metrics, logs, etc.) is available in cloud systems than before.

62% say more metrics are available in the cloud

The full table of survey results is available in the paper preprint: arxiv.org/abs/1409.6502

Reality

This confirms what is widely publicized in the performance engineering community: we track more metrics, we graph those metrics, we provide beautiful dashboards (well, some more beautiful than others) for about any metric you can imagine. The unfortunate reality, though, as reported by our interview participants: when a performance issue is reported, developers would rather go “by intuition” than look at the metrics provided to them.

Ops be like…

I gave you everything you needed, you piece of....

Quote from Louis CK’s hilarious stand up: https://www.youtube.com/watch?v=ZC56jND10V4

Frankly speaking, this is quite frustrating… or at least irritating to our fellow operations engineers. They track everything that moves (or doesn’t), they provide us with lovely dashboards of our favourite metrics, and we don’t even look at them?!

DevOps Mountain

If the developer won't go to the metrics... the metrics must go to the developer

Source: https://flic.kr/p/phfUy4 (Stoos, Switzerland)

“If the developer won’t go to the metrics, the metrics must go to the developer” is a play on the proverb “If the mountain will not come to the prophet, the prophet must go to the mountain”[1].

What does that mean exactly? How will the metrics “go to the developer”? The basic idea we pursue in our research is to serve application-level metrics (e.g., the response time of a method call) to developers where they do their daily work: in the IDE. We combine runtime metrics with static code analysis to correlate detected performance issues with the exact location in the code where they occur. On top of that, we want to provide predictive analysis to anticipate performance issues and warn the developer before the code gets deployed.
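To make the correlation idea a little more concrete, here is a minimal sketch of how runtime metrics could be matched to the call sites that static analysis finds. All names here are my own illustration, not the actual CloudWave API (the real tooling works against the Eclipse infrastructure):

```java
import java.util.HashMap;
import java.util.Map;

// A minimal sketch, assuming metrics are keyed by fully qualified method
// signatures. Illustrative only; not the actual CloudWave API.
public final class MetricCorrelator {

    // Average response times (ms) from the monitoring backend, keyed by signature.
    private final Map<String, Double> avgResponseTimesMs = new HashMap<String, Double>();

    public void record(String methodSignature, double avgMs) {
        avgResponseTimesMs.put(methodSignature, avgMs);
    }

    // An AST visitor on the static-analysis side would call this for every
    // method invocation it finds; a hit lets the IDE annotate that call site.
    public Double lookup(String invokedMethodSignature) {
        return avgResponseTimesMs.get(invokedMethodSignature);
    }

    public static void main(String[] args) {
        MetricCorrelator correlator = new MetricCorrelator();
        correlator.record("UserDirectoryService.isOnline(Contact)", 35.0);
        System.out.println("Observed avg for isOnline: "
                + correlator.lookup("UserDirectoryService.isOnline(Contact)") + " ms");
    }
}
```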

In the CloudWave project, we call this concept “Feedback Driven Development”. (To be completely honest, I’m not a big fan of the “* Driven Development” naming scheme that has become quite popular in software development, so I won’t be advertising our concept under this name going forward.)

Use Case

This all sounds like a bunch of abstract marketing & sales talk, so let’s look at an example use case to illustrate what we mean. Our partner CloudMore provides us with an Enterprise VoIP client as a use case application, which might look a little like your typical VoIP client:

[Screenshot: the CloudMore VoIP client]

In this application, our use case is to display the online status of every user. Users can be queried from an external “User Directory Service”. A naive approach for this use case might resemble what we see in the following screenshot:

[Screenshot: the predictive-analysis warning in the IDE]

What we can also see is a warning created by our tooling, indicating a predicted performance issue for the loop over contacts. What this warning is basically saying is: “Dear Developer, the code you’ve just written will most likely result in a performance issue in production. You might want to consider refactoring…”. As a developer, you can’t escape. It’s right in your face. The mountain has arrived.
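Since the screenshot can be hard to read, here is a minimal sketch of what such naive code might look like; every name is illustrative and not taken from the actual CloudMore code base:

```java
import java.util.Arrays;
import java.util.List;

// A sketch of the naive approach from the screenshot: one remote call per
// contact, the classic pattern the tooling flags.
public class OnlineStatusUpdater {

    // Stand-in for the external User Directory Service.
    interface UserDirectoryService {
        boolean isOnline(String contact); // one remote call per invocation
    }

    private final UserDirectoryService directory;

    OnlineStatusUpdater(UserDirectoryService directory) {
        this.directory = directory;
    }

    // The naive loop: an external call inside every iteration.
    void refreshStatuses(List<String> contacts) {
        for (String contact : contacts) {
            boolean online = directory.isOnline(contact); // remote call in the loop
            System.out.println(contact + (online ? " is online" : " is offline"));
        }
    }

    public static void main(String[] args) {
        UserDirectoryService fakeDirectory = new UserDirectoryService() {
            public boolean isOnline(String contact) { return contact.length() % 2 == 0; }
        };
        new OnlineStatusUpdater(fakeDirectory).refreshStatuses(
                Arrays.asList("alice", "bob", "carol"));
    }
}
```

The remote call inside the loop is the problem: the total time grows linearly with the number of contacts.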

But how did this work? Let’s break it down a little bit (a back-of-the-envelope sketch follows the list):

  • Performance monitoring provides us with the response time of the isOnline method, which issues an external call
  • Through instrumentation, we know the average size of the contacts collection
  • Statistical models help us to identify future performance issues based on our collected data (prediction)
  • Static code analysis enables us to locate the predicted performance issue in our code base
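
Putting these pieces together, even a deliberately simplistic linear model illustrates the idea. The threshold and the example numbers below are my own assumptions; the models we actually work on are more sophisticated:

```java
// A back-of-the-envelope sketch of the prediction step. Deliberately naive;
// the threshold and all numbers are assumptions for illustration only.
public final class LoopCostPredictor {

    // Threshold above which we flag a predicted performance issue (assumption).
    private static final double WARN_THRESHOLD_MS = 500.0;

    // Predicted cost of a loop that performs one external call per element:
    // average collection size (from instrumentation) times average call
    // latency (from performance monitoring).
    public static double predictLoopTimeMs(double avgCollectionSize, double avgCallLatencyMs) {
        return avgCollectionSize * avgCallLatencyMs;
    }

    public static void main(String[] args) {
        // e.g. 200 contacts on average, isOnline() averaging 35 ms per call
        double predicted = predictLoopTimeMs(200, 35);
        if (predicted > WARN_THRESHOLD_MS) {
            System.out.printf("Predicted loop time %.0f ms exceeds %.0f ms threshold"
                    + " -- consider refactoring (e.g. a batched status query).%n",
                    predicted, WARN_THRESHOLD_MS);
        }
    }
}
```

At 200 contacts and 35 ms per call, the estimate is 7,000 ms, far above the assumed threshold; that is exactly the kind of result that produces the warning shown in the screenshot.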

Where do we stand?

We’re currently building a prototype of this idea as Eclipse tooling for Java. (A shout-out goes to Christian Bosshard, who has done a terrific job so far building this tooling in the course of his master’s thesis.)

The idea is to have three separate pillars that are interchangeable (sketched as interfaces after the list):

  1. Frontend: This is where you write your code, your development environment (Eclipse, IntelliJ, Cloud9, …)
  2. Predictive Analysis: This can range from simple linear regression models to machine learning and AI techniques with all the bells and whistles. Applications are different, and so are their performance issues and, therefore, the analyses that fit them.
  3. Data Sources: Choose whether to use your own instrumentation and dashboards or an external performance analytics provider (Catchpoint, NewRelic, …)
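
As a minimal sketch of that interchangeability, the three pillars could be expressed as three interfaces. These are my own illustration under the assumptions above, not the plugin’s actual API:

```java
import java.util.List;

// A minimal sketch of the three interchangeable pillars; illustrative only.

// Pillar 3: any provider of runtime metrics (own instrumentation, NewRelic, ...).
interface MetricsSource {
    // Average response time (ms) observed for a fully qualified method signature.
    double avgResponseTimeMs(String methodSignature);
}

// Pillar 2: pluggable analysis, from linear regression to fancier models.
interface PredictiveAnalysis {
    // Returns human-readable descriptions of predicted issues for a source file.
    List<String> predictIssues(String sourceFile, MetricsSource metrics);
}

// Pillar 1: the frontend surfaces warnings where the developer works (IDE markers).
interface Frontend {
    void showWarning(String sourceFile, String message);
}
```

Any pillar can be swapped without touching the other two: a different IDE, a different model, or a different metrics provider.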

The first version of our prototype will be open sourced around February 2015. In the near future, we want to expand on our use cases (beyond loops and response times) and integrate with data source providers.

We would appreciate any feedback on this topic. Drop me a line either here, on Twitter (@citostyle) or email.

[1] I’m certainly not the first to think of this. As a matter of fact, in German the proverb is more commonly known in the opposite direction: ‘Wenn der Prophet nicht zum Berg kommt, muss der Berg zum Propheten’ (“If the prophet won’t come to the mountain, the mountain must come to the prophet”), which is what brought me to this idea in the first place. Robert Louis Stevenson already played with reversing the expression in “St. Ives: The Adventures of a French Prisoner in England”:

‘Well,’ said I, looking about me on the battlements by
which we sat surrounded, ‘this is a case in which Mahomet
must certainly come to the mountain.’
‘Pardon me,’ said Mr. Romaine; ‘you know already your
uncle is an aged man; but I have not yet told you that he is
quite broken up, and his death shortly looked for. No, no,
there is no doubt about it—it is the mountain that must
come to Mahomet.’