Saturday, April 21, 2018

Virtual cloudXchange 2018 Conference

Our abstract has been accepted for presentation at the first CMG cloudXchange Virtual Conference to be held on June 19, 2018.

Exposing the Cost of Performance
Hidden in the Cloud


Neil Gunther
Performance Dynamics, Castro Valley, California

Mohit Chawla
Independent Systems Engineer, Hamburg, Germany

Whilst offering versatile elastic capacity, the cloud also reintroduces an old mainframe concept—chargeback—which rejuvenates the need for performance analysis and capacity planning. Combining production JMX data with an appropriate performance model, we show how to assess fee-based EC2 configurations for a mobile-user application running on a Linux-hosted Tomcat cluster. The performance model also facilitates ongoing cost-benefit analysis of different AWS Auto Scaling policies.

Wednesday, March 14, 2018

WTF is Modeling, Anyway?

A conversation with performance and capacity management veteran Boris Zibitsker, on his BEZnext channel, about how to save multiple millions of dollars with a one-line performance model (at 21:50 minutes into the video) that has less than 5% error. I wish my PDQ models were that good. :/

The strength of the model turns out to be its explanatory power, rather than prediction, per se. However, with the correct explanation of the performance problem in hand (which also proved that all other guesses were wrong), this model correctly predicted a 300% reduction in application response time for essentially no cost. Modeling doesn't get much better than this.

Footnotes

  1. According to Computer World in 1999, a 32-node IBM SP2 cost $2 million to lease over 3 years. This SP2 cluster was about 6 times bigger.
  2. Because of my vain attempt to suppress details (in the interests of video length), Boris gets confused about the kind of files that are causing the performance problem (near 26:30 minutes). They're not regular data files and they're not executable files. The executable is already running but sometimes waits—for a long time. The question is, waits for what? They are, in fact, special font files that are requested by the X-windows application (the client, in X parlance). These remote files may also get cached, so it's complicated. In my GCAP class, I have more time to go into this level of detail. Despite all these potential complications, my 'log model' accurately predicts the mean application launch time.
  3. Log_2 assumes a binary tree organization of font files whereas, Log_10 assumes a denary tree.
  4. Question for the astute viewer. Since these geophysics applications were all developed in-house, how come the developers never saw the performance problems before they ever got into production? Here's a hint.
  5. Some ppl have asked why there's no video of me. This was the first time Boris had recorded video of a Skype session and he pushed the wrong button (or something). It's prolly better this way. :P

Wednesday, February 21, 2018

CPU Idle Is Not Like White Space

This post seems like it ought to be too trite to write but, I see the following performance gotcha cropping up over and over again.

Under pressure to consolidate resources, usually driven by management and especially regarding processor capacity, there is often an urge to "use up" any idle processor cycles. Idle processor capacity tends to be viewed like it's whitespace on a written page—just begging to be filled up.

The logical equivalent of filling up the "whitespace" is absorbing idle processor capacity by migrating applications that are currently running on other servers and turning those excess servers off or using them for something else.

Friday, August 4, 2017

Guerrilla Training September 2017

This year offers a new 3-day class on applying PDQ in your workplace. The classic Guerrilla classes, GCAP and GDAT, are also being presented.

Who should attend?

  • architects
  • application developers
  • performance engineers
  • sys admins
  • test engineers
  • mainframe sysops
  • database admins
  • webops
  • anyone interested in getting beyond ad hoc performance analysis and capacity planning skills

Some course highlights:

  • There are only 3 performance metrics you need to know
  • How to quantify scalability with the Universal Scalability Law
  • Hadoop performance and capacity management
  • Virtualization Spectrum from hyper-threads to cloud services
  • How to detect bad data
  • Statistical forecasting techniques
  • Machine learning algorithms applied to performance data

Register online. Early-bird discounts run through the end of August.

As usual, Sheraton Four Points has bedrooms available at the Performance Dynamics discounted rate. Link is on the registration page.

Also see what graduates are saying about these classes.

Tell a colleague and see you in September!

Wednesday, March 15, 2017

Morphing M/M/m: A New View of an Old Queue

The following abstract has been accepted for presentation at the 21st Conference of the International Federation of Operational Research Societies — IFORS 2017, Quebec City, Canada.

[Update July 31, 2017: Here are my IFORS slides

This year is the centenary of A. K. Erlang's paper [1] on the determination of waiting times in an M/D/m queue with $m$ telephone lines.* Today, M/M/m queues are used to model such systems as, call centers [3], multicore computers [4,5] and the Internet [6,7]. Unfortunately, those who should be using M/M/m models often do not have sufficient background in applied probability theory. Our remedy defines a morphing approximation [3] to M/M/m that is accurate to within 10% for typical applications.† The morphing formula for the residence-time, $R(m,\rho)$, is both simpler and more intuitive than the exact solution involving the Erlang-C function. We have also developed an animation of this morphing process. An outstanding challenge, however, has been to elucidate the nature of the corrections that transform the approximate morphing solutions into the exact Erlang solutions. In this presentation, we show:
  • The morphing solutions correspond to the $m$-roots of unity in the complex $z$-plane.
  • The exact solutions can be expressed as a rational function, $R(m,z)$.
  • The poles of $R(m,z)$ lie inside the unit disk, $|z| < 1$, and converge around the Szegő curve [8] as $m$ is increased.
  • The correction factor for the morphing model is defined by the deflated polynomial belonging to $R(m,z)$.
  • The pattern of poles in the $z$-plane provides a convenient visualization of how the morphing solutions differ from the exact solutions.

* Originally, Erlang assumed the call holding time, or mean service time $S$, was deterministic with unit period, $S=1$ [1,2]. The generalization to exponentially distributed service periods came later. Ironically, the exponential case is easier to solve than the apparently simpler deterministic case. That's why the M/D/1 queue is never the first example discussed in queueing theory textbooks.
† The derivation of the morphing model is presented in Section 2.6.6 of the 2005 edition of [4], although the word "morphing" is not used there. The reason is, I didn't know how to produce the exact result from it, and emphasizing it would likely have drawn unwarranted attention from Springer-Verlag editors. By the time I was writing the 2011 edition of [4], I was certain the approximate formula did reflect the morphing concept in its own right, even though I still didn't know how to connect it to the exact result. Hence, the verb "morphs" timidly makes its first and only appearance in the boxed text following equation 4.61.

References

  1. A. K. Erlang, "Solution of Some Problems in the Theory of Probabilities of Significance in Automatic Telephone Exchanges," Electroteknikeren, v. 13, p. 5, 1917.
  2. A. K. Erlang, "The Theory of Probabilities and Telephone Conversations," Nyt Tidsskrift for Matematik B, vol 20, 1909.
  3. E. Chromy, T. Misuth, M. Kavacky, "Erlang C Formula and Its Use In The Call Centers," Advances in Electrical and Electronic Engineering, Vol. 9, No. 1, March 2011.
  4. N. J. Gunther, Analyzing Computer System Performance with Perl::PDQ, Springer-Verlag, 2005 and 2011.
  5. N. J. Gunther, S. Subramanyam, and S. Parvu, "A Methodology for Optimizing Multithreaded System Performance on Multicore Platforms," in Programming Multicore and Many-core Computing Systems, eds. S. Pllana and F. Xhafa, Wiley Series on Parallel and Distributed Computing, February 2017.
  6. N. J. Gunther, "Numerical Investigations of Physical Power-law Models of Internet Traffic Using the Renormalization Group," IFORS 2005, Honolulu, Hawaii, July 11—15.
  7. T. Bonald, J. W. Roberts, "Internet and the Erlang formula," ACM SIGCOMM Computer Communication Review, Volume 42, Number 1, January 2012.
  8. C. Diaz Mendoza and R. Orive, "The Szegő curve and Laguerre polynomials with large negative parameters," Journal of Mathematical Analysis and Applications, Volume 379, Issue 1, Pages 305—315, 1 July 2011.

Tuesday, January 17, 2017

GitHub Growth Appears Scale Free

Update of Thursday, August 17, 2017: It's looks like we can chalk up another one for the scale-free model (described below) as Github apparently surpasses 20 million users. Outgoing CEO Wanstrath mentioned this number in an emailed statement to Business Insider.
"As GitHub approaches 700 employees, with more than $200M in ARR, accelerating growth, and more than 20 million registered users, I'm confident that this is the moment to find a new CEO to lead us into the next stage of growth. ....."

The Original Analysis

In 2013, a Redmonk blogger claimed that the growth of GitHub (GH) users follows a certain type of diffusion model called Bass diffusion. Here, growth refers to the number of unique user IDs as a function of time, not the number project repositories, which can have a high degree of multiplicity.

In a response, I tweeted a plot that suggested GH growth might be following a power law, aka scale free growth. The tell-tale sign is the asymptotic linearity of the growth data on double-log axes, which the original blog post did not discuss. The periods on the x-axis correspond to years, with the first period representing calendar year 2008 and the fifth period being the year 2012.

Saturday, October 8, 2016

Crib Sheet for Emulating Web Traffic

Our paper entitled, How to Emulate Web Traffic Using Standard Load Testing Tools is now available online and will be presented at the upcoming CMG conference in November.

Presenter: James Brady
Session Number: 436
Subject Area: APM
Session Date: WED, November 9, 2016
Session Time: 1:00 PM - 2:00 PM
Session Room: PortofinoB