Oddly enough, I generated similar graphs for stack tracing and profiling in the distant past and hadn't realised they'd since been named and formalised.

For anyone else who has not recognised the name, I found these origin links which may interest some:

    Abstract [2]:
    Flame graphs are a simple stack trace visualization that helps answer an everyday problem: how is software consuming resources, especially CPUs, and how did this change since the last software version? 

    Flame graphs have been adopted by many languages, products, and companies, including Netflix, and have become a standard tool for performance analysis.

    They were published in "The Flame Graph" article in the June 2016 issue of Communications of the ACM, by their creator, Brendan Gregg.


(An hour-long USENIX talk covering [2].)

This paper appears to formalise the nature of a flame graph. It recognises that flame graphs rest on data from sampling profilers, and as such suffer "never the same twice" variation across repeated runs. It therefore seeks to quantify the "distance" between one graph and another, as well as the "average" of multiple graphs (for the "one true mean reference graph", I guess).

The motivation is then to be able to reliably measure the "distance" between one group of profiling runs and another, before and after changes of interest.

I dare say much more is possible, but I fear I've barely skimmed the material before bedtime :/

I don't think this algebra is a full description of flame graphs but it feels like it captures part of what you want for the limited purpose of global performance analysis.

I feel like a flame graph cannot be an element of a vector space, since vector addition is commutative while the stacks of a flame graph are ordered left to right. From a performance perspective, ignoring that ordering simplifies the algebra (apparently by turning it into a vector space) and still corresponds with your overall impression of "time spent in a frame".


    |A|B |A|
    |C|C |C|
is (1, C;A), (2, C;B), (1, C;A), but written as (1, C;A) + (2, C;B) + (1, C;A) with vector addition, we can rearrange this to (2, C;A) + (2, C;B)


    |A |B |
    |C |C |
which of course still captures the time spent in C;A and C;B, but it has lost some information. We think of it as a different flame graph. Of course, ignoring that information can be meaningful to a performance analysis.
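This order-forgetting addition can be sketched in Python (my illustration, not the paper's code), representing a flame graph as a map from stack tuples (bottom frame first) to sample counts:

```python
from collections import Counter

def add(a, b):
    """Vector addition of two flame graphs, ignoring left-to-right order:
    weights of identical stacks simply accumulate."""
    return dict(Counter(a) + Counter(b))

# |A|B |A|   read column by column, bottom frame first:
# |C|C |C|   one C;A, two C;B, then another C;A
g = add({("C", "A"): 1, ("C", "B"): 2}, {("C", "A"): 1})
print(g)  # {('C', 'A'): 2, ('C', 'B'): 2} -- the two C;A columns merged
```

The merge is exactly where the positional information is lost.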

Indeed, the ordering can only matter for application semantics, not for totals. This is accentuated by the norm, which discards even the information of which basis elements (which stacks) the weight came from.

Very nice formalization.

One area for refinement: it considers two stacks either identical or unrelated. Consider that stack A;B is actually very close to A;B;C; the difference might be due to a sample occurring just before or just after the call to C. The OP's formalism considers them just as different as A;B and Z;W, thereby amplifying measurement noise.

This suggests using a refined metric between stacks (e.g., an edit distance counting pushes and pops), which we can then use to define the metric between flame graphs (e.g., an optimal transport metric [1] instead of the proposed L1).
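One such stack metric (my sketch, not taken from [1]) counts the pops and pushes needed to turn one stack into the other via their longest common prefix:

```python
def stack_distance(s, t):
    """Edit distance between two stacks: pop frames off s down to the
    longest common prefix, then push frames until we reach t."""
    common = 0
    for a, b in zip(s, t):
        if a != b:
            break
        common += 1
    return (len(s) - common) + (len(t) - common)

# A;B vs A;B;C differ by one push -- plausibly just sampling jitter
# around the call to C.
print(stack_distance(("A", "B"), ("A", "B", "C")))  # 1
# A;B vs Z;W share no prefix: two pops plus two pushes.
print(stack_distance(("A", "B"), ("Z", "W")))       # 4
```

Under this metric the near-miss stack is four times closer than the unrelated one, instead of equally distant.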

Avoiding that noise amplification reduces the background noise level, and therefore the cost of effective measurements. From another perspective, the current scheme creates an avoidable curse of dimensionality: the Hotelling test requires more samples (profiling runs) than distinct stacks. So the same code split into more functions is harder to measure, and too-small sample sets are useless. I think neither of those limitations is necessary if we take stack similarity into account.


As a developer who doesn't use mathematical notation very often, I've had some difficulty reading it. From my understanding, the paper goes as follows:

2.1 Define FlameGraph

    Frame = something that identifies function/location in the code
    Stack = Vec<Frame>
    FlameGraph = Map<Stack, Real>
    FlameGraphPositive = Map<Stack, RealPositive>
2.2 Define FlameChart

    FlameChart = Vec<{ start: Real, flame_graph: FlameGraphPositive }>
    # such that the start values are strictly ordered
    fn to_flamegraph(flame_chart: FlameChart) -> FlameGraphPositive {
        flame_chart
            .map(|{ start, flame_graph }| flame_graph)
            .sum()
    }
Personally, I think a better definition of FlameChart is:

    FlameChart = Vec<{ end_time: RealPositive, stack: Stack }>
    # such that sampling starts at 0 and Vec is strictly ordered by end_time
    fn to_flamegraph(flame_chart: FlameChart) -> FlameGraphPositive {
        let mut start = 0.0;
        flame_chart
            .map(|{ end_time, stack }| {
                let elapsed = end_time - start;
                start = end_time;
                FlameGraphPositive::from_stack(stack, elapsed)
            })
            .sum()
    }
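A runnable version of this alternative definition, sketched in Python (names are mine, not the paper's):

```python
def to_flamegraph(flame_chart):
    """Fold a list of (end_time, stack) samples, strictly ordered by
    end_time and starting at t=0, into a map stack -> total elapsed time."""
    graph = {}
    start = 0.0
    for end_time, stack in flame_chart:
        graph[stack] = graph.get(stack, 0.0) + (end_time - start)
        start = end_time
    return graph

chart = [(1.0, ("C", "A")), (3.0, ("C", "B")), (4.0, ("C", "A"))]
print(to_flamegraph(chart))  # {('C', 'A'): 2.0, ('C', 'B'): 2.0}
```

Storing only end times keeps the samples gap-free by construction, which is what makes this definition feel tighter than the start-time one.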
3.1 Define diff

    # a,b: FlameGraphPositive
    fn diff(a, b) -> FlameGraph;
    # such that a + diff = b
    # diff: FlameGraph
    # add, sub: FlameGraphPositive
    fn to_positive(diff) -> { add, sub };
    # such that diff = add - sub
3.2 Define a naive diff metric

    # a,b: FlameGraphPositive
    fn diff_metric(a, b) -> RealPositive {
        diff(a, b).size() / (a.size() + b.size())
    }
    # diff_metric(a, b) \in [0, 1]
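My reading of 3.1 and 3.2, sketched concretely in Python (dict-based, names mine):

```python
def diff(a, b):
    """Signed flame graph d such that a + d = b."""
    keys = set(a) | set(b)
    return {k: b.get(k, 0.0) - a.get(k, 0.0) for k in keys}

def size(g):
    """Total absolute weight of the graph (an L1 norm)."""
    return sum(abs(v) for v in g.values())

def diff_metric(a, b):
    """Naive distance: 0 for identical graphs, 1 for disjoint ones."""
    return size(diff(a, b)) / (size(a) + size(b))

before = {("C", "A"): 2.0, ("C", "B"): 2.0}
after  = {("C", "A"): 1.0, ("C", "B"): 3.0}
print(diff_metric(before, after))  # 0.25
```

Splitting `diff` into its positive and negative parts recovers the `add`/`sub` decomposition of 3.1.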
3.3 Define a more sophisticated diff metric

Using the Hotelling T^2 test, the author defines a metric that takes sampling variance into account and makes it possible to detect performance regressions even when the sampling profile is noisy. (You'd have to read the paper for the exact method.)
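For reference, the textbook two-sample Hotelling T^2 statistic looks like this (my sketch in Python/NumPy; the paper's exact construction may differ):

```python
import numpy as np

def hotelling_t2(x, y):
    """Two-sample Hotelling T^2. Rows are profiling runs, columns are
    the weights of each distinct stack (flame graphs as vectors)."""
    n1, n2 = len(x), len(y)
    d = x.mean(axis=0) - y.mean(axis=0)
    # Pooled covariance. Inverting it requires roughly more runs than
    # distinct stacks -- the dimensionality constraint discussed above.
    s = ((n1 - 1) * np.cov(x, rowvar=False)
         + (n2 - 1) * np.cov(y, rowvar=False)) / (n1 + n2 - 2)
    return (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(s, d)
```

A T^2 value large relative to the corresponding F-distribution critical value signals a real shift in the mean flame graph rather than sampling noise.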

The code for this test is here:

I'm confused: what are the takeaways, and why are there 83 upvotes with no discussion?
A link to a paper on the usefulness of a certain type of graph... Never change HN