# Using the container-based infrastructure
sudo: false
language: perl
perl:
- "5.24"
- "5.22"
- "5.20"
- "5.18"
- "5.16"
- "5.14"
- "5.12"
- "5.10"
install:
  /bin/true
script:
  ./test.sh
# Flame Graphs visualize profiled code
Main Website: http://www.brendangregg.com/flamegraphs.html
Example (click to zoom):
[![Example](http://www.brendangregg.com/FlameGraphs/cpu-bash-flamegraph.svg)](http://www.brendangregg.com/FlameGraphs/cpu-bash-flamegraph.svg)
Other sites:
- The Flame Graph article in ACMQ and CACM: http://queue.acm.org/detail.cfm?id=2927301 http://cacm.acm.org/magazines/2016/6/202665-the-flame-graph/abstract
- CPU profiling using Linux perf\_events, DTrace, SystemTap, or ktap: http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
- CPU profiling using XCode Instruments: http://schani.wordpress.com/2012/11/16/flame-graphs-for-instruments/
- CPU profiling using Xperf.exe: http://randomascii.wordpress.com/2013/03/26/summarizing-xperf-cpu-usage-with-flame-graphs/
- Memory profiling: http://www.brendangregg.com/FlameGraphs/memoryflamegraphs.html
- Other examples, updates, and news: http://www.brendangregg.com/flamegraphs.html#Updates
Flame graphs can be created in three steps:
1. Capture stacks
2. Fold stacks
3. flamegraph.pl
1\. Capture stacks
=================
Stack samples can be captured using Linux perf\_events, FreeBSD pmcstat (hwpmc), DTrace, SystemTap, and many other profilers. See the stackcollapse-\* converters.
### Linux perf\_events
Using Linux perf\_events (aka "perf") to capture 60 seconds of 99 Hertz stack samples, both user- and kernel-level stacks, all processes:
```
# perf record -F 99 -a -g -- sleep 60
# perf script > out.perf
```
Now only capturing PID 181:
```
# perf record -F 99 -p 181 -g -- sleep 60
# perf script > out.perf
```
### DTrace
Using DTrace to capture 60 seconds of kernel stacks at 997 Hertz:
```
# dtrace -x stackframes=100 -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }' -o out.kern_stacks
```
Using DTrace to capture 60 seconds of user-level stacks for PID 12345 at 97 Hertz:
```
# dtrace -x ustackframes=100 -n 'profile-97 /pid == 12345 && arg1/ { @[ustack()] = count(); } tick-60s { exit(0); }' -o out.user_stacks
```
60 seconds of user-level stacks, including time spent in-kernel, for PID 12345 at 97 Hertz:
```
# dtrace -x ustackframes=100 -n 'profile-97 /pid == 12345/ { @[ustack()] = count(); } tick-60s { exit(0); }' -o out.user_stacks
```
Switch `ustack()` for `jstack()` if the application has a ustack helper to include translated frames (e.g., node.js frames; see: http://dtrace.org/blogs/dap/2012/01/05/where-does-your-node-program-spend-its-time/). The rate for user-level stack collection is deliberately slower than for kernel stacks, which is especially important when using `jstack()`, as it performs additional work to translate frames.
2\. Fold stacks
==============
Use the stackcollapse programs to fold stack samples into single lines. The programs provided are:
- `stackcollapse.pl`: for DTrace stacks
- `stackcollapse-perf.pl`: for Linux perf_events "perf script" output
- `stackcollapse-pmc.pl`: for FreeBSD pmcstat -G stacks
- `stackcollapse-stap.pl`: for SystemTap stacks
- `stackcollapse-instruments.pl`: for XCode Instruments
- `stackcollapse-vtune.pl`: for Intel VTune profiles
- `stackcollapse-ljp.awk`: for Lightweight Java Profiler
- `stackcollapse-jstack.pl`: for Java jstack(1) output
- `stackcollapse-gdb.pl`: for gdb(1) stacks
- `stackcollapse-go.pl`: for Golang pprof stacks
- `stackcollapse-vsprof.pl`: for Microsoft Visual Studio profiles
Usage example:
```
For perf_events:
$ ./stackcollapse-perf.pl out.perf > out.folded
For DTrace:
$ ./stackcollapse.pl out.kern_stacks > out.kern_folded
```
The output looks like this:
```
unix`_sys_sysenter_post_swapgs 1401
unix`_sys_sysenter_post_swapgs;genunix`close 5
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf 85
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;c2audit`audit_closef 26
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;c2audit`audit_setf 5
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;genunix`audit_getstate 6
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;genunix`audit_unfalloc 2
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;genunix`closef 48
[...]
```
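To make the folded format concrete, here is a minimal Python sketch of the folding idea. It is not how the stackcollapse programs are implemented; the function name `fold_stacks` and the sample frames are hypothetical, and it assumes each sample is already a list of frame names ordered from root to leaf.

```python
from collections import Counter

def fold_stacks(samples):
    """Collapse stack samples into 'frame;frame;... count' lines.

    Each sample is a list of frame names, root first and leaf last
    (an assumption of this sketch, not a parser for any profiler output).
    """
    counts = Counter(";".join(frames) for frames in samples)
    # Sort so that identical input always yields identical output.
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]

# Hypothetical samples: two identical stacks are merged into one line.
samples = [
    ["main", "read_file", "parse"],
    ["main", "read_file", "parse"],
    ["main", "read_file"],
    ["main", "write_out"],
]

for line in fold_stacks(samples):
    print(line)
```

Each distinct stack becomes one line, and the trailing number is how many times that exact stack was sampled.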
3\. flamegraph.pl
================
Use flamegraph.pl to render an SVG.
```
$ ./flamegraph.pl out.kern_folded > kernel.svg
```
An advantage of having the folded input file (and the reason folding is a separate step from flamegraph.pl) is that you can grep for functions of interest. For example:
```
$ grep cpuid out.kern_folded | ./flamegraph.pl > cpuid.svg
```
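Because each folded line is a self-contained record, the same filtering can be done in a few lines of any language. The Python sketch below is illustrative only: `filter_folded` and `total_samples` are hypothetical helper names, and the input lines are taken from the folded output shown earlier.

```python
def filter_folded(lines, needle):
    """Keep folded lines whose text contains `needle`, like a plain grep."""
    return [ln for ln in lines if needle in ln]

def total_samples(lines):
    """Sum the trailing sample count of each folded line."""
    return sum(int(ln.rsplit(" ", 1)[1]) for ln in lines)

folded = [
    "unix`_sys_sysenter_post_swapgs 1401",
    "unix`_sys_sysenter_post_swapgs;genunix`close 5",
    "unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf 85",
]

matched = filter_folded(folded, "closeandsetf")
print(len(matched), "matching stacks,", total_samples(matched), "samples")
```

The filtered lines are still valid folded input, so they can be passed straight to flamegraph.pl.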
Provided Examples
=================
### Linux perf\_events
An example output from Linux "perf script" is included, gzip'd, as example-perf-stacks.txt.gz. The resulting flame graph is example-perf.svg:
[![Example](http://www.brendangregg.com/FlameGraphs/example-perf.svg)](http://www.brendangregg.com/FlameGraphs/example-perf.svg)
You can create this using:
```
$ gunzip -c example-perf-stacks.txt.gz | ./stackcollapse-perf.pl --all | ./flamegraph.pl --colors=java --hash > example-perf.svg
```
This shows my typical workflow: I'll gzip profiles on the target, then copy them to my laptop for analysis. Since I have hundreds of profiles, I leave them gzip'd!
Since this profile included Java, I used the flamegraph.pl --colors=java palette. I've also used stackcollapse-perf.pl --all, which includes all annotations that help flamegraph.pl use separate colors for kernel and user level code. The resulting flame graph uses: green == Java, yellow == C++, red == user-mode native, orange == kernel.
This profile was from an analysis of vert.x performance. The benchmark client, wrk, is also visible in the flame graph.
### DTrace
An example output from DTrace is also included, example-dtrace-stacks.txt, and the resulting flame graph, example-dtrace.svg:
[![Example](http://www.brendangregg.com/FlameGraphs/example-dtrace.svg)](http://www.brendangregg.com/FlameGraphs/example-dtrace.svg)
You can generate this using:
```
$ ./stackcollapse.pl example-dtrace-stacks.txt | ./flamegraph.pl > example-dtrace.svg
```
This was from a particular performance investigation: the Flame Graph identified that CPU time was spent in the lofs module, and quantified that time.
Options
=======
See the USAGE message (--help) for options:

    USAGE: ./flamegraph.pl [options] infile > outfile.svg

        --title TEXT     # change title text
        --subtitle TEXT  # second level title (optional)
        --width NUM      # width of image (default 1200)
        --height NUM     # height of each frame (default 16)
        --minwidth NUM   # omit smaller functions (default 0.1 pixels)
        --fonttype FONT  # font type (default "Verdana")
        --fontsize NUM   # font size (default 12)
        --countname TEXT # count type label (default "samples")
        --nametype TEXT  # name type label (default "Function:")
        --colors PALETTE # set color palette. choices are: hot (default), mem,
                         # io, wakeup, chain, java, js, perl, red, green, blue,
                         # aqua, yellow, purple, orange
        --bgcolors COLOR # set background colors. gradient choices are yellow
                         # (default), blue, green, grey; flat colors use "#rrggbb"
        --hash           # colors are keyed by function name hash
        --cp             # use consistent palette (palette.map)
        --reverse        # generate stack-reversed flame graph
        --inverted       # icicle graph
        --flamechart     # produce a flame chart (sort by time, do not merge stacks)
        --negate         # switch differential hues (blue<->red)
        --notes TEXT     # add notes comment in SVG (for debugging)
        --help           # this message

        eg,
        ./flamegraph.pl --title="Flame Graph: malloc()" trace.txt > graph.svg

As suggested in the example, flame graphs can process traces of any event,
such as malloc()s, provided stack traces are gathered.
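
The `--width` and `--minwidth` options interact: a frame's width in pixels is proportional to its share of the total samples, and frames narrower than `--minwidth` are left out. The Python sketch below illustrates that relationship only; it is not flamegraph.pl's code. The 10-pixel side padding (`xpad`) and the function names are assumptions made for the example.

```python
def frame_width_px(samples, total, image_width=1200, xpad=10):
    """Pixel width of a frame, proportional to its share of all samples.

    xpad is an assumed left/right margin, so the drawable width is
    image_width - 2 * xpad.
    """
    return samples / total * (image_width - 2 * xpad)

def is_drawn(samples, total, minwidth=0.1, **kwargs):
    """A frame is kept only if it is at least `minwidth` pixels wide."""
    return frame_width_px(samples, total, **kwargs) >= minwidth

total = 48414  # total samples in a hypothetical profile
for samples in (48285, 5, 1):
    width = frame_width_px(samples, total)
    print(f"{samples:>6} samples: {width:8.2f} px, drawn: {is_drawn(samples, total)}")
```

Under these assumptions, a function seen in a single sample out of tens of thousands falls below the default 0.1 pixels and would be omitted, which keeps the SVG small.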
Consistent Palette
==================
If you use the `--cp` option, it will use the $colors selection and randomly
generate the palette like normal. Any future flamegraphs created using the `--cp`
option will use the same palette map. Any new symbols from future flamegraphs
will have their colors randomly generated using the $colors selection.
If you don't like the palette, just delete the palette.map file.
This allows you to change your colorscheme between flamegraphs to make the
differences REALLY stand out.
Example:
Say we have 2 captures, one with a problem, and one when it was working
(whatever "it" is):
```
cat working.folded | ./flamegraph.pl --cp > working.svg
# this generates a palette.map, as per the normal random generated look.
cat broken.folded | ./flamegraph.pl --cp --colors mem > broken.svg
# this svg will use the same palette.map for the same events, but a very
# different colorscheme for any new events.
```
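
The behavior described above can be sketched in Python as follows. This is an illustration under stated assumptions, not flamegraph.pl's implementation: an in-memory dict stands in for the palette.map file, `assign_colors` is a hypothetical name, and the RGB ranges are made up for the example.

```python
import random

def assign_colors(symbols, palette_map, rng):
    """Reuse stored colors; pick a random color only for unseen symbols."""
    for sym in symbols:
        if sym not in palette_map:
            # Made-up "warm" RGB range, not flamegraph.pl's actual formula.
            palette_map[sym] = (
                rng.randint(200, 255),
                rng.randint(0, 230),
                rng.randint(0, 55),
            )
    return palette_map

rng = random.Random(0)
palette = {}  # stands in for palette.map

# First capture: every symbol is new, so every color is generated.
assign_colors(["main", "read_file"], palette, rng)
first_run = dict(palette)

# Second capture: known symbols keep their colors, new_func gets a new one.
assign_colors(["main", "read_file", "new_func"], palette, rng)
print(palette["main"] == first_run["main"], "new_func" in palette)
```

Clearing the dict plays the same role as deleting palette.map: the next run generates all colors from scratch.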
Take a look at the demo directory for an example:
palette-example-working.svg
palette-example-broken.svg
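#!/usr/bin/perl
# Sample process stacks with procstack.
#   -r  number of sampling rounds
#   -t  sleep time between rounds, in seconds
#   -u  optional user name; only that user's processes are sampled

use Getopt::Std;

getopt('urt');

unless ($opt_r && $opt_t){
	print "Usage: $0 [ -u user] -r sample_count -t sleep_time\n";
	exit(0);
}

my $i;
my @proc;

for ($i = 0; $i < $opt_r ; $i++){
	if ($opt_u){
		# Build the PID list from ps output for the given user.
		$proc = `/usr/sysv/bin/ps -u $opt_u `;
		$proc =~ s/^.*\n//;              # drop the header line
		$proc =~ s/\s*(\d+).*\n/$1 /g;   # keep only the leading PID of each line
		@proc = split(/\s+/,$proc);
	} else {
		# No user given: every numeric entry in /proc is a PID.
		opendir(my $dh, '/proc') || die "Can't open /proc: $!";
		@proc = grep { /^[\d]+$/ } readdir($dh);
		closedir ($dh);
	}

	# Print the stack of every selected process; errors are discarded.
	foreach my $pid (@proc){
		my $command = "/usr/bin/procstack $pid";
		print `$command 2>/dev/null`;
	}

	# Sleep for $opt_t seconds (fractions allowed) before the next round.
	select(undef, undef, undef, $opt_t);
}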
Flame Graph demos gathered and created for the talk "Blazing Performance with
Flame Graphs" at USENIX/LISA 2013.
These SVGs cannot be viewed on GitHub directly; save them locally first (git
clone or download), then open them in a browser (file://...).
This diff is collapsed.
<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg version="1.1" width="1200" height="390" onload="init(evt)" viewBox="0 0 1200 390" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<defs >
<linearGradient id="background" y1="0" y2="1" x1="0" x2="0" >
<stop stop-color="#eeeeee" offset="5%" />
<stop stop-color="#eeeeb0" offset="95%" />
</linearGradient>
</defs>
<style type="text/css">
.func_g:hover { stroke:black; stroke-width:0.5; }
</style>
<script type="text/ecmascript">
<![CDATA[
var details;
function init(evt) { details = document.getElementById("details").firstChild; }
function s(info) { details.nodeValue = "Function: " + info; }
function c() { details.nodeValue = ' '; }
]]>
</script>
<rect x="0.0" y="0" width="1200.0" height="390.0" fill="url(#background)" />
<text text-anchor="middle" x="600" y="40" font-size="25" font-family="Verdana" fill="rgb(0,0,0)" >Flame Graph</text>
<text text-anchor="" x="10" y="365" font-size="20" font-family="Verdana" fill="rgb(0,0,0)" id="details" > </text>
<g class="func_g" onmouseover="s('libc.so.1`_fwrite_unlocked (17 samples, 0.04%)')" onmouseout="c()">
<title>libc.so.1`_fwrite_unlocked (17 samples, 0.04%)</title><rect x="1188.1" y="107" width="0.4" height="25.0" fill="rgb(216,95,1)" rx="2" ry="2" />
</g>
<g class="func_g" onmouseover="s('libc.so.1`memcpy (13 samples, 0.03%)')" onmouseout="c()">
<title>libc.so.1`memcpy (13 samples, 0.03%)</title><rect x="1189.0" y="107" width="0.3" height="25.0" fill="rgb(232,106,54)" rx="2" ry="2" />
</g>
<g class="func_g" onmouseover="s('libc.so.1`_UTF8_mbrtowc (12,492 samples, 25.80%)')" onmouseout="c()">
<title>libc.so.1`_UTF8_mbrtowc (12,492 samples, 25.80%)</title><rect x="754.5" y="107" width="304.4" height="25.0" fill="rgb(246,93,50)" rx="2" ry="2" />
<text text-anchor="" x="757.452431114967" y="125.5" font-size="20" font-family="Verdana" fill="rgb(0,0,0)" >libc.so.1`_UTF8_mbrtowc</text>
</g>
<g class="func_g" onmouseover="s('libc.so.1`_UTF8_mbrtowc (6,203 samples, 12.81%)')" onmouseout="c()">
<title>libc.so.1`_UTF8_mbrtowc (6,203 samples, 12.81%)</title><rect x="496.6" y="133" width="151.2" height="25.0" fill="rgb(253,112,8)" rx="2" ry="2" />
<text text-anchor="" x="499.609245259636" y="151.5" font-size="20" font-family="Verdana" fill="rgb(0,0,0)" >libc.so.1`..</text>
</g>
<g class="func_g" onmouseover="s('grep`check_multibyte_string (39,754 samples, 82.11%)')" onmouseout="c()">
<title>grep`check_multibyte_string (39,754 samples, 82.11%)</title><rect x="90.0" y="159" width="968.9" height="25.0" fill="rgb(240,175,19)" rx="2" ry="2" />
<text text-anchor="" x="92.9925641343413" y="177.5" font-size="20" font-family="Verdana" fill="rgb(0,0,0)" >grep`check_multibyte_string</text>
</g>
<g class="func_g" onmouseover="s('libc.so.1`memchr (16 samples, 0.03%)')" onmouseout="c()">
<title>libc.so.1`memchr (16 samples, 0.03%)</title><rect x="1189.6" y="185" width="0.4" height="25.0" fill="rgb(207,120,19)" rx="2" ry="2" />
</g>
<g class="func_g" onmouseover="s('libc.so.1`cancel_safe_mutex_unlock (14 samples, 0.03%)')" onmouseout="c()">
<title>libc.so.1`cancel_safe_mutex_unlock (14 samples, 0.03%)</title><rect x="1188.6" y="107" width="0.4" height="25.0" fill="rgb(237,117,40)" rx="2" ry="2" />
</g>
<g class="func_g" onmouseover="s('grep`EGexecute (48,285 samples, 99.73%)')" onmouseout="c()">
<title>grep`EGexecute (48,285 samples, 99.73%)</title><rect x="10.2" y="185" width="1176.9" height="25.0" fill="rgb(216,167,19)" rx="2" ry="2" />
<text text-anchor="" x="13.2193580369315" y="203.5" font-size="20" font-family="Verdana" fill="rgb(0,0,0)" >grep`EGexecute</text>
</g>
<g class="func_g" onmouseover="s('grep`xmalloc (20 samples, 0.04%)')" onmouseout="c()">
<title>grep`xmalloc (20 samples, 0.04%)</title><rect x="496.1" y="133" width="0.5" height="25.0" fill="rgb(228,71,39)" rx="2" ry="2" />
</g>
<g class="func_g" onmouseover="s('grep`_start (48,414 samples, 100.00%)')" onmouseout="c()">
<title>grep`_start (48,414 samples, 100.00%)</title><rect x="10.0" y="289" width="1180.0" height="25.0" fill="rgb(214,93,6)" rx="2" ry="2" />
<text text-anchor="" x="13" y="307.5" font-size="20" font-family="Verdana" fill="rgb(0,0,0)" >grep`_start</text>
</g>
<g class="func_g" onmouseover="s('grep`main (48,414 samples, 100.00%)')" onmouseout="c()">
<title>grep`main (48,414 samples, 100.00%)</title><rect x="10.0" y="263" width="1180.0" height="25.0" fill="rgb(209,14,25)" rx="2" ry="2" />
<text text-anchor="" x="13" y="281.5" font-size="20" font-family="Verdana" fill="rgb(0,0,0)" >grep`main</text>
</g>
<g class="func_g" onmouseover="s('libc.so.1`mbrtowc (5,022 samples, 10.37%)')" onmouseout="c()">
<title>libc.so.1`mbrtowc (5,022 samples, 10.37%)</title><rect x="1060.4" y="159" width="122.4" height="25.0" fill="rgb(230,64,35)" rx="2" ry="2" />
<text text-anchor="" x="1063.38377328872" y="177.5" font-size="20" font-family="Verdana" fill="rgb(0,0,0)" >libc.so...</text>
</g>
<g class="func_g" onmouseover="s('libc.so.1`memset (172 samples, 0.36%)')" onmouseout="c()">
<title>libc.so.1`memset (172 samples, 0.36%)</title><rect x="1182.8" y="159" width="4.2" height="25.0" fill="rgb(248,62,12)" rx="2" ry="2" />
</g>
<g class="func_g" onmouseover="s('libc.so.1`mutex_lock_impl (14 samples, 0.03%)')" onmouseout="c()">
<title>libc.so.1`mutex_lock_impl (14 samples, 0.03%)</title><rect x="1059.6" y="107" width="0.3" height="25.0" fill="rgb(212,104,50)" rx="2" ry="2" />
</g>
<g class="func_g" onmouseover="s('libc.so.1`mutex_lock (14 samples, 0.03%)')" onmouseout="c()">
<title>libc.so.1`mutex_lock (14 samples, 0.03%)</title><rect x="1059.6" y="133" width="0.3" height="25.0" fill="rgb(247,174,51)" rx="2" ry="2" />
</g>
<g class="func_g" onmouseover="s('libc.so.1`mutex_unlock (14 samples, 0.03%)')" onmouseout="c()">
<title>libc.so.1`mutex_unlock (14 samples, 0.03%)</title><rect x="1060.0" y="133" width="0.3" height="25.0" fill="rgb(205,164,17)" rx="2" ry="2" />
</g>
<g class="func_g" onmouseover="s('libc.so.1`_malloc_unlocked (7 samples, 0.01%)')" onmouseout="c()">
<title>libc.so.1`_malloc_unlocked (7 samples, 0.01%)</title><rect x="496.2" y="81" width="0.2" height="25.0" fill="rgb(207,199,33)" rx="2" ry="2" />
</g>
<g class="func_g" onmouseover="s('libc.so.1`malloc (18 samples, 0.04%)')" onmouseout="c()">
<title>libc.so.1`malloc (18 samples, 0.04%)</title><rect x="496.2" y="107" width="0.4" height="25.0" fill="rgb(216,19,43)" rx="2" ry="2" />
</g>
<g class="func_g" onmouseover="s('libc.so.1`free (46 samples, 0.10%)')" onmouseout="c()">
<title>libc.so.1`free (46 samples, 0.10%)</title><rect x="1059.3" y="159" width="1.1" height="25.0" fill="rgb(240,100,43)" rx="2" ry="2" />
</g>
<g class="func_g" onmouseover="s('all (48,414 samples, 100%)')" onmouseout="c()">
<title>all (48,414 samples, 100%)</title><rect x="10.0" y="315" width="1180.0" height="25.0" fill="rgb(238,40,25)" rx="2" ry="2" />
<text text-anchor="" x="13" y="333.5" font-size="20" font-family="Verdana" fill="rgb(0,0,0)" ></text>
</g>
<g class="func_g" onmouseover="s('libc.so.1`cancel_active (8 samples, 0.02%)')" onmouseout="c()">
<title>libc.so.1`cancel_active (8 samples, 0.02%)</title><rect x="1188.7" y="81" width="0.2" height="25.0" fill="rgb(219,223,39)" rx="2" ry="2" />
</g>
<g class="func_g" onmouseover="s('libc.so.1`free (7 samples, 0.01%)')" onmouseout="c()">
<title>libc.so.1`free (7 samples, 0.01%)</title><rect x="1189.4" y="185" width="0.2" height="25.0" fill="rgb(228,8,23)" rx="2" ry="2" />
</g>
<g class="func_g" onmouseover="s('libc.so.1`mbrtowc (16,865 samples, 34.83%)')" onmouseout="c()">
<title>libc.so.1`mbrtowc (16,865 samples, 34.83%)</title><rect x="647.9" y="133" width="411.0" height="25.0" fill="rgb(243,167,13)" rx="2" ry="2" />
<text text-anchor="" x="650.868798281489" y="151.5" font-size="20" font-family="Verdana" fill="rgb(0,0,0)" >libc.so.1`mbrtowc</text>
</g>
<g class="func_g" onmouseover="s('grep`0x4067d0 (3,243 samples, 6.70%)')" onmouseout="c()">
<title>grep`0x4067d0 (3,243 samples, 6.70%)</title><rect x="11.0" y="159" width="79.0" height="25.0" fill="rgb(230,205,25)" rx="2" ry="2" />
<text text-anchor="" x="13.9505514933697" y="177.5" font-size="20" font-family="Verdana" fill="rgb(0,0,0)" >grep..</text>
</g>
<g class="func_g" onmouseover="s('libc.so.1`mutex_unlock (8 samples, 0.02%)')" onmouseout="c()">
<title>libc.so.1`mutex_unlock (8 samples, 0.02%)</title><rect x="496.4" y="81" width="0.2" height="25.0" fill="rgb(222,228,28)" rx="2" ry="2" />
</g>
<g class="func_g" onmouseover="s('grep`prtext (94 samples, 0.19%)')" onmouseout="c()">
<title>grep`prtext (94 samples, 0.19%)</title><rect x="1187.1" y="185" width="2.3" height="25.0" fill="rgb(232,213,40)" rx="2" ry="2" />
</g>
<g class="func_g" onmouseover="s('libc.so.1`fwrite (60 samples, 0.12%)')" onmouseout="c()">
<title>libc.so.1`fwrite (60 samples, 0.12%)</title><rect x="1187.8" y="133" width="1.5" height="25.0" fill="rgb(250,133,23)" rx="2" ry="2" />
</g>
<g class="func_g" onmouseover="s('grep`grepbuf (48,411 samples, 99.99%)')" onmouseout="c()">
<title>grep`grepbuf (48,411 samples, 99.99%)</title><rect x="10.0" y="211" width="1180.0" height="25.0" fill="rgb(242,170,46)" rx="2" ry="2" />
<text text-anchor="" x="13.0243731152146" y="229.5" font-size="20" font-family="Verdana" fill="rgb(0,0,0)" >grep`grepbuf</text>
</g>
<g class="func_g" onmouseover="s('grep`kwsexec (10 samples, 0.02%)')" onmouseout="c()">
<title>grep`kwsexec (10 samples, 0.02%)</title><rect x="1058.9" y="159" width="0.3" height="25.0" fill="rgb(215,202,8)" rx="2" ry="2" />
</g>
<g class="func_g" onmouseover="s('grep`grepfile (48,414 samples, 100.00%)')" onmouseout="c()">
<title>grep`grepfile (48,414 samples, 100.00%)</title><rect x="10.0" y="237" width="1180.0" height="25.0" fill="rgb(240,101,10)" rx="2" ry="2" />
<text text-anchor="" x="13" y="255.5" font-size="20" font-family="Verdana" fill="rgb(0,0,0)" >grep`grepfile</text>
</g>
<g class="func_g" onmouseover="s('grep`prline (76 samples, 0.16%)')" onmouseout="c()">
<title>grep`prline (76 samples, 0.16%)</title><rect x="1187.4" y="159" width="1.9" height="25.0" fill="rgb(219,226,8)" rx="2" ry="2" />
</g>
<g class="func_g" onmouseover="s('grep`print_line_head (5 samples, 0.01%)')" onmouseout="c()">
<title>grep`print_line_head (5 samples, 0.01%)</title><rect x="1187.6" y="133" width="0.1" height="25.0" fill="rgb(211,228,21)" rx="2" ry="2" />
</g>
<g class="func_g" onmouseover="s('libc.so.1`mutex_unlock_queue (6 samples, 0.01%)')" onmouseout="c()">
<title>libc.so.1`mutex_unlock_queue (6 samples, 0.01%)</title><rect x="496.5" y="55" width="0.1" height="25.0" fill="rgb(248,95,45)" rx="2" ry="2" />
</g>
</svg>
This diff is collapsed.