Erlang/OTP Profiling: Fprof

2
Erlang/OTP Profiling: Fprof

“Time and space are modes by which we think and not conditions in which we live.” — Albert Einstein.

Viacheslav Katsuba

Hi folks! Ukrainian Erlanger is here 🤘! In this new series of articles, I want to talk about Erlang/OTP profiling. In this article, in particular, I want to draw attention to fprof — a fantastic tool to profile programs to figure out where they spend their time. I hope the information I will provide here will help you better understand this tool and save you some time on future profiling!

So, let’s rock 🤘!

fprof © — A Time Profiling Tool using trace to file for minimal runtime performance impact — In other words, this tool provides a lot of useful information in readable format for quick diagnosis of performance issues.

But how can such a great tool help us with our actual products, and how can it be used? What are the pros and cons we can find when using it? Let’s try to figure that out!

fprof is focused on providing information about where the program time is spent, but at the same time, it can slow down the program it profiles. Of course, slowing down the program it profiles is not very good, but let’s not forget that collecting this information will help you and your product. Let’s not forget that such a check should occur in the development environment before the product is released. Nevertheless, it is also possible to profile the product itself in production. Still, the fact that in this way, your product will start to work slower in the area where you are trying to profile is inevitable. And this is, I guess, the main con.

It is probably worth mentioning that fprof is based on trace to file to minimize runtime performance impact. The results of this tool are provided per process to screen/file. Of course, the size of those results can be pretty significant. Because of this, you need to determine what exactly should be profiled and when carefully. Also providing records number of function calls, execution time, and Garbage Collection records. All that data will help us understand how many times each function has been called, how much time was spent on it, and what processes called those functions. Then, after stopping the tracing, fprof will erase all tracing in the node.

Regular execution of fprof is performed in three steps:

Step 1: Tracing

All tracing is stored in a file. The trace collects all function calls, process information, and garbage collection information. All that trace stuff, of course, has a timestamp.

Step 2: Profiling

When fprof wrote all traces into a file — all that data is read in the profiling step, and after simulation and data calculation, all that info is stored as a profile in the fprof server state. All data can be stored in a file or sent out to the console in this step.

Step 3: Analysing

All profile data sorted so far is filtered and written into a file or the console in this step. All that data is provided in a format that’s nice to the human eye, as well as Erlang parsing tools.

The API is pretty small and contains about 14 functions to manipulate analysis and trace data. To analyze and display the trace, it is enough to use the following API functions:

1> fprof:trace(start).                       

ok

2> spawn(fun() -> io:format("~p~n", [lists:seq(1, 100000)]) end).

<0.94.0>

3> fprof:trace(stop).

ok

4> fprof:profile().

Reading trace data...

..............................

End of trace!

ok

5> fprof:profile().

...

% CNT ACC OWN

[{ "<0.94.0>", 25123,undefined, 63.681}, %%

{ spawned_by, "<0.84.0>"},

{ spawned_as, {erlang,apply,["#Fun",[]]}},

{ initial_calls, [{erlang,apply,2},{erl_eval,'-expr/5-fun-3-',1}]}].

....

It is important to note that you don’t always need to use start/0 and stop/0 as you will see below in the current article. Other API functions can automatically execute them. Another critical point: the server will start if you try running the fprof:profile/* API. So, if you then try to stop the server using fprof:stop(), the terminal will be stuck. That’s why you need to call fprof:stop(kill). in those cases. See the examples below:

Bad 😟

1> fprof:profile(start).

{ok,<0.88.0>}

2> fprof:stop().
....

% stuck here

% because we need stop profile process instead of fprof server

Cool 😎

1> fprof:profile(start).

{ok,<0.85.0>}

2> fprof:profile(stop).

Empty trace!

ok

When you start a server using profile/* or other API, and you want to stop fprof server. You can also use this approach:

1> fprof:profile(start).

{ok,<0.88.0>}

2> Pid = whereis(fprof_server).

<0.84.0>

3> exit(Pid, kill).

Empty trace!

true

4> whereis(fprof_server).

undefined

You can also stop fprof with a specific reason:

1> {ok, Pid} = fprof:start().

{ok,<0.87.0>}

2> Ref = erlang:monitor(process, Pid).

#Ref<0.1758584341.3719036929.131305>

3> fprof:stop("die!!!").

ok

4> flush().

Shell got {'DOWN',#Ref<0.1758584341.3719036929.131305>,process,<0.87.0>,

"die!!!"}

ok

5> erlang:demonitor(Ref).

true

Or just kill it:

1> fprof:profile(start).

{ok,<0.101.0>}

2> Pid = whereis(fprof_server).

<0.100.0>

4> fprof:stop(kill). %% Equivalent to: exit(Pid, kill).

Empty trace!

5> whereis(fprof_server).

undefined

Notes: When the fprof server stops, all data collected on the profile will be lost!

Ok, let’s play with some code. So, we will take the same example for the foo.erl module as in Erlang/OTP official documentation…

-module(foo).

-export([create_file_slow/2]).
create_file_slow(Name, N) when is_integer(N), N >= 0 ->

{ok, FD} =

file:open(Name, [raw, write, delayed_write, binary]),

if N > 256 ->

ok =

file:write(FD,

lists:map(fun (X) -> <> end,

lists:seq(0, 255))),

ok = create_file_slow(FD, 256, N);

true ->

ok = create_file_slow(FD, 0, N)

end,

ok = file:close(FD).
create_file_slow(_FD, M, M) ->

ok;

create_file_slow(FD, M, N) ->

ok = file:write(FD, <>),

create_file_slow(FD, M+1, N).

As you can see, this module has one exported function with two arguments. The first argument expects some iodata() for the name of the test file. The next one expects a non_neg_integer() to control the number of recursive calls of the helper function (create_file_slow/3). This private function will write data into the test file. As I wrote above, you don’t always need to start fprof explicitly because some API functions already do it by default. One of them is fprof:apply/1:

1> fprof:apply(foo, create_file_slow, [junk, 1024]).

ok

2> fprof:profile().

Reading trace data...

..................

End of trace!

ok

3> fprof:analyse().

Processing data...

Creating output...

%% Analysis results:

{ analysis_options,

[{callers, true},

{sort, acc},

{totals, false},

{details, true}]}.
% CNT ACC OWN

[{ totals, 20986, 94.896, 70.237}]. %%%

% CNT ACC OWN

[{ "<0.80.0>", 20895,undefined, 69.193}]. %%

{[{undefined, 0, 97.578, 0.028}],

{{fprof,apply_start_stop,4}, 0, 97.578, 0.028}, %

...

...

...

% CNT ACC OWN

[{ "<0.84.0>", 210,undefined, 0.911}, %%

{ spawned_by, "<0.80.0>"},

{ spawned_as,

{proc_lib,init_p,

["<0.80.0>",[],gen,init_it,

[gen_statem,"<0.80.0>",self,raw_file_io_delayed,

{"<0.80.0>",

"#Ref<0.3454161812.343408643.86391>",

[delayed_write]}, []]]}},

{ initial_calls, [{proc_lib,init_p,5},{erlang,put,2}]}].

{[{undefined, 1, 92.878, 0.012}],

{ {proc_lib,init_p,5}, 1, 92.878, 0.012}, %

...

...

...

Done!

ok

Wait, wait, wait! What happened there 🙃?

Let’s go step by step…

1> fprof:apply(foo, create_file_slow, [junk, 1024]).

Here we call fprof:apply/3, which will only profile the current process and every other process spawned by it in the context of the evaluation of the function. By default it will call fprof:trace(start), then foo:create_file_slow(junk, 1024), and fprof:trace(stop). After running this API function, we will have a file called fprof.trace containing all the trace statements.

2> fprof:profile().

We need to call this API function to compile all trace data into profiling information and store it in memory.

3> fprof:analyse().

Finally, this function runs the analysis on that profile data and provides its outputs through the console. There you can see a large output… It’s divided into sections with comments (used as headers) before them.

The first part is %% Analysis results: It provides info about analysis options which (in our example) were set by default. You can also change these parameters using the API. It is important to note that when you are profiling your code or system, you need to remember the analysis parameters that you used to accurately compare analysis results in future research after code fixing/updating.

The next part is % CNT ACC OWN: It provides information divided into the following columns:

  • CNT — the number of times the function has been called
  • ACC — the time spent in the function, including called functions
  • OWN the time spent in the function not including called functions

It is worth noting that these columns are used in several blocks. The first one is Totals, where the values represent all function calls in all processes that got profiled. Then comes a block for each process that shows data only within the context of that process in particular. Each of these blocks starts with a process ID. It shows the parent’s ownership. In our example, the tuple spawned_by in PID “<0.84.0>” refers to the parent process “<0.80.0>”.

After the individual process parts, the following blocks show the chain of called modules and functions, with all the necessary information considering the garbage collector. It makes sense to pay attention to the division of these blocks in paragraphs using bullets %%%, %% and %. This is what each marker means:

  • %%% — total
  • %% — process
  • % — functions calls that are related to the process with %%

Using the analysis data, we can see that create_file_slow/3 is called 769 times and file:write/2 768 times. That’s likely indicating a waste of time in the execution of the desired functionality:

{[{{file,write,2},                        769,    35.357,  5.351}],     

{ {raw_file_io_delayed,write,2}, 769, 35.357, 5.351},

[{{raw_file_io_delayed,enqueue_write,2}, 769, 28.203, 5.805},

{{erlang,iolist_to_iovec,1}, 769, 1.751, 1.707},

{garbage_collect, 2, 0.052, 0.052}]}.

With all this information, we can try to optimize the code. The first thought that pops up for optimization is to minimize the number of calls to file:write/2. Let’s try with an accumulator to collect binary data so we can write it only once:

....

create_file_slow2(Name, N) when is_integer(N), N >= 0 ->

{ok, FD} =

file:open(Name, [raw, write, delayed_write, binary]),

Data = if N > 256 ->

ok =

file:write(FD,

lists:map(fun (X) -> <> end,

lists:seq(0, 255))),

create_file_slow2(256, N, <<>>);

true ->

create_file_slow2(0, N, <<>>)

end,

ok = file:write(FD, Data),

ok = file:close(FD).
create_file_slow2(M, M, Acc) ->

Acc;

create_file_slow2(M, N, Acc) ->

X = <>,

create_file_slow2(M+1, N, <>).

After the changes, we repeat our previous steps with our new function:

1> fprof:apply(foo, create_file_slow2, [junk2, 1024]).

ok

2> fprof:profile().

Reading trace data...

..................

End of trace!

ok

3> fprof:analyse().

We can already see an improvement in performance in the total section. Then, when we check the same section as before in the console output, we can see how much we were able to optimize the running time of the function:

{[{{file,write,2},                       2,    0.197,    0.017}],     

{ {raw_file_io_delayed,write,2}, 2, 0.197, 0.017},

[{{raw_file_io_delayed,enqueue_write,2},2, 0.118, 0.020},

{{erlang,iolist_to_iovec,1}, 2, 0.062, 0.020}]}.

But of course, scrolling up and down in console… It is not very convenient. It’s far easier to work with dump files where you can compare results and also use your custom functions if you need some filtering/postprocessing.

Hopefully, with this description, the console output no longer looks as frightening as it was at first glance. Even so, let’s see how we can consume dump files as it is not always convenient to read all the output in the console, especially if there is much more data than in our simple example. For this, you can specify a file name where you intend to place all this data. To exemplify this, it is worth changing our call example slightly:

1> fprof:apply(foo, create_file_slow, [junk, 1024]).

ok

2> fprof:profile().

Reading trace data...

..................

End of trace!

ok

3> fprof:analyse({dest, "fprof.analysis"}).

Processing data...

Creating output...

Done!

ok

At this time, in our root folder, we can find a new file called fprof.analysis. Since the data is stored in Erlang terms, we can parse it using file:consult/1:

1> {ok, Terms} = file:consult("fprof.analysis").

2> [Options, Totals | Rest] = Terms.

3> Options.

{analysis_options,[{callers, true},

{sort, acc},

{totals, false},

{details, true}]}

4> Totals.

[{totals,11734,38.829,38.509}]

What’s next? Well, quite a lot of information is missing in this article because this tool has many ways to manipulate and format its data. Unfortunately, if we try to describe them all at once, the size of the article will be too large and difficult to read. For this reason, I will leave two necessary resources below to get more information if you need it.

NOW WITH OVER +8500 USERS. people can Join Knowasiak for free. Sign up on Knowasiak.com
Read More

Leave a Reply