Ruby has introduced the TracePoint API with the release of Ruby 2.0. Earlier, there was a similar tool that let you track operations in any Ruby program called set_trace_func
. However, using it was really slow and ugly. This new TracePoint API aims to replace it.
What is “tracing”? Why is set_trace_func
slow? Why is TracePoint
better?
To understand how these two commands work, we need to understand how the compiler compiles a Ruby program and generates code that the VM then executes. The program that we will be using is:
Dissassembly
Let’s have a look at the output of the compiler for the Ruby program mentioned above. To get the disassembled output, the command to run is:
The output of the command is:
Note that the disassembled output is just a human-readable output.
The lines starting with a number represent a single VM instruction each. These instructions are implemented inside a file called insns.def
. The lines that start with == disasm
represent what are known as labels
. These labels are useful in identifying which Ruby method call or instruction the corresponding set of VM instructions pertain to. One stack-frame in Ruby corresponds to the set of instructions between — and including — a trace
and leave
. These two instructions are appended to ends of the stackframe by the compiler, and they perform certain specific functions.
Observe the trace
instructions here and the number after that instruction. This number is the decimal representation of a Ruby event. Erm…event, you say?
Events and trace
Ruby emits an event every time the VM’s state changes. The full list of events in Ruby (version 2.1.0-p0
) consists of 25 events and are defined inside of ruby.h
. However, the set of events that corresponding to the example Ruby program above are a subset of the following set:
1
2
3
4
5
6
7
8
9
10
11
12
/* traditional set_trace_func events */
#define RUBY_EVENT_NONE 0x0000
#define RUBY_EVENT_LINE 0x0001
#define RUBY_EVENT_CLASS 0x0002
#define RUBY_EVENT_END 0x0004
#define RUBY_EVENT_CALL 0x0008
#define RUBY_EVENT_RETURN 0x0010
#define RUBY_EVENT_C_CALL 0x0020
#define RUBY_EVENT_C_RETURN 0x0040
#define RUBY_EVENT_RAISE 0x0080
#define RUBY_EVENT_ALL 0x00ff
Each of the RUBY_EVENT_*
constants correspond to a hexadecimal number. Using this list, we can understand what the trace
instruction arguments in the disassembled output correspond to.
In the above example, there are a total of 8 trace
instructions. Let’s go through them one-by-one.
If you were thinking that the trace
instruction might be an event emitting routine, it’s kind of the reverse in reality. When ever the VM encounters a trace
event, all the event hooks that were defined for that event get executed. Open up the file insns.def
and locate the trace
instruction definition. There’s just one call — well, ignoring all the Dtrace-specific methods — which is, EXEC_EVENT_HOOK
.
All trace
instructions with an argument of 1
correspond to a RUBY_EVENT_LINE
event. This event is indicates that a new line of Ruby code has been encountered. All event hooks registered for the event RUBY_EVENT_LINE
are executed when the VM encounters this particular trace
event.
You can see that, in each of the labels, there is a minimum number of one RUBY_EVENT_LINE
event. An argument of 2 corresponds to the event RUBY_EVENT_CLASS
and an argument of 4 corresponds to the event RUBY_EVENT_END
. That means, inside the class:A
label, this is what happens:
- Event signalling a class entry.
- Event signalling a new ruby line that has been encountered.
- Event signalling that the
end
statement for the class definition has been reached.
Finally, in the last label (to_string
), the trace arguments of 8 and 16 correspond to RUBY_EVENT_CALL
and RUBY_EVENT_RETURN
.
leave
The leave
VM instruction simply pops the stack frame and is essential to start of execution of the next instruction. There will be only one leave
per instruction sequence but there can be multiple trace
instructions as we already saw in the example above.
Usage
The events emitted by the trace
instruction can be utilized by some tools like set_trace_func
and TracePoint
. Both set_trace_func
and TracePoint
API register event hooks that just call the Proc
that you pass in.
Examples of TracePoint and set_trace_func
usage can be seen at the RubyDoc website.
Here lies the reason why set_trace_func
is slow. As mentioned earlier, trace
instruction simply executes all the events that are registered. Both set_trace_func
and TracePoint
API register event hooks based on the configuration your provide to them. The usage of these two libraries can be found on the Rubydoc website. The way you use them differs slightly. In the case of TracePoint
, the event to listen to is specified upfront — that said, if no event is specified, the library adds a hook pertaining to all events — and in the case of set_trace_func
a proc is passed as an argument. This argument gets called every time a Ruby event occurs. That’s right, for every event, the argument gets called. In addition to that, the binding information when the event is instantiated is also computed and that slows down the tracing even more. In the case of TracePoint
though, you get access to a tracepoint object that can be used to fetch the data that’s required and it won’t compute the extra information unless it’s requested. This not only improves the API, it also improves the performance of tracing.
Disabling trace
The trace
instruction generation can be disabled by adding the following lines to the program and eval
-ing the code under question. For example, in the case of the program at the beginning, to disable trace
instruction, this does it:
The output of this program would be:
As you can see, this does not have the trace
instruction.
Summary
There are a lot of tooling support features that are being added to the latest Ruby. In fact, there are new TracePoint events added to Ruby 2.2.0 that were not present on 2.1.0. Tooling is a really important part for the growth and stability of a language. TracePoint’s C-API helps authors who write profilers like stackprof
, ruby-prof
to use a much simpler API and support more internal events which are accessible only via the C-API. And there has been a steady improvement on the quality and performance of the profilers which is always a good thing.