Notes

Welcome to Linux Plumbers Conference 2014.
Anyone can join: https://etherpad.fr/p/LPC2014_Tracing

Topic: Correlating timestamps in user and kernel space
Speaker: Pawel Moll

Problem: (read subject)

ftrace can switch time source at any time.

LTTng can correlate user and kernel. Mathieu tells us how: it uses clock monotonic, but that could not be used to trace from NMI context. After talks with John Stultz, Peter Zijlstra, and Thomas Gleixner, a clock monotonic usable in NMI context is coming in 3.17. There is a slight chance that the clock can go backwards for a few cycles. The caller can deal with it; LTTng has a wrapper to handle this.

Pawel thinks there was a way to inject a timestamp. Mathieu explains: correlate between clock mono and clock RT. Several reads are done to find the best correlation. This is done in userspace too.

Ingo doesn't want to use raw monotonic because of performance. Could use HPET in NMI context. Monotonic will use whatever clock source is in use, but if TSC is not available it falls back to whatever is left, which can be horrible.

tglx says clock monotonic raw is the worst thing you can use at all. It's a timestamp with no meaning. Really should use clock monotonic. The slight chance of going backwards is a non-issue.

3 use cases:
1. Some kind of slow-incoming userspace performance data (e.g. energy consumption via hwmon or USB).
2. JIT engine, using perf, needs precise information. Wants to inject a time point. Only needs to know whether something happened after another event; doesn't need real time, just the order of events.
3. Ingo suggested a trace marker for perf: user-injected data.

tglx wants all tracing timestamps correlated among all tracers. tglx will go to Budapest to enforce monotonic timestamps in perf.

perf will be able to pass data from userspace to the perf kernel buffer. Events can select if they want to pick it up. No objections.

Tools need the same way to inject data. LTTng has its own solution. Userspace does not go into the kernel to do tracing.

ftrace trace_marker can write binary data.
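A minimal sketch of injecting data through the ftrace trace_marker file from userspace. The mount point below is an assumption (many systems expose it under /sys/kernel/debug/tracing instead), the helper name write_marker is made up for illustration, and writing to the real file normally requires root. How a non-text payload is recorded depends on the kernel version, so treat this as an illustration of the write path only.

```python
import os

# Assumed tracefs location; adjust to /sys/kernel/debug/tracing/trace_marker if needed.
TRACE_MARKER = "/sys/kernel/tracing/trace_marker"


def write_marker(payload: bytes, path: str = TRACE_MARKER) -> int:
    """Write one marker record with a single write() call.

    Returns the number of bytes the kernel accepted. The helper name and
    the default path are illustrative, not part of any tracer API.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        # One write() maps to one record in the ftrace ring buffer.
        return os.write(fd, payload)
    finally:
        os.close(fd)


# Example (needs a mounted tracefs and sufficient privileges):
#   write_marker(b"lpc2014: start of measurement\n")
```

The record then shows up interleaved with kernel events, carrying a timestamp from whichever trace clock ftrace is currently using.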
Only trace and trace_pipe report a string. If you use trace_pipe_raw it will return the binary data as is.

Masami Hiramatsu suggested making a trace_event that allows a user to write binary data into the buffer. We can have a header and then user-defined data.

Everyone agrees that we need this interface, but we are bickering about implementation.

Correlating hardware-recorded time stamps with the tracer time stamps. Need to also correlate between virtual guests and hosts. No answer for this.

perf will have a different sample type. Initial implementation to use clock_mono_raw. tglx suggests using clock_mono for the reference. Then other hw or network can use an offset from the time stamp.

Wants to add the timestamp to any perf event. tglx not happy about it (after a thought he might have changed his mind).

Should be a generic implementation in the kernel to ask for a time stamp from any clock of your choice periodically. perf can do this for us with an additional PERF_SAMPLE_CLOCK_OF_YOUR_CHOICE field, selected by an additional flag in perf_event_attr with 0 being the default (MONOTONIC).

Topic 2: Linux Tracing Strategy
Speaker: Brendan Gregg

Talking about ftrace and ktap. 60% of issues can be solved with ftrace, and more with perf. Some SystemTap usage. Would use LTTng, but there are a lot of tracers.

There are some lightweight inquiries about the kernel that the tracers don't answer, e.g. an in-kernel histogram of function latency, or a frequency count of a function argument. These allow a reactive early investigation, narrowing down the events needed to log in the traditional way. Believes that eBPF can answer these.

Jovi from ktap: can't be dependent on gcc for embedded systems and such.

Alexei from eBPF: a compiler is not a must-have for it to work. Code can be generated in many different ways. Doesn't like to learn new languages. Likes C because you do not need to learn a new language, but any language can be used.
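The lightweight aggregations Brendan mentioned (a histogram of function latency, a frequency count of a function argument) can be illustrated with a small userspace sketch. This is not eBPF and not an existing tracer interface; the function names and the power-of-two bucketing are assumptions chosen for illustration. It only shows the kind of summary that would be kept in place of a log of every event.

```python
from collections import Counter


def log2_bucket(value_ns: int) -> int:
    """Return bucket b such that 2**b <= value_ns < 2**(b + 1).

    Values of 0 are folded into bucket 0 together with 1.
    """
    return max(value_ns, 1).bit_length() - 1


def latency_histogram(latencies_ns):
    """Aggregate latencies (in nanoseconds) into power-of-two buckets."""
    hist = Counter()
    for ns in latencies_ns:
        hist[log2_bucket(ns)] += 1
    return hist


def argument_frequency(values):
    """Count how often each value of a traced function argument was seen."""
    return Counter(values)


def format_histogram(hist):
    """Render one line per non-empty bucket, lowest bucket first."""
    return [
        f"[{2 ** b}, {2 ** (b + 1)}) ns: {count}"
        for b, count in sorted(hist.items())
    ]
```

Only the counters need to be kept and read out, which is what makes this cheap enough for a first look before deciding which events to log in full.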
Take the ktap scripts and write a tool to convert them to eBPF instructions. Jovi needs to write a compiler to do the conversion. For embedded devices the compiler to convert to eBPF does not need to be complex. Can also compile on another machine (like a host) and then move the result to another machine to run.

Alexei wants to go to distributed dynamic tracing. Run tracing on lots of machines (in the cloud). Trace network analysis.

Mathieu wants the same scripts to be able to run on a single host or on distributed hosts.

Masami wants to process the buffer of the trace_event (binary event). Should not be an issue because userspace has access to the data structure and creates the eBPF for the system.

Need a way to filter on entry to the event, not after the event writes to the buffer. But we can't show that information to userspace because it creates a new ABI that we won't be able to change in the future.

State of eBPF by Alexei: moving nicely into the kernel.
State of ktap by Jovi: suspended. But may continue with the new compiler info.
State of LTTng by Mathieu: in all mainline and embedded distros. Ingo wanted everything in perf, but LTTng does not find that useful. LTTng has some advantages being out of tree. Needs to match many versions.
State of SystemTap by Mark: more focused on moving out of the kernel. Used to need kernel, but now has fewer dependencies.
State of ftrace by Steve: working on clone machine for Steve.
State of Oracle DTrace by Elena: posted on Oracle's site. Also has CTF. Can't fix the license. Still in development, and another release will be soon.
State of perf by Jiri: trying to interact with other tracers with CTF.

Topic: CTF
Speaker: Mathieu Desnoyers

Common Trace Format: open specification.

Need mapping between addresses and symbols in CTF. But CTF also describes userspace traces. Addresses can map to different processes' virtual memory. Does not want to tie the data.

Summarize debug info. Not wanting to pull all GB of debug info.

What is needed?
Map instruction pointers to symbols. For dynamic tracing, needs to know what the registers are. What variable is where? Basic use case is the symbol table, both data and function. Some distros distribute the compressed version. Only functions, not unwinding.

Need to extend the language for mapping. Perhaps tag the address to differentiate the address and the process. Tag can be a reference to another field to keep it more flexible.

May need to hook the tracer into the JIT engine. Also need to be aware of self-modifying code like kprobes.

Question asked: can we describe a core dump? Need to look into it.

Converter uses the Babeltrace plugin. Could have an input plugin that translates the data live into CTF. Maybe could create a CTF format to express the unmodified data of the tracers for ftrace and perf.

Topic: Sharing kernel tools
Speaker: Jiri Olsa

Others want to use perf.data: tools/perf -> tools/lib

Move tools from trace-cmd to perf in tools/lib/traceevent. Others would like to parse this data too.

In-kernel library, shared as a static library or as source code. An industry tool is using perf and doing the parsing itself. Best to have a distribution-independent way to ship. A tool that can generate the header and C files from perf to integrate into your own project.

Why is it hard not to do version libraries?

trace-cmd wrote the event parsing code as a library, but it was never pushed due to lack of experience in maintaining a stable library that distros can pull.

Need to get this done in such a way that it cannot be stopped for political reasons.

Topic: Rich probe filtering and reporting with variable locations and types
Speaker: Mark Wielaard

SystemTap has a little script, code, and uses a kernel module to connect. Works well, but not everyone likes it. Needs a way to talk to the kernel; uses a module now, but could probably replace that with eBPF.

Requires debug info such that you write scripts against source and can trace inlined functions. Masami stated that he added that with perf probes.
Current state of eBPF is that it is in the kernel but not attached to anything (not useful yet). Can send you a patch if you want to use it. Alexei has a git tree that makes eBPF useful to try:
git://git.kernel.org/pub/scm/linux/kernel/git/ast/bpf.git

Asking the dtrace people about CTF (Compact Type Format; do not confuse with the previous talk about CTF, Common Trace Format). This CTF is embedded in the kernel. Solaris and dtrace use this. Perhaps we can add this to the mainline kernel too?

Elena: there is a git tree with CTF here: https://oss.oracle.com/git/ libdtrace-ctf.git

CTF is a compact way to store DWARF info. Very limited: stores function entries, arguments, and some symbols. Can't probe inline functions. Masami is trying to include inline functions for probes.

DWARF describes everything you need, but it's way too big. CTF can be very useful.

Brendan says SystemTap needs to divorce itself from full debug info. Too many kernels get replaced at Netflix to create it. SystemTap has a way to not use full DWARF, but it's not tested well and has lots of bugs. It should have a way to use the smaller versions of debug info.

SystemTap needs to get involved in the conversations with the kernel developers to have an influence on what ends up in mainline.

Mark was talking with Alexei about details of using eBPF. The eBPF verifier restricts you from shooting yourself in the foot. If you get rid of the verifier then you can do anything. Do not let eBPF take over the box.

Alexei: eBPF alone cannot support all the features SystemTap has. probe_kernel_read() not safe enough.

STRIKE ANNOUNCEMENT: http://www.bahn.de/blitz/view/fernverkehr/uebersicht.stml
Search for train changes: www.bahn.de/liveauskunft