
Dynamic Logging

This is a quick overview of Pixie's dynamic tracing capability, which lets developers add tracepoints to running applications without adding any instrumentation to their code. To test this out yourself, check out the tutorial here.

Why is this needed?

Dynamic tracepoints can save developers hours or days of debugging code-level performance issues by letting them add tracepoints to production code dynamically, without any instrumentation.

Supported Use-cases

Function Input-Output Tracing

What are the arguments and return value of calls to Foo(x, y, z)?

Function Latency Profiling

What is the latency (call to return) of calls to Login(username)?
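To make "call to return" concrete, the sketch below pairs each function entry with its matching return and reports the difference. It is a plain-Python illustration on made-up event records, not Pixie's implementation: the TraceEvent type, its thread_id field, and the sample timestamps are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TraceEvent:
    """Hypothetical record emitted when a traced function is entered or returns."""
    thread_id: int      # which thread (or goroutine) made the call
    kind: str           # "entry" or "return"
    timestamp_ns: int   # monotonic time in nanoseconds


def call_latencies_ns(events: List[TraceEvent]) -> List[int]:
    """Pair each return with the most recent unmatched entry on the same thread."""
    open_calls: Dict[int, List[int]] = {}
    latencies: List[int] = []
    for ev in sorted(events, key=lambda e: e.timestamp_ns):
        if ev.kind == "entry":
            open_calls.setdefault(ev.thread_id, []).append(ev.timestamp_ns)
        elif ev.kind == "return":
            stack = open_calls.get(ev.thread_id)
            if stack:  # ignore returns whose entry was never observed
                latencies.append(ev.timestamp_ns - stack.pop())
    return latencies


# Made-up sample: two overlapping calls to Login(username) on different threads.
sample = [
    TraceEvent(1, "entry", 1_000),
    TraceEvent(2, "entry", 1_200),
    TraceEvent(1, "return", 4_000),
    TraceEvent(2, "return", 1_900),
]
print(call_latencies_ns(sample))  # latencies in order of return time
```

Keeping a per-thread stack of open entries means that overlapping calls on different threads, and recursive calls on the same thread, are matched to the right return.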

Capability Overview

Here's a quick overview video in which we dynamically inject tracepoints into a function (link) within the "Online-Boutique" demo application's checkout service:

For reference, here's the PxL script used in the video:

import pxtrace
import px

# Change to the Pod you want to trace.
pod_name = 'online-boutique/checkoutservice'

# The name of the table where results will be gathered.
# Make sure to change this name if you change the data being collected in the pxtrace.probe below,
# otherwise there may be a column mis-match.
table_name = 'checkout_tracer_table'

# The following pxtrace.probe specifies the application code to be traced.
# func Sum(l, r pb.Money) (pb.Money, error)
@pxtrace.probe('github.com/GoogleCloudPlatform/microservices-demo/src/checkoutservice/money.Sum')
def probe_func():
    return [{'lUnits': pxtrace.ArgExpr('l.Units')},
            {'lNanos': pxtrace.ArgExpr('l.Nanos')},
            {'rUnits': pxtrace.ArgExpr('r.Units')},
            {'rNanos': pxtrace.ArgExpr('r.Nanos')},
            {'retUnits': pxtrace.RetExpr('$0.Units')},
            {'retNanos': pxtrace.RetExpr('$0.Nanos')}]

# This UpsertTracepoint deploys the dynamic tracepoint on the specified pod.
pxtrace.UpsertTracepoint('checkout_tracer',
                         table_name,
                         probe_func,
                         pxtrace.PodProcess(pod_name),
                         '10m')

# Query and output the results to screen.
df = px.DataFrame(table_name)
px.display(df)

Note that there is a known bug: re-running the script after modifying the probe_func definition causes the tracepoint to fail to deploy. To work around it, change the value of table_name whenever you modify the probe_func definition (and make sure the df = px.DataFrame(table_name) line uses the new name as well).
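One way to make this workaround harder to forget is to derive the table name from the probe definition, so that any change to the captured columns or expressions produces a new name. The helper below is a sketch meant to be run locally in ordinary Python to generate a name to paste into the script. It is not part of pxtrace, and the versioned_table_name function and its (column, expression) input format are assumptions made for illustration.

```python
import hashlib
from typing import List, Tuple


def versioned_table_name(base: str, probe_columns: List[Tuple[str, str]]) -> str:
    """Return base plus a short suffix that changes whenever the probe columns change."""
    # Serialize column names and expressions in order; any edit alters the digest.
    spec = "\n".join(f"{name}={expr}" for name, expr in probe_columns)
    digest = hashlib.sha256(spec.encode("utf-8")).hexdigest()
    return f"{base}_{digest[:8]}"


# Columns copied from the script above.
columns_v1 = [
    ("lUnits", "l.Units"), ("lNanos", "l.Nanos"),
    ("rUnits", "r.Units"), ("rNanos", "r.Nanos"),
    ("retUnits", "$0.Units"), ("retNanos", "$0.Nanos"),
]
# Hypothetical edit: drop the last column from the probe.
columns_v2 = columns_v1[:-1]

print(versioned_table_name("checkout_tracer_table", columns_v1))
print(versioned_table_name("checkout_tracer_table", columns_v2))
```

The two printed names differ, so pasting the generated name into table_name after each probe edit avoids reusing a table with a stale schema.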

FAQs

What compiled languages does it work for?

It has currently been tested on Go, with limited support for C++. Other compiled languages such as Rust and Haskell are well suited to our approach, but have not yet been tested.

Does it work for Java? Or interpreted languages?

Our system does not currently work with interpreted or VM-based languages such as Java. These languages usually have fairly sophisticated debug environments, which we plan to integrate with in the future.

Does it work without Debug symbols?

We currently require DWARF debug information to be present in the binary. Optimized binaries are supported as long as they contain the debug symbols, although some cases, such as inlined functions, are not yet fully supported by Stirling. Future versions of Pixie will add support for remotely hosted symbol files. We are actively seeking feedback about how remote symbol files are used in practice, in order to design the right features.

Can we stream with sampling?

Dynamic tracepoints connect to the Pixie platform. Native streaming support is core to Pixie and will be available in the near future.

Is this extendable to general BPF probes?

We currently only support tracepoints that are generated by Pixie. Our approach could be extended to support general BPF probes in the future if there is significant demand for this feature.

How can we visualize the results?

Since Dynamic Tracepoints slot natively into Pixie, they can leverage the platform's visualization environment. We will add support for views such as flame graphs in the future.

What are the different kinds of probes?

We currently support capturing function arguments, return values and latencies.

Can we mutate?

We don’t currently support any operators that will mutate the state of the application.

Can we call functions such as String()?

Not currently supported, but since this is such a useful feature we will explore adding it.

Can we deploy this outside of K8s?

Dynamic tracepoints don’t rely on any K8s-specific features. They will be supported outside of K8s once Pixie can be installed there.

Can we have this managed using CRDs on K8s?

Tracepoints work on a declarative specification. Since Pixie is designed to work both inside and outside of K8s, we don’t leverage CRDs to transmit the specification. In the future, we might add support for providing specs from CRDs that are read into Pixie.

What is the performance overhead?

Minimal. A few tracepoints should have little to no visible impact on non-trivial applications. Our studies on BPF probes have shown <1% overhead to capture full messages from a simple HTTP server. How often a tracepoint is triggered and how much data it collects will affect this number.
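As a rough way to reason about how trigger rate and data volume interact, the sketch below uses a simple linear cost model: each hit pays a fixed cost plus a cost per byte captured. The model itself and every number passed to it are placeholder assumptions for illustration, not Pixie measurements; substitute values measured on your own system.

```python
def estimated_cpu_overhead(hits_per_second: float,
                           fixed_cost_us: float,
                           bytes_per_hit: float,
                           cost_per_byte_us: float) -> float:
    """Fraction of one CPU core spent in the probe under a linear cost model."""
    cost_per_hit_us = fixed_cost_us + bytes_per_hit * cost_per_byte_us
    return hits_per_second * cost_per_hit_us / 1_000_000


# Placeholder inputs, NOT measurements: same per-hit cost, very different hit rates.
rarely_called = estimated_cpu_overhead(
    hits_per_second=100, fixed_cost_us=2.0, bytes_per_hit=32, cost_per_byte_us=0.01)
hot_path = estimated_cpu_overhead(
    hits_per_second=100_000, fixed_cost_us=2.0, bytes_per_hit=32, cost_per_byte_us=0.01)

print(f"rarely called: {rarely_called:.4%} of a core")
print(f"hot path:      {hot_path:.2%} of a core")
```

Under this model the overhead grows linearly with the hit rate, which is why a tracepoint on a function in a hot loop deserves more caution than one on a request handler.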

What are security/privacy implications?

Since Dynamic Tracepoints can observe essentially any function and its arguments, there are significant privacy and security concerns. We plan to mitigate this by adding RBAC support, with the ability to restrict deployment to specific templates that have been reviewed and approved. This feature can also leverage PII masking and other future enhancements to Pixie.

Difference between this and GDB, FTRACE, etc.?

Unlike most existing approaches, we don’t stop execution of the program or mutate its state. This allows us to capture data in production environments with limited overhead.

How do you turn off tracepoints?

Tracepoints are registered with a TTL (time to live), which allows old tracepoints to be garbage collected automatically. They can also be deleted manually.
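The TTL behavior can be pictured as a registry in which every tracepoint carries an expiry time and expired entries are swept out. The sketch below is a simplified model in plain Python, not Pixie's implementation: the TracepointRegistry class, its method names, and the login_tracer tracepoint are hypothetical, and refreshing the expiry on a repeated upsert is an assumption of the model.

```python
from typing import Dict, List


class TracepointRegistry:
    """Toy model of tracepoints that expire after a time to live (TTL)."""

    def __init__(self) -> None:
        self._expires_at: Dict[str, float] = {}

    def upsert(self, name: str, ttl_s: float, now_s: float) -> None:
        """Register a tracepoint, or refresh its expiry if it already exists."""
        self._expires_at[name] = now_s + ttl_s

    def delete(self, name: str) -> bool:
        """Manually remove a tracepoint; returns False if it was not registered."""
        return self._expires_at.pop(name, None) is not None

    def collect_expired(self, now_s: float) -> List[str]:
        """Remove and return the names of tracepoints whose TTL has elapsed."""
        expired = [n for n, t in self._expires_at.items() if t <= now_s]
        for n in expired:
            del self._expires_at[n]
        return expired

    def active(self) -> List[str]:
        """Names of tracepoints that are still registered, sorted."""
        return sorted(self._expires_at)


registry = TracepointRegistry()
registry.upsert("checkout_tracer", ttl_s=600, now_s=0)  # the '10m' TTL from the script
registry.upsert("login_tracer", ttl_s=60, now_s=0)      # hypothetical second tracepoint
print(registry.collect_expired(now_s=120))  # only login_tracer has outlived its TTL
print(registry.active())                    # checkout_tracer is still deployed
```

Time is passed in explicitly (now_s) rather than read from a clock so that the expiry logic is easy to follow and to test.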

Copyright © 2018- The Pixie Authors. All Rights Reserved.