In this tutorial series, we will write a PxL script to analyze the volume of traffic coming in and out of each pod in your cluster (total bytes received vs total bytes sent). In this first tutorial, we will write a script to query a table of traced network connection data provided by Pixie's no-instrumentation monitoring platform.
```python
# Import Pixie's module for querying data
import px

# Load the last 30 seconds of Pixie's `conn_stats` table into a DataFrame.
df = px.DataFrame(table='conn_stats', start_time='-30s')

# Display the DataFrame with table formatting
px.display(df)
```
Every script begins by importing Pixie's `px` module. This is Pixie's main library for querying data.

On line 5, we load the last 30 seconds of data from the `conn_stats` table into a DataFrame. The `conn_stats` table contains high-level statistics about the connections (i.e. client-server pairs) that Pixie has traced in your cluster.

Finally, we display the table using `px.display()`.
Run the script using Pixie's Live CLI:

```bash
px live -f my_first_script.pxl
```
Your CLI should output something similar to the following table:
If your output table is empty, try increasing the value of the `start_time` string on line 5. Save the script, exit the Live CLI using `ctrl+c`, and re-run Step 3.
This script outputs a table of data representing the last 30 seconds of the traced client-server connections in your cluster. Columns include:
time_: Timestamp when the data record was collected.
upid: An opaque numeric ID that globally identifies a running process inside the cluster.
remote_addr: IP address of the remote endpoint.
remote_port: Port of the remote endpoint.
addr_family: The socket address family of the connection.
protocol: The protocol of the traffic on the connections.
role: The role of the process that owns the connection (client=1 or server=2).
conn_open: The number of connections opened since the beginning of tracing.
conn_close: The number of connections closed since the beginning of tracing.
conn_active: The number of active connections.
bytes_sent: The number of bytes sent to the remote endpoint(s).
bytes_recv: The number of bytes received from the remote endpoint(s).
You can find these column descriptions, as well as descriptions for all of the data provided by Pixie, by running the pre-built `px/schemas` script:

```bash
px live px/schemas
```

In the output, look for `conn_stats` in the `table_name` column. You should see all of the columns available in the `conn_stats` table listed with their descriptions.
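To make the column meanings concrete, here is a minimal plain-Python sketch (not PxL) that models one `conn_stats` record and turns the numeric `role` into a readable label. The `ConnStatsRow` class, the `describe` helper, and the sample values are hypothetical and exist only for illustration; only the column names and the client=1 / server=2 encoding come from the list above.

```python
from dataclasses import dataclass

# Role encoding taken from the column descriptions (client=1, server=2).
ROLE_LABELS = {1: "client", 2: "server"}


@dataclass
class ConnStatsRow:
    """Hypothetical stand-in for one `conn_stats` record (subset of columns)."""
    remote_addr: str
    remote_port: int
    role: int
    conn_open: int
    conn_close: int
    bytes_sent: int
    bytes_recv: int


def describe(row: ConnStatsRow) -> str:
    """Render a one-line, human-readable summary of a record."""
    label = ROLE_LABELS.get(row.role, "unknown")
    return (
        f"{label} -> {row.remote_addr}:{row.remote_port} "
        f"sent={row.bytes_sent}B recv={row.bytes_recv}B"
    )


# Made-up sample values, not real Pixie output.
sample = ConnStatsRow("10.0.0.5", 8080, 1, 3, 1, 1200, 560)
print(describe(sample))
```

A `role` value outside the two documented ones is labeled `unknown` here rather than guessed at.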
DataFrame initialization supports an `end_time` argument for queries requiring more precise time periods. If an `end_time` isn't provided, the DataFrame will return all events up to the current time.
```python
import px

df = px.DataFrame(table='conn_stats', start_time='-60s', end_time='-30s')

px.display(df)
```
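The time-window behavior can be pictured with a small plain-Python sketch. This is not PxL and not how Pixie implements it. The helpers `relative_to_absolute` and `in_window` are hypothetical, the parser handles only strings of the form `-<number><s|m|h>`, and treating both window boundaries as inclusive is an assumption made for the example.

```python
import re

# Seconds per unit for the relative-time strings this sketch understands.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def relative_to_absolute(rel: str, now: int) -> int:
    """Convert a relative string such as '-30s' into an absolute time in seconds."""
    match = re.fullmatch(r"-(\d+)([smh])", rel)
    if match is None:
        raise ValueError(f"unsupported relative time: {rel!r}")
    amount, unit = match.groups()
    return now - int(amount) * _UNIT_SECONDS[unit]


def in_window(timestamps, now, start_time, end_time=None):
    """Return the timestamps that fall between start_time and end_time.

    When end_time is omitted, the window extends up to `now`, mirroring
    the default described above.
    """
    start = relative_to_absolute(start_time, now)
    end = now if end_time is None else relative_to_absolute(end_time, now)
    return [t for t in timestamps if start <= t <= end]


now = 1000  # pretend "current time", in seconds
events = [930, 950, 975, 990]
print(in_window(events, now, "-60s", "-30s"))  # window is [940, 970]
print(in_window(events, now, "-30s"))          # window is [970, 1000]
```

With `start_time='-60s'` and `end_time='-30s'`, only the event at 950 falls inside the window. Dropping `end_time` moves the window to the most recent 30 seconds instead.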
Don't forget to save your script, exit the Live CLI using `ctrl+c`, and re-run the script with `px live -f ~/my_first_script.pxl` to update the results.
You can drop columns using `drop()`:
```python
import px

df = px.DataFrame(table='conn_stats', start_time='-30s')

# Drop select columns
df = df.drop(['conn_open', 'conn_close', 'bytes_sent', 'bytes_recv'])

px.display(df)
```
Alternatively, you can use the keep syntax, `df[[...]]`, to return a DataFrame with only the specified columns. Because the columns come back in the order you list them, this can also be used to reorder the columns in the output.
```python
import px

df = px.DataFrame(table='conn_stats', start_time='-30s')

# Keep only the select columns
df = df[['remote_addr', 'conn_open', 'conn_close']]

px.display(df)
```
If you only need a few columns from a table, use the DataFrame's `select` argument instead.
```python
import px

# Populate the DataFrame with only the select columns from the `conn_stats` table
df = px.DataFrame(table='conn_stats', select=['remote_addr', 'conn_open', 'conn_close'], start_time='-30s')

px.display(df)
```
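The difference between dropping and keeping columns can be sketched in plain Python with rows represented as dictionaries. This is an analogy only, not PxL. The `drop` and `keep` functions below are hypothetical helpers, and the sample row is made up.

```python
def drop(rows, columns):
    """Return copies of the rows without the named columns."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


def keep(rows, columns):
    """Return copies of the rows with only the named columns, in the given order."""
    return [{k: row[k] for k in columns} for row in rows]


# Made-up sample data, not real Pixie output.
rows = [
    {"remote_addr": "10.0.0.5", "conn_open": 3, "conn_close": 1, "bytes_sent": 1200},
]

print(drop(rows, ["bytes_sent"]))
print(keep(rows, ["conn_close", "remote_addr"]))
```

In this sketch `drop` preserves the original column order, while `keep` follows the order of the list you pass, which is why a keep-style selection can double as a way to reorder output columns.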
To filter the rows in the DataFrame by the `role` column:
```python
import px

df = px.DataFrame(table='conn_stats', start_time='-30s')

# Filter the results to only include rows whose `role` value equals 1 (connections traced on the client-side)
df = df[df.role == 1]

px.display(df)
```
If you want to see a small sample of data, you can use `head()` to limit the number of rows in the returned DataFrame to the first n rows (line 6).
```python
import px

df = px.DataFrame(table='conn_stats', start_time='-30s')

# Limit the number of rows in the DataFrame to 100
df = df.head(100)

px.display(df)
```
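Filtering and limiting can be chained. As a rough plain-Python analogy (not PxL), the sketch below filters dictionary rows by `role` and then takes the first n results. The `filter_role` and `head` helpers and the sample rows are hypothetical.

```python
from itertools import islice


def filter_role(rows, role):
    """Keep only the rows whose `role` equals the given value."""
    return [row for row in rows if row["role"] == role]


def head(rows, n):
    """Return at most the first n rows."""
    return list(islice(rows, n))


# Made-up sample data: role 1 is client-side, role 2 is server-side.
rows = [
    {"remote_addr": "10.0.0.5", "role": 1},
    {"remote_addr": "10.0.0.6", "role": 2},
    {"remote_addr": "10.0.0.7", "role": 1},
    {"remote_addr": "10.0.0.8", "role": 1},
]

client_sample = head(filter_role(rows, 1), 2)
print(client_sample)
```

Filtering before limiting means the row limit applies to the matching rows rather than to the raw table, which is usually what you want when sampling.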
Congratulations, you built your first script!
In part 2 of this tutorial, we will expand this script to produce a table that summarizes the total amount of traffic coming in and out of each of the pods in your cluster.