How to Fix Disjointed Traces with Context Propagation
Connecting an OTel-Instrumented Service to a Service Instrumented with Datadog Tracing Libraries

As the world continues to shift from monolithic applications to microservices, Observability has become an important focus. At Tucows, the Observability team aims to develop best practices and standards around Observability. What is Observability? It is the ability to understand the inner workings of a system without having to dive deep into the actual code, allowing us to solve novel problems (“unknown unknowns”). There are many Observability tools to choose from, and to get data into them, you can either use a vendor-specific library or a vendor-neutral framework called OpenTelemetry.
OpenTelemetry (OTel for short) consists of APIs and SDKs that enable developers to generate, collect, transform, and export telemetry data. Telemetry data consists of three signals: logs, traces, and metrics.
Over the last few months, developers at Tucows significantly increased their adoption of OTel, and with it came some interesting learnings. Here’s our story…
Background
At Tucows, teams are responsible for different services. A developer on one of the teams we work with, Budi, ran into a problem where he was unable to see end-to-end tracing for an application that receives calls from another application. Budi’s team’s service was fully instrumented using OTel’s Python SDK; however, the service calling it was instrumented using a vendor-specific tracing library. We went back and forth with Budi to get to the root of the problem, making various attempts at reconfiguring both the Docker Compose setup and the OTel Collector. The problem was that even though the two services were clearly talking to each other, the Observability tool was treating the interactions between them as two separate traces! We needed a way to connect these two services despite the fact that they were instrumented differently. The solution was to use Context Propagation.
Before we continue, we assume that you are familiar with the following topics and concepts:
- Observability
- OTel
- Instrumenting code using OTel API & SDK
Context Propagation
What exactly IS Context Propagation? Events and metrics need context: something that identifies which transaction they belong to. The piece of context that ties different events and metrics together is the Trace ID. Without it, correlating this information becomes very painful, requiring a lot of querying and filtering (as Budi saw first hand). Having a common Trace ID is what bound the two services together and produced full end-to-end tracing of the team’s application, allowing it to be seen as a single trace in an Observability backend such as Datadog.
Within a process, you have a Context object, which is essentially a dictionary: a bag of keys and values that follows along the path of execution of a request. To send a Context object over a network call, you serialize it into a set of headers. The client injects this Context object into the outgoing request, while the server extracts it.
To implement context propagation, you need to decide which headers to use. The main header formats include (the first two are illustrated below):
- W3C Trace Context (the default in OTel)
- Zipkin B3
- Custom/Proprietary (i.e. vendor-specific)
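For reference, here is what W3C Trace Context looks like on the wire (the ID values are made-up examples). It packs the trace ID, parent span ID, and trace flags into a single traceparent header:

```
traceparent: 00-80f198ee56343ba864fe8b2a57d3eff7-e457b5a2e4d86bd1-01
```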
Had both applications been fully instrumented from scratch using OTel libraries, then W3C Trace Context would have been the way to go, and Budi wouldn’t have run into this problem; however, one of the applications was instrumented using OTel, while the other was using a vendor-specific tracing library which supports only the B3/Zipkin trace context. Since OTel also supports B3 headers, we decided to craft our solution using B3 headers.
B3 propagation refers to context propagation using B3 Headers, which are simply HTTP headers whose names start with x-b3. By using a common set of headers across both applications, we would be able to carry the common TraceId from one application to the other.
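For example, the same trace context expressed as B3 multi-format headers looks like this (again, the ID values are made up):

```
x-b3-traceid: 80f198ee56343ba864fe8b2a57d3eff7
x-b3-spanid: e457b5a2e4d86bd1
x-b3-sampled: 1
```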
B3 Context Propagation Example
Before we presented our solution to Budi, we decided to test out Context Propagation with an example. We created two services: a client written in Golang, and a server written in Python. To simulate the team’s issue, we instrumented one service using the vendor-specific library and the other using OTel. We would then connect the two services using B3 Headers for Context Propagation and, in theory, be able to view the full end-to-end trace.
For the Golang service, we wanted it to emulate the issue Budi had, so we instrumented it with the Datadog Tracing Library instead of OTel. In the main function, we start the tracer and add attributes; to follow best practices, these should include the name of the environment, the service, and the version number. We used a mux router, created a Span, and got its Context. We then injected the Span Context into the outgoing request headers. This was the most crucial part, as this Context needs to propagate to the Python service to get the full end-to-end trace.
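Here is a minimal sketch of such a client. The service name, ports, and route are illustrative, and enabling B3 via PropagatorConfig{B3: true} assumes a dd-trace-go version that supports it (the same behaviour can also be enabled with the DD_PROPAGATION_STYLE_INJECT environment variable):

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"

	"github.com/gorilla/mux"
	"gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer"
)

func main() {
	// Start the Datadog tracer with the recommended attributes:
	// environment, service name, and version.
	tracer.Start(
		tracer.WithEnv("dev"),
		tracer.WithService("go-client"),
		tracer.WithServiceVersion("1.0.0"),
		// Propagate context as B3 headers rather than the default
		// x-datadog-* headers, so the OTel side can read them.
		tracer.WithPropagator(tracer.NewPropagator(&tracer.PropagatorConfig{B3: true})),
	)
	defer tracer.Stop()

	r := mux.NewRouter()
	r.HandleFunc("/call", func(w http.ResponseWriter, _ *http.Request) {
		// Create a span representing the outbound call.
		span := tracer.StartSpan("client.request")
		defer span.Finish()

		outReq, err := http.NewRequest(http.MethodGet, "http://localhost:5000/", nil)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}

		// The crucial step: inject the Span Context into the outgoing
		// request headers so the Python service can join the trace.
		if err := tracer.Inject(span.Context(), tracer.HTTPHeadersCarrier(outReq.Header)); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}

		resp, err := http.DefaultClient.Do(outReq)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		defer resp.Body.Close()

		body, _ := io.ReadAll(resp.Body)
		fmt.Fprintf(w, "server said: %s", body)
	})

	log.Fatal(http.ListenAndServe(":8080", r))
}
```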
The tracer.Inject call above serializes the Span Context into B3 headers on the outgoing request, which will then be extracted by the server.
For the Python service, we fully instrumented the Flask application with the OpenTelemetry Python SDK and initialized a tracer and span.
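A minimal sketch of the server, assuming the opentelemetry-sdk and opentelemetry-propagator-b3 packages (the route, port, and span name are illustrative):

```python
from flask import Flask, request
from opentelemetry import trace
from opentelemetry.propagators.b3 import B3MultiFormat
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Set up the OTel SDK; a console exporter keeps the example self-contained.
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    BatchSpanProcessor(ConsoleSpanExporter())
)
tracer = trace.get_tracer(__name__)

app = Flask(__name__)
propagator = B3MultiFormat()

@app.route("/")
def index():
    # Extract the trace context from the incoming x-b3-* headers
    # sent by the Go client.
    ctx = propagator.extract(carrier=request.headers)

    # Start a span as a child of the extracted context so both
    # services land on the same trace.
    with tracer.start_as_current_span("server.request", context=ctx):
        return "Hello from the OTel-instrumented Python service!"

if __name__ == "__main__":
    app.run(port=5000)
```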
The propagator’s extract call pulls the Context out of the header values received from the client service. This is what allows your Observability back-end to correlate the spans and lets you view an end-to-end trace of your distributed application, despite the services being instrumented with different tracing systems.
Hiccup Using B3 Propagation
While the example above worked just fine, when Budi went to implement this solution, he had to configure a few extra details. It turned out that the Span ID and Trace ID generated by Datadog were in decimal (base 10), while OTel by default generates W3C Trace Context headers, which are in hexadecimal (base 16). When these headers were converted to B3, the difference in number bases was not taken into account, so Budi continued to see disjointed traces even after switching to B3 propagation. He then had to write a function in his application to convert the Datadog header values to hexadecimal to ensure that context propagation would succeed.
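As an illustration (this is not Budi’s exact code), such a conversion in Python can be as simple as formatting the decimal ID as the zero-padded 16 hex characters B3 expects for a 64-bit ID:

```python
def datadog_id_to_b3(dd_id: str) -> str:
    """Convert a decimal Datadog ID (e.g. from the x-datadog-trace-id
    header) into the zero-padded lowercase hex form used by
    x-b3-traceid / x-b3-spanid."""
    return format(int(dd_id), "016x")

# Example: the decimal ID 1234567890 becomes "00000000499602d2".
print(datadog_id_to_b3("1234567890"))
```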
Conclusion
When two services show disjointed tracing, use context propagation to ensure that the two separate traces are seen as one. Issues arise when services are instrumented with different libraries that share no common headers. Using B3 Headers allowed us to correlate two services that were instrumented using different tracing libraries (Datadog and OTel). It is important to note that your goal should be to instrument all microservices using OTel to avoid this problem; however, if you are migrating from vendor-specific libraries to OTel, this approach helps ensure that you preserve your trace context. Stay tuned for future posts as we continue our exploration of OTel.
A huge shoutout to Budi Prasetya who worked closely with us on this problem and shared his findings on the headers.