OpenTelemetry is a powerful tool that’s revolutionizing observability. Today, we dive into this emerging open-source standard and explore its growing role in tech.
Our special guest is Adriana Villela, a Senior Tech Leader with over 20 years of experience in software development, DevOps, and site reliability engineering (SRE). Adriana is a leader in the OpenTelemetry community, where she is a maintainer of the OTel End User SIG, and she works as a Senior Staff Developer Advocate. She’s also a regular conference speaker, blogger, and host of the Geeking Out podcast, where she explores all things tech with exciting guests.
In this episode, Adriana and Arin dive into the world of observability, exploring its crucial role in software applications and DevOps. They provide an in-depth analysis of OpenTelemetry, outline key components for creating reliable systems, and address the common challenges encountered during its adoption.
If you’re curious about how to stay on top of your systems and catch issues before they snowball, this episode is for you!
Listen on Spotify
Listen on Apple Podcasts
Key insights from the episode are below.
Resources mentioned:
Geeking Out Podcast: Kelsey Hightower
Geeking Out Podcast: Charity Majors
Geeking Out Podcast: Abby Bangser
Geeking Out Podcast: Liz Fong-Jones
About the Guest:
Name: Adriana Villela
What she does: She’s a Senior Tech Leader and a Podcast Host.
Company: Lightstep
Where to find Adriana: LinkedIn | Social links
Key Insights
⚡How do you create psychological safety on a team? Sometimes, the best support you can offer your team is to avoid adding to their existing stress. Adriana explains, “I think, as a manager, just making sure that you protect your team from various people breathing down your team members’ necks, because how many incidents have we heard of where you’ve got the poor folks trying to resolve the incident, and then there’s all these execs jumping on whatever call and they’re like, ‘I used to be a developer, and perhaps you could reboot the database.’ Come on, buddy, just let the professionals do their job. So I think giving that level of protection from management is extremely important.”
⚡Is OpenTelemetry easy to adopt? OpenTelemetry is a strong choice for ensuring reliable systems through standardized observability. But there are important considerations to keep in mind. Adriana says, “It feels like a no-brainer to me to adopt OpenTelemetry because, nowadays, if you want to have reliable systems, you cannot ignore observability. And what better way to make sure that your system is observable than to instrument your code using a standard, like OpenTelemetry, where you have standardized APIs and SDKs that provide that for you? I guess the challenge is when you’re asking organizations to instrument their code, and they haven’t instrumented their code before, you are basically introducing technical debt, new technical debt into the organization, especially if they have a well-established application. So you are basically saying, ‘Hey, we need to do this, but we’re introducing new technical debt,’ and that can be a really tough pill to swallow.”
⚡AI and ML are emerging as valuable tools in observability. AI is making significant strides across many fields, including software applications, but the key question is whether it can truly replace human expertise. Adriana shares her views on AI in observability: “I think I see it as an assistive technology. I don’t think anything’s really going to replace the human. But it can maybe spot some trends that we might not necessarily notice. And so having that in our tooling to say, ‘Hey, Adriana, take a look at this, this might look fishy.’ And you’d be like, ‘Oh, okay. Yeah, that might be worth digging into a little bit more.’ Or you can say, ‘No, you’re full of crap.’ But either way, it might surface stuff that you might not necessarily be aware of.”
Episode Highlights
The importance of psychological safety in tech teams
Creating reliable systems involves more than just technology. It also requires a focus on building a healthy work environment. That means engineering leaders need to prioritize psychological safety within their teams to get the best out of them.
Adriana explains, “It’s creating a safe space for work, because if you’re working on a team where people aren’t feeling that psychological safety, to be able to fail, to be able to tell their boss, ‘Oh, I made this mistake,’ it’s going to just create this horrible culture of mistrust, and then they’re going to be anxious, and then no one’s going to do a good job.”
Why is OpenTelemetry important?
OpenTelemetry is a CNCF project that provides a vendor-neutral set of APIs, SDKs, and tools for observability. It standardizes how telemetry data (traces, metrics, and logs) is generated, collected, and exported to observability backends, which makes it easier to monitor and understand system performance across different platforms.
Adriana explains, “The idea is, back in the day, every tool out there had their own set of libraries. There was no standardization. So, if you wanted to switch to a different vendor, you basically had to re-instrument your code: add code to your application that emits those signals, those traces, metrics, and logs, so that you could send them to that vendor’s tool. But with the advent of OpenTelemetry, various observability vendors came together and said, ‘You know what? Let’s standardize this thing,’ so that what distinguishes us isn’t the data that we’re sending. We’re all receiving the same data. So what distinguishes one vendor from another is how they interpret the data and what they do with it, in a way that lets you, the SRE, look at stuff, say, ‘Okay, I have an inkling as to what’s going on,’ and troubleshoot. And I think that’s the thing that’s exciting about OpenTelemetry.”
What’s the most common oversight in implementing observability?
Implementing observability is essential for system reliability, but teams sometimes overlook vital aspects. Arin asked Adriana about the most common oversight software teams make when integrating observability into their applications. Here’s what she said:
“So, actually, the biggest one is getting into the habit of instrumenting your code as you write it. We’ve gotten into the habit now of writing tests as we code, which is awesome. And even when we write our code, we’ll slap in log statements. We do that for debugging. We do that so that when things go caca, we know what’s going on. Instrumenting your code using something like OpenTelemetry is basically the same as adding logs, just with OpenTelemetry. So it’s about getting folks into the habit of doing that as they code.”