Presenting: AI Unraveled — The Future of AI Safety Testing
We’re bringing you a recent episode of AI Unraveled, a podcast exploring groundbreaking research, innovative applications and emerging technologies that are pushing the boundaries of AI.
On the episode, “The Future of AI Safety Testing,” Bret Kinsella, GM of Fuel iX™ at TELUS Digital, shared his expertise on game-changing approaches to AI safety testing and the critical importance of proactive security measures in enterprise AI deployment.
In conversation with host Etienne Noumen, Bret discusses the limitations of traditional red teaming methods for large language models, introduces the breakthrough technique that enables AI systems to test themselves and explores how enterprises in regulated industries can balance innovation with responsible AI governance.
Guests

Bret Kinsella, GM of Fuel iX™ at TELUS Digital

Etienne Noumen, Host of AI Unraveled
Episode topics
00:00 Introduction
02:17 What led Bret from Voicebot.ai to becoming GM of Fuel iX™ at TELUS Digital?
05:34 How can TELUS Digital’s AI safety research help everyday customers?
07:48 What are the fundamental flaws of traditional red teaming for large language models?
16:42 Can sophisticated prompting really be used to extract company secrets?
21:20 What is the OPRO technique and how does it allow AI to red team itself?
23:55 Why do you evaluate attack success rates as distributions rather than single metrics?
25:16 How should companies in high-stakes industries like finance and healthcare change their safety testing practices?
30:18 How do you navigate AI safety compliance in heavily regulated industries?
32:28 What's the next major frontier in AI safety testing?
34:48 What advice does Bret have for balancing AI innovation with responsible deployment?
38:32 Where can listeners learn more about Fuel iX™ and AI safety testing?
Transcript
[00:01] Robert Zirk: What happens when the AI systems powering your customer experiences become security vulnerabilities?
[00:08] It might sound like we're setting up a Questions for now episode, but instead we're sharing a recent episode from another series. AI Unraveled explores groundbreaking research, innovative applications and emerging technologies that are pushing the boundaries of AI.
[00:26] This particular episode features Bret Kinsella, general manager of Fuel iX™ at TELUS Digital, diving deep into AI safety testing and security — topics that might seem highly technical, but are increasingly critical for all leaders to understand. While this conversation gets into the weeds of AI red teaming and vulnerability testing, the implications for customer experience strategy are significant.
[00:52] Without further ado, here is that episode, titled "The Future of AI Safety Testing".
[00:59] Etienne Noumen: Today, we are going to talk about the future of AI safety testing, and to help us understand it, we have Bret Kinsella, the GM of Fuel iX™ at TELUS Digital. Welcome Bret.
[01:13] Bret Kinsella: Thank you very much Etienne. Really nice to be here today.
[01:16] Etienne Noumen: Okay. Today I want us to talk about four things. First, we're gonna start with your journey from Voicebot.ai to Fuel iX™, and correct me if I get it wrong. The second thing we want to talk about is the research, right? The new method for red teaming LLMs that you came up with and, more importantly for my audience, what this means for enterprise and the industry. And then we are going to finish with the future. We're gonna talk about AI safety, accountability and what's next.
[01:51] Bret Kinsella: Excellent.
[01:52] Etienne Noumen: Is that good for you?
[01:53] Bret Kinsella: Absolutely.
[01:54] Etienne Noumen: Perfect. So let's go. Let's start with trying to bridge the gap. So start with your journey and talk about Fuel iX™ and TELUS Digital: your journey from Voicebot.ai to Fuel iX™ at TELUS Digital and what it means. Maybe explain to our audience all those terms.
[02:17] Bret Kinsella: Okay. First, just to give a perspective here, 30 years ago my journey started working with new technologies around the internet and deploying solutions there. And then I worked in RFID and mobile and all these other things, even before I got to AI.
And some of the things that I wound up focusing on were drivers of adoption and barriers to adoption. So I think that this kind of frames the way I often think about new technologies and their adoption now, over the last decade, actually. 12-plus years now, I've been in the AI space deploying systems, new products, a lot of the things in conversational AI that predated generative AI in the NLP-NLU space.
[02:58] But then, obviously, over the last several years, really a focus on generative AI more broadly. And that work led me to not only launch products and found companies, but also I started a research business around that, where we would publish through Voicebot.ai and Synthedia about understanding adoption rates, the technology, the markets and, really, for end users to understand transitions in the AI space.
[03:27] And that led me to getting to know some of the executives at TELUS Digital. And about a year and a half ago, I came on board to help them take some technologies they developed in-house, some applications that they were running that met their needs as a large enterprise for TELUS, the largest telecom in Canada, TELUS Digital, a company of 75,000 people, running contact centers around the world as well as a large digital services organization.
[03:57] And they wanted to harden and extend the applications they built for some of their internal users. And they also had been approached by a number of their customers and asked if their customers could use some of this technology. And so they looked to me as having been in the software space, in the AI space, to help shepherd some of these products into market.
[04:19] And so that's how I got there in terms of AI. And what's interesting, and I'll tell you that having been in the AI industry for a long time, we were hitting, like, frontier-level capabilities before we hit generative AI. And it's been such a pleasure to be working in this industry at this time because it is such a significant amount of innovation and there are so many benefits that it brings, that it has been really, really fun there.
[04:50] Ultimately, because I have that background and thinking about adoption drivers and adoption barriers, one of the things I started focusing on internally when I was working with the Fuel iX™ team, which is our generative AI set of platforms and applications across TELUS and TELUS Digital and for our customers, is I started thinking about safety and security.
[05:12] Etienne Noumen: Our audience, a lot of people are maybe engineers like me, but we also have beginners, right? So they are curious about AI. And a lot of the audience uses TELUS at home. So [how] can your research help average users of TELUS, for example?
[05:34] Bret Kinsella: It's hard to even account for an average user at TELUS, but to give you a perspective: so, we have a tool called Fuel iX™ Copilots, which is a general purpose tool. And one of the things that we really focused on was the person doing the work — most of 'em are not developers. Most of 'em are not technical, they're not business analysts. Are they able to use the tools, the technology, in a way that actually meets their needs?
[05:57] And what we found, time and again, with this technology is that a little bit of customization (grounding in your own data, being able to add search to this capability, being able to use your own prompt and some other very specific types of features you might add) makes a big difference for that individual.
[06:17] And I've come from this background. We were doing business process reengineering 30 years ago, and this was a big deal around ERP and the rise of the web and those types of things. And one of the things we always learned is the users know what they should be doing. They know what the process is today.
[06:31] They know where it's broken, they know where it can be improved. And so, generative AI is really interesting because it allows them, if you give a frontline worker an opportunity to make a change and to customize it, they'll actually find the efficiency for the task and those types of things.
[06:48] So we've rolled this out to tens of thousands of people internally across our organizations, plus some other customers. They're using it in finance, they're using it in HR, they're using it in marketing. They're using it in IT. They're using it in the contact center. They're using it in security for some things other than what we're gonna talk about today.
[07:05] But you can look at every functional area within our organizations in TELUS, a large global multinational; TELUS Digital is as well. And every part of the business is using generative AI in some way.
[07:18] Etienne Noumen: Yeah. Thank you. So let's go to our next section. We are going to talk about your research now. Let's start with the flaw of traditional red teaming. So maybe can you explain to our audience what red teaming is, and then maybe expand to red teaming for large language models. And then, from your perspective, what are the fundamental flaws of this approach and why it's no longer sufficient for assessing real world risk in AI systems?
[07:48] Bret Kinsella: Got it. And I'll take this from two perspectives. I'll just take this from the market perspective and I'll also take this from the technical perspective. Red teaming in general is an approach that's been used in security for a long time in order to test the security of any application. And that's been very important in that there's always been tools out there that people have utilized in order to test systems and those types of things.
[08:18] Usually those tools focus on “Does the function of the system operate?” and sometimes those tools actually look to see if there are exploits available, if there are vulnerabilities in those systems. And so, what we found over time is that a best practice is to assign a red team. That red team are security experts, generally, who will then try to conduct an ad hoc pen test, a penetration test, and try to compromise the system and see if it has vulnerabilities. Once it goes out into the wild, either for your internal users or externally exposed to customers, is there a risk here that you might be the victim of a malicious act because there are vulnerabilities there?
[08:58] When you do red teaming, the idea is go figure out where the vulnerabilities are and then take steps to blue team, which is to make it safe, to either mitigate or eliminate those risks through whatever steps you're taking with the various types of systems.
[09:15] And so then when you go out into the wild, you don't have to worry about those, right? So there might be other things you have to worry about, and that's why continuous red teaming has become more of a thing over the last decade. But that's where we start. And what I thought about in this is we're thinking about safety and security, and the TELUS Digital trust office was winning awards early on for AI safety and security.
[09:36] Safety and security are different things, by the way. From an AI standpoint, most things we think about are security. Safety's, like, different here. But they were doing a lot of human-led testing, and they had borrowed some people from some of our teams in the TELUS Digital part to actually do some more technical assessment of the safety and security issues.
[09:59] And we are looking at that and saying, “Okay, I understand this is gonna be a barrier to adoption. There's big risk here for organizations. There's reputational risk, there's legal risk, there's regulatory risk. There's competitive risk, because what we have here with generative AI solutions are systems with unbounded inputs and outputs.”
[10:20] Most systems that you had in the past are programmatic. You can only do a certain number of things, click buttons, that type of thing. And so you have a limited number of things that people can do, and then you have a limited number of things that can happen afterwards and it's programmatic. You've defined everything that could potentially happen, and anything beyond that is going to be some sort of vulnerability, security flaw or flaw in the code.
[10:41] When you look at AI systems like NLU, things like Siri and Alexa, those were actually unbounded inputs. People — it was a text box, people could type anything in, but it was bounded on the output. Every response lived somewhere in a database, could be evaluated, could be pre-approved. You were never sending somebody something that had language in it that might be problematic.
[11:05] Unless you had a content problem. So that's a whole ‘nother thing. But in general, like, you knew that. When we get to generative AI, it's a probabilistic technology inbound and outbound. Unbounded, so the output can also be variable. Inputs are, by definition, variable; outputs can be variable. Okay? So that's the first thing we look at and we say, “Okay, wait a second.
[11:24] Nobody's approved some of these things. We know there's hallucinations. We also know that there are things that are not hallucinations that can come out of a system that is just pulling the data incorrectly. And that can lead to some of these problems that we mentioned.” So what I noticed there was an over-reliance on guardrails.
[11:44] Guardrails, if you think about an AI system, are intervention technologies at the time of a compromise, the time of an attack or an inappropriate input, even if it's not an attack. They will look at that — through either rules-based or sometimes ML-based solutions, some little AI agent type of thing, but usually rules-based — and say, “Is this appropriate for what we're trying to accomplish?” Then, if it gets past the guardrail, it goes to the system, and there might be a system prompt that says, “Hey, you should do these things as a bot, but you should not do these other things as a bot.”
[12:16] So maybe that's another place that you can potentially catch the problem. But again, it's at the time of the activity. And then there's output guardrails as well, which can evaluate what's coming out of it. So this makes sense, right? What we want to do is we want to have this input scan and output scan. That's where our probabilities lie. We basically have three shots at catching it.
[12:34] However, what I also recognized was that people weren't doing much from a prevention standpoint, and that's what red teaming does. It's designed to identify your vulnerabilities so you can prevent the problem. And if we think about the rapid rise of generative AI copilots and bots and assistants, there are more of those than there are skilled red teamers in the world. It's just a fact. So that means we have a mismatch between the demand and the supply. And the whole idea is, whether you're using human red teamers or an automated red teaming solution, can you identify the vulnerabilities so that you can set up the guardrails properly, know what those are going to be, set up your system prompts properly, and actually have the layered defense model that we would expect in anything else?
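To make the layered defense idea concrete, here is a minimal Python sketch of the three chances to catch a problem that Bret describes: an input guardrail, a system prompt that constrains the bot, and an output guardrail. The function names, prompts and blocklists are illustrative assumptions, not part of Fuel iX™ or any specific guardrail library.

```python
# Minimal sketch of a layered defense: input guardrail, system prompt, output guardrail.
# All names and rules here are hypothetical placeholders for illustration only.

SYSTEM_PROMPT = (
    "You are a customer support bot. Answer billing questions only. "
    "Do not reveal internal data or discuss topics outside billing."
)

def input_guardrail(user_message: str) -> bool:
    """First shot: a rules-based check on the incoming prompt."""
    blocked_phrases = ["ignore previous instructions", "reveal your system prompt"]
    return not any(p in user_message.lower() for p in blocked_phrases)

def output_guardrail(response: str) -> bool:
    """Third shot: check the generated output before it reaches the user."""
    return "ssn" not in response.lower() and "password" not in response.lower()

def answer(user_message: str, call_model) -> str:
    """call_model is any callable that takes system/user strings and returns text."""
    if not input_guardrail(user_message):
        return "Sorry, I can't help with that."
    # Second shot: the system prompt steers the model itself.
    response = call_model(system=SYSTEM_PROMPT, user=user_message)
    if not output_guardrail(response):
        return "Sorry, I can't share that."
    return response
```

Each layer is probabilistic or rules-based and can miss things, which is why the prevention work Bret describes next matters.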
[13:24] So this is where we started. Looking at this, if we think about the fact that there's not enough red teamers, we need to have some tooling that can be available when red teamers are not available. So that's the first thing I would say. I'd say that's a need in the market. But if we then think about the technical standpoint, which you asked about, what red teamers have been doing in this space — and there's a lot of literature on this; TELUS Digital just published an academic research paper on arXiv, which people are gonna be familiar with for publishing academic research, and you can look at this. And one of the things that we discovered in our work along these lines was that the way that people were doing red teaming analysis today was going to be inappropriate for the way large language models work.
[14:11] So, first of all, our existing security tools don't know how to do unbounded natural language in, unbounded natural language out because they're designed for this programmatic world. So that's the first problem. And so what do we do? We apply humans and a little bit of AI sometimes to do this, but the other thing they were doing is all red teamers are time-constrained because humans are running this.
[14:33] And generally speaking, if they're going to test something, they say, “Oh, okay, I've got a checklist here. Maybe it's OWASP Top 10 or something like that. Yep, I'm good. Move on to the next thing.” What we found, having run a lot of these attacks ('cause we built some of our own automation tooling here through the Fortify solution), is that the same prompt, when delivered multiple times, will give you a different result in terms of your vulnerability. And the system that you have — the system is not just the model; the large language model is part of the system, but it's also all the other components that get wrapped up in it before it hits your end user.
[15:07] The system will properly refuse some things that we'd consider to be attacks or malicious inputs for a period of time, but then all of a sudden it won't. Because these are probabilistic technologies. So one of the first things we learned was that it's actually randomly distributed in terms of when a particular attack will fail and when it will succeed.
[15:29] And so that led us to this research that came out in that paper as well, that we started looking at, okay, in order to adequately understand if a particular attack is going to be successful, you actually need to run it in a series of tests at scale and then you'll have a better idea. Think of, like, cross-sections as opposed to just doing it once or twice or seven times.
[15:53] You can run a series of attacks in a slice and do those multiple times and understand whether that comes up the first time or the seventh time or the 15th time or something like that, or if it's clear. So that's the thing that we found was, the first thing was, people were trying to do this binary: does this attack work or does it not?
[16:10] Etienne Noumen: Yeah. Okay. Thank you for that. I was listening to the radio this morning and I heard something that made me think about you. There was an expert on Radio-Canada who said it’s very easy now to exploit companies just by using a prompt. You can use sophisticated prompting to extract companies’ secrets that they don't wanna share. So I'm wondering, at a lower level, can your research help prevent things like that?
[16:42] Bret Kinsella: Yes, absolutely. And I'll tell you that it depends on how powerful your tools are and then, normally, how skilled your researchers are.
[16:54] But it also depends on how far they're willing to go. And what I've found over the years is that there's a lot of red teamers who are really creative and they'll do some great things and they'll close off vulnerabilities and that's really important. But many of them are just not as malicious as the attackers out in the wild.
[17:14] So that's the first thing. Yeah. One of the things that we did was we basically looked at this and we said, “Okay, this is a problem where we need scale. We need to be able to repeat the prompts over and over again.” The other thing we found was that there was an inconsistency in the way people were running this.
[17:29] They might have a list of — let's say they wanted to automate it. They wanted to send a hundred prompts or 500 prompts to a large language model, and maybe they wanted to send 'em five times, 10 times, so they could address this issue that we'd identified in terms of repeatability. In that case, each one of those attacks is still discrete.
[17:52] So this OPRO approach — and this was proposed by an academic researcher, and we did some additional research on it and built it into a model to test it — stands for optimization by prompting. What it does, if you think about these systems, and this is how you protect them:
[18:07] So you say, “Okay, I wanna run this. I need a better attacker.” So I use AI as my attack model. And this AI I want it to be really clever. It's gonna have attack objectives, it's gonna know methods and techniques and all these other types of things. I've got a target, which is the bot, and it might have a code of conduct of what it's supposed to do, not to do.
[18:25] We'll take that into account when we're building our attacks. And then we have a judge, another model, which is designed to evaluate the inputs and the outputs that are going into the model. However, in this case, instead of just evaluating it attack by attack, as if they're discrete and there's no intervention, the judge says, “Oh, this is interesting.”
[18:46] That judge then can feed the information back to the attack model, which has an optimization, which will say, “Oh, that didn't work. Let me try another technique.” And so this is the idea: instead of having a library of static attacks you're gonna run again and again, what we did is we created this system that could create novel attacks, and every time an attack didn't work, it heard from the judge about whether it was successful or not and would then lean in and modify the attack.
[19:14] And we found that's very important. Now, how does this help somebody? So if you do this at scale, let's say, like, this morning I just ran 3000 attacks against a bot, and — maybe I did some programming in the 1980s and ‘90s — but it's been a long time. I would consider myself non-technical, maybe semi-technical in this, but mostly non-technical.
[19:35] So I can set it up, I can click a button, I can do 3000 attacks. There's gonna be repeats in there, there's gonna be novel attacks and things like that, and it's gonna show me, for a particular target, a copilot, a bot, what the vulnerabilities are, categorized into 15 different vulnerability categories.
[19:53] And then specifically identifying the type of vulnerability within that category. What that provides you with, then, is a blueprint for the organization to say, “Okay, here's my vulnerabilities,” which is my risk that I might have a problem. And then the organization can use that to move forward and say, “Okay, these are the things I need guardrails for.
[20:13] “These are the things I need prompt engineering for. These are the things I might need a supervisor agent to evaluate for. These are the things that — maybe I need to change my data set around.” And so that's the important thing. And I think about this idea that an ounce of prevention is worth a pound of cure.
[20:31] Sort of this old saying, just a little bit of preventative work, identifying where your vulnerabilities are, getting that blueprint and then closing those off relieves the stress on the system where you have guardrails which are also probabilistic and may not get a hundred percent of the things.
[20:48] You can actually implement things so it does either eliminate those or at least significantly mitigates that risk and then you can focus your attention on the areas that you consider to be most problematic.
[21:01] Etienne Noumen: Yeah. And then that brings us to the solution: the OPRO technique. So can you explain to our audience what OPRO is and can you explain how it allows a large language model to red team itself in a way that is both reliable and repeatable?
[21:20] Bret Kinsella: Yeah. And that's great. And so I just introduced this a little bit in the answer to the last question, so this is great. I can go a little bit deeper on it.
[21:27] Etienne Noumen: Yes, please.
[21:27] Bret Kinsella: So the idea here is that what we wanna do is continually optimize. This is what large language models are good at and this is what humans do, but the humans can't do it at this level of scale.
[21:38] So if a human's going through this, they're going to try some type of attack technique. Let's say somebody's worried about data exfiltration or something like that from a large language model of PII, personally identifiable information, for customers. And they're gonna do a few attacks.
[21:51] They're gonna see what comes back and they're gonna do some sort of human evaluation. Or maybe they use an AI tool to supplement their own interpretation of what the response was. And they say, “Oh, okay. I know a bunch of different techniques. Let me try a different technique to do this. Let me try a different technique,” right?
[22:07] But how long, how many times, iterations can a human do that? How many things can they do? And so this is why most red teaming programs last at least days, many weeks or months. And the opportunity here with OPRO is basically “We can do that faster. We can do that with more creativity in minutes.”
[22:32] So what it does is it basically looks at every attack. It looks at the response, it looks at the judge. What the judge says about that feeds back to the attack model. The attack model says, “Oh, I'm gonna try something else. I'm gonna try something else. I'm gonna try something else. I'm gonna try something else.”
[22:48] So you can do this at a scale that we haven't been able to do before. That's one of the great things about these technologies, because what it does is it gives you finer precision on where your vulnerabilities actually are and doesn't give you the false sense of security that you've tried a few things and okay, you gotta move on to something else.
[23:06] I hope that's good enough. This can really get into a lot of depth and I hope, I expect that this is going to be — maybe right now it's state of the art. I expect very soon this is going to be what everybody's going to have to do because it's the only way to get the type of comprehensiveness and depth at the same time.
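To illustrate the shape of this optimization loop, here is a rough Python sketch of the attacker, target and judge roles Bret describes, where the judge's verdict feeds back to the attack model so it can generate a new attack rather than replaying a static list. The function signatures and dictionary fields are assumptions for illustration, not the Fuel iX™ Fortify implementation.

```python
# Rough sketch of an attacker/target/judge feedback loop for automated red teaming.
# attack_model, target_bot and judge_model are assumed callables supplied by the caller.

def red_team_loop(attack_model, target_bot, judge_model, objective: str, max_turns: int = 10):
    history = []  # (attack, response, verdict) tuples the attacker can learn from
    for turn in range(max_turns):
        # The attacker sees the objective plus everything that has failed so far,
        # so each new attack is a refinement rather than a replay.
        attack_prompt = attack_model(objective=objective, prior_attempts=history)
        response = target_bot(attack_prompt)
        verdict = judge_model(objective=objective, attack=attack_prompt, response=response)
        history.append((attack_prompt, response, verdict))
        if verdict["successful"]:  # assumed judge output schema
            return {"vulnerable": True, "turns": turn + 1, "evidence": history}
    return {"vulnerable": False, "turns": max_turns, "evidence": history}
```

Run against one attack objective (say, PII exfiltration), the loop returns either evidence of a successful compromise or a record that the target held up across every refined attempt.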
[23:26] Etienne Noumen: Yeah. So I think that's good, but now we need to be able to evaluate. I want us to talk about measuring the risk with ASR, the attack success rate. So instead of a single metric, I think your team focuses on evaluating the attack success rate as a distribution. For our audience, what does this actually mean, and what new insight does this probabilistic approach give us about the model's true security profile?
[23:55] Bret Kinsella: So I think the first thing we have to think about is it's not that binary situation where I try to attack once, it's failed or been successful and we would — the terminology we use is that it's clear or it's vulnerable. So from a probability distribution, what we're doing is we're just running the same attack, plus similar attacks, over a period of time. And that might be seconds or minutes or something like that.
[24:24] And then what we're doing is we're calculating the attack success rate based on those multiple attacks and multiple variants of that attack. And that will give you a much better estimate, around a particular attack objective or vulnerability category, of your likelihood of being subjected to a potential attack and being a victim of it in your system.
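As a rough sketch of what evaluating attack success rate as a distribution could look like in practice, the snippet below repeats each attack many times and reports a per-category success rate instead of a single pass/fail verdict. The attack schema, judge interface and trial count are assumptions, not the method from the paper.

```python
# Sketch: treat attack success rate as a distribution by running each attack
# (and its variants) many times, then aggregating per vulnerability category.

from collections import defaultdict

def attack_success_rates(attacks, target_bot, judge_model, trials: int = 25):
    """attacks: list of dicts like {"prompt": ..., "category": ...} (illustrative schema)."""
    successes = defaultdict(int)
    totals = defaultdict(int)
    for attack in attacks:
        for _ in range(trials):
            response = target_bot(attack["prompt"])
            verdict = judge_model(attack=attack["prompt"], response=response)
            totals[attack["category"]] += 1
            if verdict["successful"]:
                successes[attack["category"]] += 1
    # e.g. {"data_exfiltration": 0.12, "harmful_content": 0.0, ...}
    return {cat: successes[cat] / totals[cat] for cat in totals}
```

A single trial might report "clear" while the aggregated rate shows the same attack landing one time in eight, which is the false sense of security Bret warns about.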
[24:47] Etienne Noumen: Cool. Now I want us to talk about real world application in an enterprise, right? For example, me in my own company now, we are all excited about AI. We are still asking those questions. The AI safety keeps coming. So I want you to tell us how should a company in high stakes industries like finance or healthcare change their safety testing practices? Based on these findings, what's the first tangible step that they should take?
[25:16] Bret Kinsella: Like finance, healthcare or energy, right? It hits really close to home for you, I can imagine.
[25:22] Etienne Noumen: Yeah, exactly.
[25:25] Bret Kinsella: I would say the answer is comprehensiveness, repetition and creativity.
[25:30] Etienne Noumen: Good.
[25:31] Bret Kinsella: So comprehensiveness is this idea, is how broadly are you actually covering?
[25:37] So if you start with frameworks like the OWASP Top 10, Fortify — or, excuse me, not Fortify — the MITRE ATT&CK models and vulnerability models and those types of things, that's a good place to start. If you look at some of the other frameworks that people use for AI safety and security, they're fine.
[25:59] They provide a checklist, but they don't actually tell you what to do within those. So the first thing I would say is comprehensiveness. You have to go beyond what you think you need to do. It's actually a combination of a number of different things. And so what we've actually come up with is 139 attack objectives that we think broadly apply.
[26:16] An attack objective is the objective that a malicious actor might have in order to compromise your system. So we have a taxonomy there which categorizes this into 15 different vulnerability segments. And then we also have the idea here that finance, healthcare and energy have different types of specific vulnerability considerations beyond the general ones.
[26:42] Those might be specific to the bot itself. It might be specific to the industry. It might be specific to how or where their company operates. And so those can come out of a code of conduct, and we'll automatically identify those out of the code of conduct (which is just, like, what the bot is supposed to and not supposed to do), or you can actually enter in your own.
[27:01] So this is how you get more comprehensiveness. And then the repetition is doing these things multiple times over and over again just to make sure that your first, second, third attempts are representative of what this is likely to be in the field. And then creativity is: you need to, I think a lot of times…
[27:24] I respect red teamers. I've been around red teamers a long time. I know they do a great job. But what I know is you are limited to the creativity of any given red teamer and most of them are good at certain types of things. Most of them are gonna be good at security, but not AI safety things. They might be able to come up with some AI safety things, but there's other people who might be really good at that and not good at security.
[27:42] So at any given time, depending on your team composition, you might have some gaps in terms of the creativity, the things you might do, and the types of topics which are particularly technical or particularly uncomfortable for people to deal with, whereas an AI model doesn't.
[28:00] It doesn't care. So I think that's one of the things that we'd be looking at: using those types of automations for comprehensiveness, repetition and ingenuity. The last thing I'll say is that what I'm seeing today is there's just not a lot of red teamers. There's not a lot of people on your security team who can do it for you.
[28:17] You don't have access to budget or external people to do your red teaming all the time — particularly people with the skills of both AI systems and red teaming — so teams are doing it infrequently, like when they launch the product. But when the model's updated in seven months, because it's deprecated, are they doing another red team?
[28:37] Are they doing a full red team? How are they doing it? Are they doing the same red team they did the last time, or are they coming up with new attacks so they understand regression or novelty? What about when they swap out an orchestration tool that works with their AI system? Did that introduce some new things?
[28:52] So, I think there's this idea of comprehensiveness and repetition, but there's also good hygiene here in terms of being able to do it more frequently. And that's really where we came in. We started saying, “Hey, we can automate this process so that it's easy enough that a product manager or project manager can run it at any given time.”
[29:10] You make any change, or you just want to; like, at noon on Friday, your chief information security officer wants an update: “Have we had any regression?” or something like that. You can schedule it and you can always have that data at your fingertips, and then that gives the application owners the confidence to know we're good.
[29:29] That the risk and security folks in the organization feel that they're good, that they're okay with where things are. Or it gives them a blueprint to say, “Oh, okay, we do have some issues. Let's proactively, as opposed to reactively with guardrails, let's proactively see if we can close these off.” And then go to your security teams, or maybe your existing development team can actually change some things, implement some guardrails, change some specs in your system prompt in order to close off or mitigate those risks.
[29:58] Etienne Noumen: Now, the elephant in the room: regulation. So in industries like healthcare, finance — even energy — we have a lot of regulation. So how do you navigate it, making sure that your solution still stays compliant?
[30:18] Bret Kinsella: Yeah. I think this is one of the hardest things, because I really feel for a lot of the users, and in a lot of ways, if you look at TELUS and TELUS Digital: we are our users.
[30:27] It's unique. Usually you create software for somebody else and then you're trying to learn all their problems. TELUS is a regulated telecommunications provider.
[30:36] Etienne Noumen: Yeah.
[30:37] Bret Kinsella: It also provides services to governments and healthcare organizations. TELUS Health is a medical provider and has all these same problems.
[30:46] TELUS Digital provides contact center support for lots of regulated industries. So I'd say that we feel the same challenge and that a lot of the regulations today are ambiguous. So how to comply with them is a difficult thing. There's just checklist items. You can say, “Are we doing this? Are we not doing that?”
[31:05] But if you're really looking from a governance model, risk and safety standpoint of being able to set a standard that you are comfortable with as an organization that you're doing your fiduciary responsibility, I think that's the type of thing that most organizations are going to have to do.
[31:19] And that's where, I think that, if we look at the White House AI Action Plan and what's going on in the EU, there's different levels of specificity. Most of it's still vague and I think for a lot of organizations, they have to start by saying, “Okay, we can guess what these vague regulations mean at a minimum, but they're only going to get more specific over time.”
[31:43] So you probably should be putting in a plan for how you're going to do layered defense, not just for AI security but AI safety, 'cause a lot of the regulators and the different jurisdictions around the world are focused on those AI safety issues that might be a reputational hit to you if you teach somebody how to commit a specific type of negative attack or act in the world, but very often could lead to fines from regulatory bodies in different parts of the world if this comes up.
[32:12] Etienne Noumen: Okay, thank you. Now let's talk about the future of AI safety. So looking ahead, what's the next major challenge or frontier in AI safety testing? Where do you believe researchers and developers should be focusing their efforts now to prepare for what's next?
[32:28] Bret Kinsella: I think there's a couple things that I would say.
[32:30] First of all, I'd say that vulnerability testing is at the forefront today, because what I see is that most of the attempts that people are taking are fairly limited in terms of what they're trying to do. So they're trying to put in rules-based systems to do this intervention at the time, and they're not doing vulnerability testing.
[32:52] So I think that's the first thing is the vulnerability testing is gonna become much more sophisticated overall so that organizations can proactively close off risk. And the second thing I think is important is I think supervisor agents, which we've had some experience in deploying, are going to be much more common and prevalent.
[33:15] And this goes beyond the traditional guardrails because it's actually an agentic AI system that's actually reviewing all the information — that's all the conversations — and it's looking for specific things. You might have several of these operating at any given time as another layer of defense.
[33:34] And then the next thing that I think we'll be looking at, which is not something that has a lot of focus in the solution space, in the market, today — and that's identifying the root cause of some of these problems. So today, vulnerability testing will give you a better sense of where the problems are. Interventions like guardrails will help you create some sort of additional protection.
[33:57] But what is the root cause? Why does this come up at the model level, at the data level, within your system? And so it's about being able to go into that and understand where those problems are, so that instead of just going forward from the blueprint to guardrails, you can proactively go backwards into the model and into the data and therefore close off some more of those risks.
[34:19] Etienne Noumen: Thank you. And now, let's go to a final takeaway. What piece of advice can you offer to people who are listening to us and trying to stay ahead in terms of AI safety? What are the building blocks that they need to have to balance innovation with being responsible? What piece of advice do you have for them?
[34:48] Bret Kinsella: I think this comes down to policy, technology and process in general. From a policy standpoint, you really do need to define what is important and not important to you. We just did some work with the TELUS data trust office and the reconciliation office for a UN event in Geneva, and we were really talking about that.
[35:13] How do you set up what is acceptable for your organization? How does that fit with regulation, but how does it fit in with you just being a good corporate citizen? How does it fit in with you doing right by your customers? So I think most organizations are still very new on that governance, particularly around AI safety and security, not knowing what should be in bounds and out of bounds.
[35:30] From a technical standpoint, I would say one learning that I would try to help everybody with is: narrow the scope of your instructions to your models, and use multiple models, multi-model systems, to perform different tasks. Most people don't know this yet, because they try to put 10,000 words into a system prompt or something like that.
[35:53] And what they're doing is they're telling it to do a lot of things and to not do a lot of things. But what they don't know is that sometimes tokens get lost in the context window on subsequent readings, and by saying “don't do something” in a system prompt, sometimes it will do it — it won't recognize the don't.
[36:13] And the model wouldn't even have considered it, but it's in the system prompt, so it's gonna start doing it. So the more you can narrow the scope of what you want your solutions to do, the better. And you can do that. You can still have a broad solution scope. You can just use a different model.
[36:27] Here's the model that says what you're supposed to do. Here are the models that say what you don't do, and you use those as governors before you start executing the process. So those are the things I would say.
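A minimal sketch of that "narrow the scope, use multiple models" pattern might look like the following, where each concern gets its own narrowly scoped model call acting as a governor before the main task runs. The prompts, function names and YES/NO protocol are illustrative assumptions, not a prescribed design.

```python
# Sketch: separate, narrowly scoped model calls act as governors before the main task,
# instead of one giant system prompt full of do's and don'ts.

GOVERNOR_PROMPTS = {
    "topic": "Answer YES if the request is about billing, otherwise answer NO.",
    "safety": "Answer YES if the request asks for restricted or personal data, otherwise answer NO.",
}

TASK_PROMPT = "You are a billing assistant. Answer the user's billing question concisely."

def governed_answer(user_message: str, call_model) -> str:
    """call_model is any callable taking system/user strings and returning text."""
    # Each governor is a separate, tightly scoped call with one job.
    on_topic = call_model(system=GOVERNOR_PROMPTS["topic"], user=user_message).strip().upper()
    restricted = call_model(system=GOVERNOR_PROMPTS["safety"], user=user_message).strip().upper()
    if on_topic != "YES" or restricted == "YES":
        return "Sorry, I can only help with billing questions."
    # Only then does the main, narrowly scoped task prompt execute.
    return call_model(system=TASK_PROMPT, user=user_message)
```

Keeping each instruction set small means no single prompt has to carry every "do" and "don't", which is the failure mode Bret describes with very long system prompts.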
[36:38] And then from a process standpoint, I think that everybody really should be testing their systems on a regular basis because otherwise they will not keep up with the vulnerabilities. And if you're just doing manual red teaming, there's no way you're gonna catch everything the first time. Even with manually-led, technically-led automation testing, you're not gonna catch everything.
[36:57] So you should be thinking about this not just at your first product launch; you should be doing this periodically, probably at least monthly, just looking at regression testing, and every time you make any type of significant release, system upgrade, those types of things.
[37:12] So those are the three things I would recommend people do. And we've learned about all these things, right? Because we've been going on this journey just like our customers. We're just a few months ahead of them, or maybe a year ahead of a lot of the people that we talk to. But that's hard-won experience around how to best get the value out of this and how to best protect yourself in an area which has a ton of benefit, because these systems are really extraordinary in many ways but introduce novel risks that we've seen time and again. So let's make sure that we're addressing the things that could be a barrier to adoption, like security and safety risk, at the same time as putting a lot of effort into making these things better and solving problems more effectively.
[37:58] Make sure you set aside that time to do the work to reduce some of these barriers and these harms that could be lurking inside your models.
[38:07] Etienne Noumen: Cool. Yeah. So that's all the questions I have for you today. Thank you again. That was great. For more insight into the world of AI, be sure to follow and subscribe to AI Unraveled on Apple Podcasts.
[38:20] Thank you again, Bret. That was interesting and we'll be in touch. Thank you.
[38:27] Bret Kinsella: Fantastic, Etienne.
[38:32] Robert Zirk: Thank you so much to Etienne Noumen at AI Unraveled for allowing us to share this episode with our audience.
[38:38] At TELUS Digital, we're taking a considered approach to AI development with our Humanity-in-the-Loop principles for purposeful innovation. Our comprehensive data and AI services help enterprises build, scale and lead in the AI era, guiding businesses on their journeys from data strategy and governance to GenAI engineering and agentic automation.
[39:01] As Bret discussed in this episode, our Fuel iX™ platform and advanced safety testing approaches can shorten time to value while managing risk at every stage, ensuring businesses can innovate confidently with the highest standards of security and compliance.
[39:19] We'll be back with a brand new episode of Questions for now soon. In the meantime, visit questionsfornow.com to subscribe or hear any of our previous episodes.
[39:29] I'm Robert Zirk, and until next time, that's all… for now.