Trevor McFedries

OpenAI researcher on why soft skills are the future of work | Karina Nguyen

Karina Nguyen leads research at OpenAI, where she’s been pivotal in developing groundbreaking products like Canvas, Tasks, and the o1 language model. Before OpenAI, Karina was at Anthropic, where she led post-training and evaluation work for Claude 3 models, created a document upload feature with 100,000 context windows, and contributed to numerous other innovations. With experience as an engineer at the New York Times and as a designer at Dropbox and Square, Karina has a rare firsthand perspective on the cutting edge of AI and large language models. In our conversation, we discuss:

Published
Published Jun 14, 2025
Uploaded
Uploaded Jun 14, 2026
File type
YouTube
Queried
0

Full transcript

Showing the full transcript for this video.

AI-generated transcript with timestamped sections.

0:00-1:31

[00:00] Not only are you working at the cutting edge of AI and LLMs, you're actually building the cutting edge. When I first came to Android, I was like, oh, I really love front end engineering. And then the reason why I switched to research is because I realized, oh my God, cloud is getting better at front end. Cloud is getting better at, like, coding. I think cloud can, like, develop new apps. What skills do you think will be most valuable going forward for product teams in particular? [00:30] like filter through them and not just build the best product experience. I think it's actually really really hard to teach the model how to be aesthetic with really good visual design or like how to be extremely creative in the way they write. What do you think people most misunderstand about how models are created? When you taught the model some of the self-knowledge of you actually don't have a physical body to operate in the physical world, the model would get like extremely confused. [00:58] Today, my guest is Karina Nguyen. Karina is an AI researcher at OpenAI, where she helped build Canvas, [01:04] tasks, the O1 chain of thought model, and more. Prior to OpenAI, she was at Anthropic, where she led work on post-training and evaluation for the Cloud 3 models, built a document upload feature with 100k context windows, and so much more. She was also an engineer at New York Times, was a designer at Dropbox and at Square. [01:22] It's very rare to get a glimpse into how someone working on the bleeding edge of AI and LLMs operates, [01:28] and how they think about where things are heading.

1:31-3:09

[01:31] In our conversation, we talk about how teams at OpenAI operate and build products, what skills she thinks you should be building as AI gets smarter, [01:39] how models are created, why synthetic data will allow models to keep getting smarter, and why she moved from engineering to research after realizing how good LLMs are going to be at coding. If you enjoy this podcast, don't forget to subscribe and follow it in your favorite podcasting app or YouTube. It's the best way to avoid missing future episodes, [01:56] And it helps the podcast tremendously. [01:58] With that, I bring you Karina [02:00] Nguyen. [02:01] This episode is brought to you by Interpret. Interpret unifies all your customer interactions, from gone calls to Zendesk tickets to Twitter threads to App Store reviews, and makes it available for analysis. It's trusted by leading product orgs like Canva, Notion, Loom, Linear, Monday.com, and Strava to bring the voice of the customer into the product development process, helping you build best-in-class products faster. [02:31] and accurate insights into your business. Connect customer insights to revenue and operational data in your CRM or data warehouse to map the business impact of each customer need and prioritize confidently [02:41] and empower your entire team to easily take action on use cases like win-loss analysis, critical bug detection, and identifying drivers of churn with Interpret's AI assistant wisdom. Looking to automate your feedback loops and prioritize your roadmap with confidence like Notion, Canva, and Linear? Visit enterpret.com slash Lenny to connect with the team and get two free months when you sign up for an annual plan. This is a limited time offer. That's

3:11-4:41

[03:11] you [03:13] This episode is brought to you by Vanta, and I am very excited to have Christina Cassioppo, CEO and co-founder of Vanta, joining me for this very short conversation. Great to be here. Big fan of the podcast and the newsletter. Vanta is a longtime sponsor of the show, but for some of our newer listeners, what does Vanta do and who is it for? Sure. So we started Vanta in 2018, focused on founders, helping them start to build out their security programs and get credit for all of that hard security work with compliance [03:43] like SOC 2 or ISO 2701. Today, we currently help over 9,000 companies, including some startup household names like Atlassian, Ramp, and Langchain, start and scale their security programs, and ultimately build trust by automating compliance, centralizing GRC, and accelerating security reviews. That is awesome. I know from experience that these things take a lot of time and a lot of resources, and nobody wants to spend time doing this. [04:10] That is very much our experience, but before the company, to some extent, during it. But the idea is with automation, with AI, with software, we are helping customers build trust with prospects and customers in an efficient way. And, you know, our joke, we started this compliance company, so you don't have to. We appreciate you for doing that. And you have a special discount for listeners. They can get $1,000 off Vanta at Vanta.com slash Lenny. That's V-A-N-T-A.com slash Lenny for $1,000 off Vanta. [04:39] Thanks for that, Christina. [04:40] Thank you.

4:45-6:15

[04:45] Karina, thank you so much for being here. Welcome to the podcast. Thank you so much, Lenny, for inviting me. [04:50] I'm very excited to have you here because not only are you working at the cutting edge of AI and LLMs, you're actually building the cutting edge of AI and LLMs. You recently launched this feature, which basically... [05:03] the first agent feature of OpenAI. [05:06] I also just did this survey. I don't know if you know about this. I did a survey of my readers and asked them what tools to use every day in your work and most use. [05:14] and chat gpt was number one above gmail above slack above anything else 90 of people said they use [05:22] It's absurd. And it wasn't around two years ago. [05:25] Yeah. [05:26] Also, we're recording this the week that OpenAI announced Stargate, which is this half trillion dollar investment in AI infrastructure. So there's just like a lot happening. [05:34] constantly in AI and you have a really unique glimpse into how things are working, where things are going, how work gets done. So I have a lot of questions for you. I want to talk about how you operate and how you work at OpenAI. [05:47] where you think things are going, what skills are going to matter more and less in the future, and also just where things are going broadly. So how does that sound? Sounds great. Thank you so much. [05:57] Yeah, I was extremely lucky to join early days on topic and kind of learned a lot of things. [06:05] there and I joined OpenAI around like eight months ago. So yeah, I'm excited to get more into it. Okay, I'm gonna definitely ask you about the differences between those, but I want to start more technical.

6:15-8:02

[06:15] and just dive right in. I want to talk about model training. [06:19] people always hear about models being trained, these big models, how much data it takes, how long it takes, how much money it takes, how we're running out of data, which I want to talk about. Let me just ask you this question. What do you think people most misunderstand about how models are created? Model training is more an art than a science. And in a lot of ways, like we as model trainers think a lot about like... [06:46] data quality is like it's one of the most important things in model training is like how do you ensure the highest quality data for certain like interaction model behavior that you want to create. But the way you debug models is actually very similar the way you debug software. So one of the things that I've learned early days at Anthropik was like we've discovered, especially with like cloud 3 training, when you taught the model some of the self-knowledge of like, hey, [07:16] like you actually don't have a physical body to operate, like in the physical world. But then at the same time, we had data that kind of taught the model some of the function calls, which is like, this is how you set the alarm. And so the model would get extremely confused about like, [07:35] whether it [07:36] it can set an alarm, but it doesn't have a body in the physical world. So it's like the model gets confused and sometimes it's like over refused. So sometimes it's like, I don't know, like, sorry, I cannot help you. And so there is always like a balanced trade-off between how do you make the model to be more helpful for users, but also not being harmful in other scenarios. So it's always

8:06-9:47

[08:06] across a variety of diverse scenarios. [08:09] That is so funny. I never thought about that. Most of the data that it's trained on is kind of like assuming... [08:13] It's like a human describing the world and how they operate, and there's... [08:16] It assumes there's a body and you can do things and the model is told you don't have a body. Yeah. Okay. [08:22] I want to talk a little bit about data [08:24] While we're on this topic, I know you have strong opinions here. There's kind of this meme that [08:29] Models are going to stop getting smarter because they're running out of data. They're trained in a large part on the Internet, and there's only one Internet, and they've already been trained on it. What more can you show them about the world? And there's this trend of synthetic data, this term synthetic data. [08:43] What is synthetic data? Why do you think this is important? Do you think it's going to work? I think there are two questions here we can unpack. [08:50] one at a time. But people say if you're hitting the data wall, I think people think more in the terms of like, pre-trained large models that are trained on the entire internet to predict the next [09:06] token, but what actually the model is learning during that process is actually how do you compress [09:13] the compression algorithm here. The model learns to compress a lot of knowledge, and it learns how to model the world. [09:21] So the next prediction of the word, like, teach me how to [09:26] Drive. [09:27] Basically, and you only have like... [09:29] a few words that will match that, a car. So the model actually learns [09:34] um, [09:35] about the world in itself. So it's like it's modeling human behavior. Sometimes it's modeling. And when you talk to like pre-chain models, which are very, very large, they're actually extremely

9:47-11:20

[09:47] diverse and extremely creative because you can [09:51] talk to almost any Reddit user through Pugin model. [09:55] But I think what's happening right now is like new paradigm of like L1 series is of like the scaling [10:04] in post-chaining itself is not hitting the wall. And that's because [10:10] Basically we went from like raw data sets, from featuring models, [10:16] To [10:17] infinite amount of tasks, [10:19] that you can teach the model in the post-chaining world. [10:22] via reinforcement learning. So any task, for example, like how to search the web, how to use the computer, how to write well, like [10:34] all sorts of tasks that you're like trying to teach the model [10:37] all the different skills. And that's why I'd be saying there's no [10:41] data wall or whatever, because there will be infinite amount of tasks. And that's how the model becomes extremely super diligent. [10:48] and we are actually getting saturated in all benchmarks. So I think the bottleneck is actually in evaluations, that we don't have [10:58] all the frontier, like EVAs, like... [11:02] I don't know, GPGA, which is like a Google proof question answering like PhD level. [11:09] Intelligent Patchwork is getting to more than 60-70%, which is what GHD gets. So it's literally hitting the wall in evals.

11:20-13:06

[11:20] I want to follow both those threads. So the first is on this idea of synthetic data. It's a simple way to understand it that the models are generating the data that [11:28] future models are trained on, and you ask it to generate all these [11:32] ways of doing stuff, all these tasks, as you described, and then [11:36] the newer models trained on this data that the previous model generated? Some tasks are synthetically curated. So this is an active research area. It's like, how can you synthetically construct new tasks with the model to learn? Sometimes when you develop products, you get a lot of data from the product and user feedback, and you can use that data too in this post-training [12:06] human data because actually some of the tasks can be really, really hard to teach. Like, experts only know certain knowledge about some chemicals, like biological knowledge, so you actually need to tap into the expert knowledge a lot. [12:27] Yeah, I think [12:28] To me, synthetic data training is more [12:33] for like, [12:34] product, it's like a rapid model iteration for similar product outcomes. And we can dive [12:40] more into, but the way we made Canvas and tasks, and new like part of features for JTBK was mostly done by synthetic training. [12:49] Let's actually get into that. That's really interesting. I want to talk about evals, but let's follow that thread. So talk about how this helped you create Canvas. So when I first came to OpenAI, I really had this idea of like, okay, like it would be really cool for Chachapiti to create.

13:06-14:40

[13:06] actually change the visual interface, but also change the way it is with people. So, [13:14] going from being a chatbot to more of a collaborative agent and a collaborator, it is a step towards more giant existence. [13:25] that become innovators ultimately. And so the entire team of like applied engineers, designers, products, like research kind of got [13:35] formed in the air, almost out of nothing. It's just a collection of people who just got together and we rapidly started iterating with each other. Actually, Kevus is one of the [13:49] I would say the first project of OpenAI where researchers and applied engineers started working together from the very beginning of the... [13:57] product development cycle. [14:00] And I think there's a lot of things that we have learned on the way, but I definitely came with the mindset of we need to do a really rapid model iteration such that it would be much easier for students [14:15] engineers to work with the latest model possible, but also learn from user feedback or early internal dog food, how do we improve the model very rapidly. [14:29] It's really hard to... [14:32] kind of figure out how people, when you deploy a product, how people would be able to use it.

14:40-16:20

[14:40] the way you synthetically train the model is basically figuring out, like, what are the most core behaviors that you want this product feature? [14:49] to do. And for Canvas, for example, it was, it came down to like three main behaviors. It was, how do you trigger Canvas for prompts like, write me a long essay, when the user intention is mostly like iterating over long documents, or write me a piece of code. [15:09] Or when to not trigger Canvas for prompts like, can you tell me more about [15:15] president, like [15:17] I don't know, some of the general questions. So you don't want to let trigger Canvas, because the user intention is mostly getting answered, not necessarily iterating over the long document. [15:28] The second behavior is [15:32] how do we teach the model to update the document when the user asks? So one of the behaviors that we taught the model is actually have some agency and autonomy to literally go to the document and select specific sections and either delete it or edit. So highlight it and rewrite certain sections. So sometimes the model, sometimes the user would just like say, change the second paragraph to be [16:02] And you would have to teach the model to literally find the second paragraph in the document and change it to a friendly tone. So basically you teach both how to trigger edit itself, but also how do you teach the model to get higher quality edits for the document.

16:21-18:00

[16:21] In case of coding, for example, there's also the question of how good the model is at completely rewriting the document versus having very specific targeted edits. So that's another layer of decision boundary within edit itself. It's like select the entire document and rewrite completely, or you want to have very targeted custom keywords. [16:51] because you thought the quality of the were rights were much higher. But over time, you're shifting based on user feedback and what you're learning from iterative deployment. [17:01] Lastly, the third behavior that we taught synthetically, the model is how to make comments on any document. So, [17:10] The way we use that is like... [17:13] we would use a one model to produce, to simulate user conversation. Let's say, write me a document about XYZ. But then we used a one to produce the document. And then we kind of injected user prompts to be like, "Oh, make some comments, critique my piece of writing," or "critique this piece of writing that you just made." [17:39] And then we taught the model to make comments on the document, on very specific documents. Also, what kind of comments do you want the model to make? Do they make sense or not? How do you teach the quality of that? And it all came down to measuring progress via very robust evals.

18:01-19:31

[18:01] But yeah, this is how you use a long synthetic data generation for the screening. Okay, that's so interesting. [18:08] So you talk about this idea of teaching the model and you mentioned how it's [18:12] using synthetic data to teach the model different behaviors is a simple way to think about it. Basically, that's where you do that by showing it what success looks like using basically evals. Is that the simple way to think about it? Like, [18:25] Here's what... [18:26] you doing this successfully would look like and that teaches it okay i see this is what i should yeah great yeah amazing yeah you got it okay got it um i want to start unpacking what your day-to-day looks like as you're building these sort of things is it like you sitting there [18:38] talking to some version of ChatGPT, crafting these evals? - Sometimes I do that, but sometimes I do sit with ChatGPT. Actually, I think I learned this so much from Antapak, is like, people spend so much time just prompting models, and we call it a little bit bash all the time, and you actually get a lot of new ideas. How do you make the model? [19:05] better. It's like, oh, this response is kind of weird. Why is it doing this? And you start debugging or something, or you start figuring out new methods, or how do you teach the model to respond in a different way, have better personality, let's say. So it's the same thing of how... [19:22] personality is made like in the models windows. It's like very similar methods. But yes, I think my time

19:31-21:11

[19:31] I have changed. I think when they first came, I was like mostly like research I see work. So I was like building a lot of like [19:39] I was writing code, chaining models, writing evals, working with PMs and designers to learn, teach them, how to even think about invitations. I think that was a really cool experience. And I think this is an adoption of-- [19:57] how do we do this prior management of AI features or AI models? [20:04] Thank you. [20:05] Yeah, but now it's mostly like... [20:07] you know, like management and like mentorship. I'm still like doing, I see like research code [20:15] after like 4:00 PM although, but yeah, it's kind of like changed. [20:21] All right, don't talk too much about being a manager, because everyone's firing their managers. Who needs managers anymore? That's what I hear now. Just kidding. It's interesting that so much of your time... [20:30] was spent on teaching product teams how evals integrate and how important... [20:35] That is, and I've heard this a few times, and I haven't personally experienced it yet, so I think it's an important thread to follow is just how writing these evaluations is going to become important. [20:45] increasingly an important part of the job of product teams, especially when they're building AI features and working with alums. So can you just talk a bit more about what that looks like? Is it like sitting there with an Excel spreadsheet, basically showing like, here's the input, here's the output, here's how good the result was? Talk about what that actually looks like very practically. It certainly depends on what you're developing, but there are various types of like evaluations. So sometimes I do ask, well,

21:11-22:41

[21:11] product managers or there's also like new role that we have like model designers to [21:19] Kind of like go through some of the user feedback maybe, or like think of like various user conversations that should have triggered, like under this self-amstances, it should trigger Canvas. And then you have this like ground truth label of like, okay, with this conversation, it should trigger Canvas. Under this conversation, it should not trigger Canvas. And you have this like very deterministic kind of like evolve. [21:42] that for like decision-minded behaviors is like this, when we were launching tasks, for example, like how do you make correct schedules is like actually really hard for the model. But we built out like some of the deterministic evaluations that is like, okay, like if the user says like 7:00 p.m., it's like, [22:04] the model should say 7pm. So you can have a defamilistic evolve, whether it's pass or fail, [22:10] So, yeah, and the way it works is, I was like... [22:13] Sometimes I ask, [22:15] Product managers just like go create like a Google sheet, like have different tabs and like, um, [22:21] What's the current behavior? What's the ideal behavior? And why? Or some nodes. And sometimes we usually use it for eval, sometimes we use it for training. Because if you give the spreadsheet to one model, it can probably figure out how to teach itself a good behavior.

22:42-24:28

[22:42] And I think there are second type of evals that is-- [22:46] kind of more prevalent is like human evaluations. And you can have specific trainers or you can have internal people to [22:57] when you have a conversation of the prompt, and then you have various completion of models, you kind of choose the win rate. Which model is the best? Which model produced the highest quality comment or edit? And then you can have continuous win rates. And as you develop new models, it should always win over the [23:17] previous models. So it depends on what you want to measure. So interesting. Like basically what I'm hearing in [23:24] This is something I'm learning about as I talk to people, is product development might move from this, like, here's a spec PRD, let's build it together. [23:34] And then, cool, let's review it. Are we happy with this? From that to, hey, AI, build this thing for me. [23:40] And here's what correct looks like. And I'm spending all my time on what does correct look like. [23:45] and evals, essentially. You definitely want to measure progress of your model. And this is where evals is, because you can have prompted model as a baseline already. And the most robust evals is the one where prompted baselines [24:03] get the lowest score or something. And then because then you know, like, if you're trained a good model, then it should like, just like hill climb on that eval all the time, while not like also like regressing on like other intelligence evals. So it's like, I think it's more what, that's what I'm saying, like it's more of an art than science, it's like, okay, like if you optimize the model for this behavior, like you kind of don't want to like brain damage in like other areas of intelligence or,

24:28-26:05

[24:28] And this is happening, like, all the time in every lab and every, like, research team. I would say, like, prompting is, like, also a way to, like, prototype, like, new... [24:39] part of good years. Like early days that Andara Glenay was working on like file uploads feature. I remember I was just like, [24:48] you know, prompting the model to just like, [24:52] And even when we were launching 100 key contexts, I was just prototyping this in the local browser. I did the demo, and people really, really loved it. And they just wanted API for file uploads or something. And then that's when it clicked to me. [25:09] I also like wrote a blog post on February. It clicked on me like, prompting is a new way of like, [25:14] product development or prototyping for designers, [25:18] and for like prime measures. For example, one of the features that I wanted to do is like have a personalized, recommended, personalized starter prompts. So whenever you come to a cloud, like it should like. [25:32] recommend you like [25:34] starter prompts based on what your interests are. And so like you can [25:38] Literally do it like [25:40] prompting for that. And another feature was generating titles for the conversations. It's a very small, like, micro-experience, but I'm really proud of the way we did that was we took, like, five latest conversations from the user, like, ask the model, like, what's the style of the user? And then, like, for the next kind of new conversation, the generated title will be

26:05-27:35

[26:05] of the same style. It's just like really little like [26:09] micro experiences like those. That's so cool. Did you do that at Anthropic or at OpenAI? At Anthropic. Okay, cool. I love the file upload feature that Claude has, by the way. Oh, ChatGPT doesn't have that yet. Is that right? [26:21] - I think it has. I think the way it's implemented is very different though. - Okay, maybe it's the PDF feature, 'cause I use it all the time with Collette, okay. - That's cool. - Someone needs to get on that. Man, it's wild how many features you built that I use every day and that many people use every day. [26:35] This prototyping point you made is really important. It's something that comes up a ton on this podcast also of how that [26:40] is maybe the way that AI has most impacted the job of product builders recently is just prototyping. [26:45] instead of going from showing just like here's a PRD, here's a design [26:49] PMs more and more just here's the part of the idea that I have and it's working you can play with it [26:54] Yeah. Yeah. Okay. [26:57] I want to spend a little more time on how you operate. So you talked about, you built this in Launchless Tasks feature. Is that the way you describe your tasks? [27:04] Yeah. So talk about how that emerged and let's better understand just how you collaborate with product teams and how OpenAI works in that way, whatever you can share there. [27:13] I think Canvas and Tasks are going into the bucket of projects where it's like more like short or medium terms. [27:24] Actually, the way Canvas and Tasks [27:27] came up [27:28] about to be was like it started was like one person prototyping [27:33] And, [27:34] creating like

27:35-29:19

[27:35] Thank you. [27:36] Aspac [27:37] It's kind of like PRG. It's like creating a stack of like the behavior of the model. I don't think like... [27:44] tasks is like extremely like [27:47] groundbreaking feature, necessarily. What makes it really cool is [27:53] because the models are so general, model can now search. They can write sci-fi stories. They can search for stocks. They can summarize the news every day. [28:04] because the models are so general, like giving something familiar to people that like, you know, notifications is like very familiar, like having reminders is like very familiar. So like creating like a form factor for the people who like very familiar, same as like Canvas, right? It's like Google Docs is very familiar. But then you add like magical AI moment and it becomes like very powerful. [28:26] But the way it comes-- usually, operationally, [28:29] Yeah, so this is like a prototype, like literally prompted prototype of like how you would want like the model to behave. For like tasks, for example, like you kind of like need to design a little bit like design, design systems, design thinking is like, OK, like, well, if the user says like, remind me to go to lunch like at 8 a.m. tomorrow. OK, what kind of information does the model needs to extract from that prompt in order to create a reminder? [28:59] like design like a stack for a new feature, like a tool. Canvas and tasks are all tools. So it's like, how do you like create the tool stack? And then it's like, mostly like developing JSON schema. I was like, okay, like from this problem, maybe the model should extract like,

29:19-30:57

[29:19] the time that the user requested [29:22] And then you're thinking about which format you want the time to be. And then how do you want the model to [29:29] notify you is like basically [29:33] if the user should give instruction to the model, and then this instruction would fire off every day or something at that particular time. So for example, if you say, [29:44] search, like every day I want to learn now about the latest AI news. [29:51] the model should rewrite into like, [29:53] "Okay, search for the latest AI news, and this task will get fired at that particular time that the user requested." And then, you know, you design this tool spec, and then... Actually, I don't know, I feel like sometimes, like... It's like through conversations, I... [30:12] I know people ask me to join the team and they're like, "Oh my god, we need to be researchers." or like, "We need some support." or "We need to train the models." Or sometimes, with Canvas, I just pitch the idea of like, [30:27] it got staffed quite immediately during the break. So I know it's like depending on the project. And usually with staffing it's like mostly like a product manager, model designer, [30:40] actual product designer, [30:42] a couple of researchers don't advise applied engineers, depends on the complexity of the project. [30:48] And then like, [30:49] For tasks, it took like two months or so to go from zero to one basically. - Oh wow.

30:58-32:31

[30:58] for canvases was like, [31:00] four or five months, I guess. [31:03] to go from zero to one. [31:05] But yeah, and then you teach product managers how to build evils, and maybe [31:11] you know, how do we not only like ship [31:15] the better feature, but how do we think, like, logo term, like, what kind of cool features do you want tasks to have? Like, I think it would be nice for tasks to be, like, a little bit more personalized. It'd be nice to have, like, [31:28] to create tasks via voice and on a mobile right like so you kind of need to like this is how you get like research roadmap right here is like thinking like how the feature will be developed in the future. [31:39] And then from there, it's like, [31:41] Thank you. [31:41] you start creating datasets. Like with EWAS, you wanna make sure that goes well. And then like, [31:50] you need to have a trade-off between what methods you want to use. And the reason why I really love relying purely on synthetic data instead of collecting data from humans is because it's much more scalable. It's cheap, less than how you literally sample from the model, and you teach the core behaviors of the models, and that will generalize to all sorts of things. [32:13] diverse coverage. And when you launch the better feature, you learn so much from the users that you can like, [32:20] all your synthetic sets. [32:22] can be shifted in the distribution of how the users behave and find the product behavior, and this is how you improve. And this is what happens in Canvas too.

32:31-34:01

[32:31] when we launch from beta to GA. [32:34] This episode is brought to you by Loom. Loom lets you record your screen, your camera, and your voice to share video messages easily. Record a Loom and send it out with just a link to gather feedback, add context, or share an update. So now you can delete that novel-length email that you were writing. Instead, you can record your screen and share your message faster. Loom can help you have fewer meetings and make the meetings that you do have much more productive. [33:04] and end early. Problem solved, time saved. We know that everyone isn't a one-take wonder when it comes to recording videos, so Loom comes with easy editing and AI features to help you record once and get back to the work that counts. [33:18] Save time, align your team, stay connected, and get more done with Loom. Now part of Atlassian, the makers of Jira. Try Loom for free today at loom.com slash lenny. That's L-O-O-M dot com slash lenny. [33:34] Something that I want to help people understand, and I don't even 100% understand this, is what's the simplest way to understand the job of a researcher versus, say, a model designer and other folks involved? Like, what's the simplest way to understand what researchers... [33:46] do at OpenAI? So the project that I described, I'm mostly like product-oriented like research, it's mostly like product research. [33:53] Another part component of my team is actually more like longer term exploratory projects. And it's more about like,

34:01-35:34

[34:01] developing new methods, [34:03] understanding those methods and a variety of circumstances. So, [34:08] Like, basically, developing methods [34:11] you kind of need to follow a very similar kind of recipe of building evals, but it's more sophisticated evals. You kind of want to have auto distribution, or if you want to measure generalization, you kind of need to capture that. But it's basically more science-y in a way where [34:30] You know, if we talk about syntax data, like one of the hardest things about syntax data is like, how do you make it like more diverse? [34:37] Diversity in Sydney is one of the most important questions right now. [34:42] exploring ways to inject diversity as a general method that will work for all is one of the research explorations. Other ones is more developing new capabilities. I feel like it's all about, you know, you work on this new method and you have signs of life that it's working. [35:03] Either you think of how do you make it more general, or you think of how do you make it very useful. And this is how longer-term projects become more medium- and short-term projects. That makes sense. Essentially, working on developing ways to make the models smarter, '04 or '05 or '06. Like '01 was a big breakthrough, right? The way it operates, where it's not just here's your answer, it actually thinks and has [35:29] It takes time to think through the process of coming up with an answer. Okay. Yeah.

35:34-37:09

[35:34] Very helpful. [35:35] Speaking of that, of thinking about the future, where things are going, [35:38] I want to spend some time on... [35:40] just this insight that basically you are building the cutting edge of AI, like at the very bleeding edge of where [35:47] AI is going and where it is. And so I'm very curious to hear just your take on [35:53] how you think things are going to change [35:55] in the world. [35:56] and how people work based on where you see things are going. And I know it's a broad question, but let's say like in the next three years, [36:03] How do you see the world changing? How do you see people's way of working changing? [36:08] It's a very humbling experience to be in both labs, I guess. To me, when I first came to Endorbring, I was like, oh, no, I really love frontend engineering. And then the reason why I switched to research is because I realized at that time it's like, oh, my God, like Cloud is getting better at frontends. Cloud is getting better at coding. I think Cloud can develop new apps or something. And so it can develop new features for the things that I'm working with. [36:35] It was kind of like this meta-realization where it's like, oh my god, like... [36:39] the world is actually changing. And they're like, when we first launched 100K Context at that time, [36:46] obviously, you know, I'm thinking about like form factors that's like, yeah, like file uploads were like very natural, very familiar to people, but you could imagine we could just like make like infinite chats in the cloud.ai app, right? Like as if like it's like in a 100k context. But because like file uploads, it's like foreign followers function is like

37:10-38:40

[37:10] the form factor of the file uploads kind of enable people to just like, [37:13] literally upload anything, the books, any reports, financial, and ask any task to the model, [37:22] And then I remember it was like, you know, enterprise customers, like financial customers are like really interested in that. I was like, oh wow, like actually they, [37:33] It's actually one of the... [37:35] very common tasks that people do in that setting. It was like kind of crazy to like see how some of the redundant tasks are getting like automated basically by these like smart models. And they're entering the era where [37:51] I actually don't know, for example, sometimes if a one [37:55] gives me the correct answer or not, because I'm not an expert in that field. And it's like, I don't even know how to verify the outputs of the models, because all my experts know that they can verify this. [38:11] Yes. So basically there are [38:13] Trends that are going on. The first trend is the cost of reasoning and intelligence is drastically going down. I had a blog post about this. [38:24] Maybe I should update them like... [38:26] Latest benchmarks because at that time like MMO everybody was like doing like um, and [38:31] So one benchmark and then we quickly saturated the benchmark. So now we need to do the same flop but with another frontier eval.

38:40-40:13

[38:40] but the cost of intelligence is going down because it becomes much cheaper. Small models are becoming... [38:48] It was smarter than large models, and that's because of the distillation research. This happened with a Clot3 Haiku. [38:58] I was like working on like post-changing all the Cloudy Haiku, and I realized it was much smarter than like Cloud Tool, which was like way, you know, bigger, or something like that. But like the power of like small models become very intelligent and fast and cheap. We are moving towards that world that has like multiple implications, but the news that like... [39:22] people will have more access to AI, and that's really good. Like builders and developers will have much better access to [39:30] AI, but also it means all the work that has been bottlenecked by the intelligence will be unblocked. [39:39] Anyone like I'm thinking about a health care, right? Like if I have instead of going to a doctor, I can like ask Chai GPT or give Chai GPT a list of symptoms and ask me like, oh, which like would I have like a cold flu or like something else? Like I can literally get the access to like a doctor almost. And there's like been some research studies around that. Yeah, there's a New York Times story about that where they compare doctors. [40:09] to doctors using ChatGPT to just ChatGPT and--

40:13-41:45

[40:13] just ChatGPT was the best of them all. Like doctors made it worse. [40:19] Yeah, that's crazy, right? Education, I think... [40:24] I would have friends if I had the tool like Chachi Petey when I was young and I would learn so much. [40:31] People can now learn almost anything from these models. So they can learn new language. They can learn how to build new... [40:39] look up, like, I don't know, anything that you want and like, [40:43] And so... [40:45] It's humbling to have launch Canvas and bring that thing to the people, enable them to do something else that they couldn't have ever before. I think there's something magical around this experience. Education will have massive implications. I guess scientific research, right? I think it's the theme of any AI research is like augmented AI research. It's kind of scary, I'd say. [41:09] Which makes me think that people management will stay. It's one of the hardest things. It's emotional intelligence for the models, creativity in itself is one of the hardest things. [41:24] Writers, I don't think people should be worried as much. I think it's like, I think it alleviates a lot of redundant tasks. [41:32] for people. [41:34] This is awesome. Okay, I want to follow this thread for sure. And it's funny that what you described is like, you were an engineer at Anthropic, and you're like, [41:41] okay claude is going to be very good at engineering this isn't going to be a

41:45-43:15

[41:45] potentially career long term, so I'm going to move into research. [41:48] and AI's gonna need me for a long time to build it, to make it smarter. - I would say we still have, like, I think Canvas team has, still have, like, really cool, like, front-end engineers that are really, like, you know, people who, like, really care about, like, interaction design, like, interacting experience, like, I don't think, like, models are there yet, like, I think, if, but we can get the models to, like, this top 1% of, like, front-end or something, for sure. [42:15] So I want to move on to next along these lines is just, and this is just speculation, but what skills do you think will be most important? [42:23] valuable going forward [42:25] for product teams in particular. So folks are listening and they're like, okay, this is [42:30] Scary. [42:31] What should I be building now to help me [42:34] stay ahead and not be in trouble down the road. [42:38] What skills do you think are going to be more and more important to build? [42:42] Yeah, I think like creative [42:44] thinking, like you kind of want to like come up, like generate a bunch of ideas and like filter through them and not just like build the best product experience. Listening, [42:56] You know, you want to build something that, like, the most general model will not replace you. And oftentimes you build something and you make it really, really good for, like, specific set of users. [43:11] And [43:12] Actually, the moat is now in like...

43:15-44:46

[43:15] your user feedback, the mode is like more in like, [43:19] Whether you listen to them, whether you can rapidly read, [43:24] the mode is like, [43:25] in here, I don't think we are yet to [43:29] There are so many ideas. I think there's an abundance of like ideas that you can look like. I wouldn't be worried. I feel like in fact I do think like people [43:38] NAI fields are like [43:40] I wish they were a little bit more creative, and connecting the dots across different fields or something like that, to develop really cool new, like, [43:49] generation and new paradigms of interactions with this AI. Like, I don't think we've cracked this problem at all. A couple of years ago, I was like telling some people, I was like, you know, you kind of want to like, [44:01] build for the future. So it's like, [44:04] It doesn't necessarily... [44:07] matter whether the model is good or not good right now, but you can build product ideas [44:14] such that by the time the models will be really good, it will work really well. And I think it just happened naturally. For example, at Antarctic, the Claude artifacts, and I feel like early ideas of Canvas was back in 2022, before Chachapiti, writing IDE was kind of all Chachapiti. But I feel like Claude 1.3 model itself was not there to make really extreme, good, high-quality edits, for [44:44] coding. Um,

44:46-46:26

[44:46] And I feel like I see startups like Carcer and it's doing super well. And that's because they iterate so fast, they invent new ways of training models, they move really fast, they listen to users, massive distribution. Yeah, it's kind of cool. That's really helpful, actually. [45:10] that soft skills essentially are going to be more and more important and powerful. You talked about management, leading people, being creative and coming up with innovative insights, listening. [45:20] There's a post I wrote that I'll link to where I try to analyze how AI will impact product management. And we're actually very aligned. [45:29] And my sense was the same thing, that soft skills are going to become more and more important. [45:33] and the things that are going to be replaced as the hard skills, which is interesting because usually people value the hard skills like [45:39] Coding, design. [45:40] writing really well. [45:42] And it's interesting that AI is actually really good at that because it's [45:46] taking a bunch of data, synthesizing it, and... [45:49] writing, creating a thing versus all these fuzzy things around of what influences, convinces people to do things and aligning and, [45:56] listening, like you said, creativity. Anything along those lines come up as I say that? I think it's actually really, really hard to teach the model how to be [46:05] aesthetic or do really good visual design or how to be extremely creative in the way they write. I still think Chagipi kind of sucks at writing. And that's because it's like, it's like bottom-mouthed by this creative reasoning. I think prioritization is one of the most important...

46:26-48:01

[46:26] for a manager, I feel like, actually, like, AI research progress is bottlenecked by, like, [46:31] management, like research management, is because you have like [46:36] constraints out of compute and you need to like allocate the compute to the research path that you feel the most [46:45] convinced about it. It was like you need to have a really high conviction in the research path to put the compute and it's more like return on investment kind of situation. It's like okay, I'm thinking a lot about how to across all my projects, which projects are higher priorities. Prioritization and also on the lower level, which experiments are really important to run right now [47:15] like management, people skills like empathy, like, [47:20] understanding people, like, kind of like collaboration. Like, I think, like, Canvas wouldn't be... [47:25] like an amazing launch if it wasn't like about like people and i think it's a wonderful group of people and like [47:34] I get a chance to work with people like Lee Byron, who's a co-creator of GraphQL, and some of the best Apple designers. And it's so cool to see... [47:44] And how do you create this collaboration between people? It's just something that's still humane, I think. [47:50] Let me just follow us around a little bit because I imagine people listening are like, okay, but once we have AGI or SGI, it's like, it'll do all this. There's a world where like, why isn't all this done?

48:01-49:44

[48:01] I think it's easy to just assume all that. I'm curious, this idea of creativity and listening... [48:07] Why you think AI isn't good at it. [48:10] other than it's just very hard to [48:13] train it [48:14] to do this well. Is there anything there just like why this is especially difficult for AI [48:18] Nell lamps to get good at. [48:20] I think currently it's difficult [48:24] For... [48:25] Many reasons, I think it's still, like, an active, like, research area, and it's something that, like, I think my team is, like, working on. It's, like, okay, like, how do we teach the model to be, like, more creative in, like, the writing? And it's really, like... [48:37] I'm thinking, like, this new paradigm of life, the morals... [48:41] think more should actually lead to better writing in itself. But when it comes down to idea generation or [48:51] Um, [48:52] discriminating of like what is the good like visual design and odd i feel like if hasn't had learned [48:59] Like... [48:59] examples from like people to discriminate very well. I do think it's because like, [49:06] You know, there are not that many people who are, like, actually really, like... [49:11] but it's not like accessible to like models so learn from these people I guess. [49:17] So definitely that's why it's awesome. Yeah, that makes sense. Basically, there's not enough of you yet. Researchers teaching it to do these things slash people that have incredible taste and creativity that can teach these things. You could argue this will come, but I'm not, we don't need to keep going down that thread. Let me ask you a specific question. In this post I wrote, I made this argument that a lot of people disagreed with that strategy is something that AI tooling will become increasingly

49:44-51:15

[49:44] great at and take over. There's the sense that [49:47] That's the thing that people will continue to be much better at. And you can't offload AI, basically developing your strategy, telling you what to do to win. [49:56] My case is... [49:58] Isn't strategy to just take all the inputs, all the data you have available, [50:01] understand the world around you and come up with a plan to win. [50:04] It feels like AI would be, like an LLM would be incredibly smart at this. What's your take? I think so, too. I think, like, again, like, you teach the model all sorts of, like, tools and, like, capabilities and, like, reasoning, right? And it's, like, when it comes down to, like, as for Canvas right now, it would be very cool to, like, for the models just, like, aggregate all the feedback from users. Like, summarize me, like, the top five, like, most painful flows on user experiences. [50:34] of like, [50:35] like thinking of like knowing how it's [50:38] being made, figure out how to create a data set for itself to train on it. And I don't think we are far away from that kind of self-improvement, models becoming self-improved, via, like, [50:54] Then the product development is basically self-improving. It's kind of like its own organism or something. [51:02] Yeah, like, again, like, strategy is, like, it's more, like, data analysis and, like, coming up with, like, [51:12] - I think what models are really good at is like,

51:17-52:48

[51:17] like connecting the dots, I think. It's like, okay, if you have user feedback from this source, but you also have an internal dashboard with metrics, and then you have... [51:29] you know, like other kind of lucky buck, [51:33] Or like... [51:34] Input. [51:35] and then it can co-create a plan for you, recommendations even. And I think this is one of the most common use cases for CheshPT tools, [51:44] coming up with like this sort of things. That makes sense. Like essentially a human can only comprehend so much information at once and look at so much data at once to synthesize takeaways and, [51:54] As you said, these context windows are huge now. Here's all the information. What's the most important thing I should do? Yeah, same as scientific research. Ideally, the model would be able to suggest ideas, new ideas, iterate on the experiment, given the empirical results of the previous experiments, how do you-- [52:14] like, [52:15] come up with like new ideas or like methods yeah oh man [52:20] Okay, so just to close the loop on this conversation, this part of the thread is the skills you're suggesting people... [52:26] focus on building and leaning into soft skills, [52:30] like creativity, [52:32] managing influence, collaboration, [52:35] looking for patterns, [52:37] Is that generally where your mind is at? [52:39] - Yeah, I'm thinking a lot about how do we make organizations more effectively. And I think this is mostly management, I guess. It's like, how do you organize

52:48-54:22

[52:48] research teams or like generally teams like combine [52:52] a closed team such that they will be at their maximally [52:56] succeed at the maximal performance of what can possibly [53:01] We can literally create... [53:03] the next generation of computers. It's just like the matter of conviction and like [53:08] the way you manage to do that, it's like scaling, [53:12] organizations or like scaling product research is. Yeah, I think what like you're basically building [53:19] this thing and [53:20] Not efficiently doing it is like limiting the potential of the human species right now. It's mismanagement within the research team and open ion anthropic and some of these other models. [53:31] Yeah, it's kind of crazy to think about. Holy moly. Okay, so speaking of Anthropic and OpenAI, you've worked at both. Very few people have worked at both companies and have seen how they operate. [53:41] I'm curious just what you've noticed about the differences between these two, how they operate, how they think, how they approach stuff. What can you share along those lines? [53:47] It's more similar than different. Obviously, there is a lot of like... [53:53] There are some differences also comes to like nuances to culture. [53:59] I really love Anthropik, and I have a lot of friends there. And I also love OpenAI, and I still have a lot of friends. So it's not about enemies. I feel like there's, in the eyes, it's all, yeah, the competitors, there's enemies. But it's actually one big community of people doing the same thing. [54:18] I would say what I've learned from Antaprotech is this, like,

54:22-55:55

[54:22] real care and craft towards like model [54:28] behavior, model cost, model training, and [54:32] I've been thinking a lot about what makes Cloud Cloud and what makes Chachapiti Chachapiti, and it comes down to operational processes that [54:42] kind of leads to the outputs [54:44] to the model, the outputted model. And it's like, the reason why Collide has so much more personality and like, [54:52] is more like a librarian. I don't know. I don't know. I'm like visualizing [54:58] a club being like a librarian, like a... very like nerdy or something. It's because I feel like it's a reflection of [55:07] the creators who are making this model and a lot of [55:12] details around the character and the personality and whether the model should follow up on this question or not. Was the correct ethical behavior for the model in this scenario? A lot of crafts. [55:26] and to read it like this. And this is where I learned that part of like, [55:32] art, I guess, [55:34] at Antarctic. I'd say that Antarctic is much smaller. When I joined, it was 70 people. When I left, it was 70 people. Obviously, the culture changed so much. I really enjoyed being early days startup lives and people [55:50] knew each other as a family, but the culture shifted. I would say I learned from Andarvik that

55:56-57:34

[55:56] They're much better at, like, focusing and, like, prioritization. Or, like, very, very hard... Like, very hardcore prioritization, I guess. And they need to do it. Like... But I think, like, OpenAI is, like, much more... [56:07] innovative and much more risk takers in terms of products or research. [56:15] You know, I [56:16] I don't know, your full-time job can be just teaching the model how to be creative writers. And there's some luxury in this research freedom, [56:26] the [56:27] that comes to scale, maybe? I don't know. [56:30] but it gives you [56:32] I feel like I have much more creative product freedom to do almost anything, I guess, within OpenAI. I've lost Chachypt into the version. It's more like, yeah, probably bottoms up, I guess. Yeah, that's how I was thinking about it. It feels like OpenAI is more bottoms up. [56:51] distributed people bubble up ideas, try stuff. There's more [56:55] And that leads to more products launching, I imagine, more things just kind of being tried versus more of a, let's just make sure everything we do is awesome and great and craft and... [57:05] thinking deeply about every investment. [57:08] That's really interesting. I've never heard it described this way. [57:11] Karina, we've covered so much ground. This is going to help a lot of people with so many [57:15] ways of thinking about where the future's going. [57:18] Before we get to our very exciting lighting round, I'm curious if there's anything else that you think might be helpful to share or get into. [57:23] one of my regrets i guess when i was early days at on top it was that like i think there was like some luxury of the time this pre-chai tp to actually like

57:35-59:09

[57:35] come in with a bunch of ideas and prototypes almost every day. [57:41] And I think we did a lot of cool ideas. Like Cloud and Slack was actually one of the first tool using products. It's like Cloud could operate in your workplace now. It's kind of cool when you add Cloud to summarize the thread. So maybe you have an entire conversation with someone and then you want to a summary of what happened. You can say, add Cloud, summarize this. Also, it was really fun to even iterate on the model itself. [58:11] just like talk to the model in like Slack forever. [58:14] It created some social element that's kind of cool. It's kind of like me joining this Discord. People learned so much about prompting and how to work with Cloud. Actually, one of the features that was early tasks part of time was every Monday, Cloud would just summarize the entire channel. [58:36] Or every Friday we just summarize a bunch of channels and give [58:43] the news about the organization or something. So it's really cool form factor. I think I'm thinking about form factors. [58:53] a really important question in AI. Especially we haven't even figured out how to be [58:59] create an awesome product experience with O-series models. It's the paradigm between synchronous, real time, give an answer,

59:09-1:00:41

[59:09] paradigm into like more asynchronous paradigm of like [59:13] agents working on the background, but then now the question is like, the agents should build trust with you, right? And trust builds over time, which is like with humans. And, um, [59:24] you saw this collaboration, which is why this collaboration model was like, you and the model is so important, because you both trust and the model learns from your preferences so that it can become more personalized. [59:38] and it will start predicting the next action that you want to take on the computer or something. And it's kind of more predictive, much more... we went from personal computer to personal model, basically here. [59:53] Why is it not a thing? That seems like such an obvious feature that every... [59:57] LM should have is a Slack bot version of them. Is that a thing I can have you install or is that not a thing right now? [1:00:02] I know that Cloud and Sly were sunsetted in 2023 or something, but that's because I think... [1:00:09] I think it was after Chai Chippity, it was mostly the focus on consumer use cases or enterprise use cases. I think we didn't want... [1:00:19] I think the form factor of cloud and Slack is like, was kind of constrained a little bit, when you want to develop new features. - No, I want that. [1:00:30] I know that JGP had Slack bar too, so I don't know. Maybe it will come back. All right. I would pay for that. [1:00:36] Any other memories from that time of early days? Because that's a really special place to have been.

1:00:42-1:02:27

[1:00:42] as early days anthropic. Any other memories or stories from that time that might be interesting to share? [1:00:47] I think the very first launch when it felt like... [1:00:51] When Clip Sim Use, again, was like a 100k context launch is when the models could input the entire book and give you a summary of the book or something. [1:01:04] or the entire financial, or like have like multi-files, financial reports, and then like give you an answer to the question. [1:01:13] to a very specific question. I think there was something in there that kind of like, oh my god, this is like a really cool [1:01:19] new capability not like model capability but more like [1:01:24] The key to both is that [1:01:26] came from the product form factor itself, rather than like, [1:01:30] the model capability as much. [1:01:33] I think like other prototypes that we [1:01:38] Thinking about like, yeah, like [1:01:41] There was one part about Claude workspaces, and it's kind of the same idea of Claude and I would have this shared workspace, and that shared workspace is a document, and we can get it written in the document. And I feel like sometimes the ideas, private ideas, lag, and they lag for two years, just in this case. [1:02:00] It's interesting there are these milestones that kind of... [1:02:03] open up our view of what is happening and where things are going. ChatGPT, I think, was the first of just like, wow, this is [1:02:10] much better than I would have thought. You talked about 100k context windows where you could upload a book and ask you questions, have it summarized. I actually use that all the time when I have interview guests and they wrote a book. I sometimes don't have time to read the whole book. So I use it to help me understand what the most interesting parts are. And then I actually dive into the book just to be clear.

1:02:27-1:03:57

[1:02:27] Yeah. [1:02:28] And then, I don't know, maybe like voice was another one where you could talk to, say, ChatGPT. Is there any other moments there that you're like, wow, this is much better than I thought it was going to be? Yeah, I think like the computer use agents like the model operating the desktop. And you can essentially think of like... [1:02:51] you know, new kind of like... [1:02:54] experience where the model can learn the way you browse. And from that preference, it can just browse as just like you. And it's kind of like a simulated persona. And it's actually very similar to the idea of, OK, [1:03:10] Maybe. [1:03:12] Sam Altman doesn't have a lot of time, maybe I want to talk to his simulation and ask, for example, I really appreciate some of the technical mentorship from Jakob, but he doesn't have a lot of time. So I really want to ask him these questions, like how can you respond to simulated environments like this? It would be really cool. That's a great place to plug Lennybot. I have one of those. It's trained on all of my podcasts and newsletters. [1:03:42] And it sits on many models. I don't know which one exactly they use, but it's exactly that. [1:03:48] And it's not even me, it's [1:03:50] All the guests that have been on the podcast on the newsletter that I wrote, and you could just ask it, how do I grow my product? [1:03:55] How do I develop a strategy? And it's actually shockingly good.

1:03:57-1:05:33

[1:03:57] Do you feel like it reflects who you are? Yeah. Like, what are they? The best part of it is you can talk to it. It's built. There's an 11 Labs voice version that's trained on my voice on this podcast, and it's actually very good. [1:04:11] And people like have told me they sit there for hours talking to it. Wow. And somebody, uh, [1:04:17] Told it. Interview me like I am on Lenny's podcast. Ask me questions about my career. And he did a half hour podcast episode with Lenny. Oh, my God. That's so fun. [1:04:27] It's incredible. Future is wild. [1:04:29] Yeah, I think content transformation is like, you know, like I would imagine sometime like, you know, when you generate a sci-fi story, [1:04:39] in Canvas, you can transform this into audio blog, where you have very natural content transformations from one media to another media. [1:04:49] I think one of my [1:04:51] Earl's inspiration is one of the last episodes of Westworld, where [1:05:00] I don't know, but where Dolores comes to her work at the time and she comes to like [1:05:05] There's like... [1:05:06] new workspace and she starts writing a story. And then as she writes a story, a 3D virtual reality starts creating on the fly. So I kind of want to... [1:05:19] hate that. [1:05:22] Kind of cool. [1:05:23] Wow. [1:05:24] Speaking of medium, I'm [1:05:26] I guess I was wondering if I should go in this direction or not, but real quick. Kevin Weil slash Kevin Wheel, I don't know exactly how to pronounce his last name.

1:05:34-1:07:08

[1:05:34] the CPO of OpenAI. Is it a while or a while? [1:05:38] I think Reel. Reel, okay, okay, let's just say that. Reel, I know. He did a panel at the Millennium Friends Summit last year, and he made this really fascinating point that chat is a really interesting interface. [1:05:51] for these tools because [1:05:53] They're just getting smarter and smarter, smarter, And Chad continues to work. [1:05:57] as a paradigm to interact with them. [1:06:00] Similar to a human. You could talk to Albert Einstein, you could talk to someone not very smart, and it's all conversation still. And so it's a really flexible way to interact with increasingly good intelligence. [1:06:11] at some point it'll not be so great. And you were talking about all these ways that [1:06:15] you're adding additional ways to interact. But it's interesting chat proved to be a really powerful layer on top of all the stuff. Yeah, that's really cool. I feel like chat also has a social element, which is very humane. It's like, you know, you sometimes want to get into group chat and having conversations with the AI is kind of like a group chat in itself. It's like messaging. Actually, this idea of how do you build features like this? I see tasks as like, [1:06:42] this like [1:06:43] um general kind of like feature that will scale very nicely as the models would develop like new capabilities as well. It's like like the models will be able to like do better like searches and like you know create new like come up with like more creative like writing on like render you know react apps and like html like apps and like you can have like every day a new puzzle for you like

1:07:13-1:08:45

[1:07:13] Okay. [1:07:13] You mentioned something as we were getting into this extra section that we ended up going down is... [1:07:19] This idea of the agent's using a computer. [1:07:22] I know this is actually something you are going to launch today, the day we're recording it, which will be out by the time this comes out. Call Operator. Can you talk about this very cool feature that people will have access to? [1:07:32] Yeah, so I unfortunately did not work on that, but I'm really, really excited about this launch. It's basically an agent that can... [1:07:45] complete the task in its own virtual computer, in its own virtual environment. You can do any literally tasks like order me, [1:07:54] a book on Amazon and then ideally as a model, [1:07:58] will either follow up with you, which book do you want, or know you so well that you'll start recommending, like, oh, here's the five books that I might recommend to you to buy, and then you... [1:08:10] hit, like, yeah, help me buy, and then the model goes off into its own virtual little browser and, like, complete the task and buy the book on Amazon. And then if you give the model, like, credentials, credit cards, obviously it comes with, like, a lot of trust and, like, safety. [1:08:30] then it will just complete. [1:08:32] the thing for you. It's a virtual assistant. It's interesting how this just sounds like obviously this should happen. Like, why is this not yet a thing? Which is also mind-blowing that [1:08:42] We're just assuming this should exist.

1:08:46-1:10:30

[1:08:46] AI doing things for you on a computer. [1:08:49] that you just asked it to do. It's absurd. It's actually really hard. And I think like, [1:08:57] You're still cracking this, but I feel like... I don't know if you use like Tuple, it's like a pair programming. [1:09:02] No. [1:09:04] But I don't know if you love pair programming, so if you use-- - Oh yeah, Shopify uses this. I remember it came up on a podcast episode. - Oh nice. Yeah, so it's a very cool product, where you can just call anyone, [1:09:15] at any time and then like share a screen and the other person can like have access to the stream or like start like [1:09:21] literally operating your computer. And it's very, like, real-time, like, the allegiance is, like, very... [1:09:29] Yeah. [1:09:30] It was like very high quality. [1:09:33] And it's just like, I kind of want the same. It's like, I want to pair program with my model. And the model should even talk to me, like draw like very specific section in my code, in VS code. [1:09:45] tell me like, I will teach me and you can have like different modes. It's like right here. It was like, [1:09:50] a product right here for you. I don't know. [1:09:55] Some people should build up [1:09:57] Sounds like a startup just got birthed. Yes. From someone listening to this. [1:10:01] You mentioned that it's very hard to do this agent controlling a computer as you and helping out. What makes it so hard for whatever, however much you can explain briefly? Much of it is like, because right now the models are operating on like pixels instead of like language or whatnot. Like pixels is actually really, really hard for the models because like perception or visual perception. I think there's still like a lot of like multimodal research that's going on.

1:10:31-1:12:14

[1:10:31] But I think language scaled so much easier compared to multimodals because of that. [1:10:38] Another thing that... [1:10:40] I just like my team is working on that is like, how do you derive human intent? [1:10:45] very correctly. Sometimes, does the model know enough information to ask a follow-up question or to complete the task? You kind of don't want an agent to go off for 10 minutes and then come back with an answer that you didn't even want. That actually creates much [1:11:04] more worth user experience. [1:11:07] And this comes with teaching the model people skills. [1:11:14] what do people like like [1:11:15] kind of like creating like the mental model of [1:11:19] the user and like care about the user in order to ask certain questions like [1:11:24] actually [1:11:25] That part is hard for the models. That relates to what we talked about earlier, where this kind of the soft skill people skills pieces, not where these models are strong yet. [1:11:35] Okay. [1:11:36] I'm going to skip the lightning round. I want to ask just one question from lightning round. Something fun. [1:11:43] Okay, so when AI replaces your job, Karina... [1:11:46] I'm curious what you're getting. And it gives you a stipend, gives you a monthly stipend. Here's your salary for the month. [1:11:51] What would you want to do? What do you want to spend your time on? What will you be doing in this future world? [1:11:56] I've been thinking about this... [1:11:59] Other times I have a lot of jobs options. I would love to be a writer, I think. I think that would be super cool. Just like write short stories, sci-fi stories.

1:12:14-1:13:45

[1:12:14] No. [1:12:16] I really like art history. So, you know, it's like... [1:12:20] conservationists in the museums, which is like [1:12:24] try to preserve like art paintings but just like painting through a lot of things. I think that'd be really cool to do. [1:12:33] Um, [1:12:34] Thank you. [1:12:35] Yeah. [1:12:36] That sounds beautiful. [1:12:37] I don't know. [1:12:39] What I'm hearing is you need to nerf these models to not get very good at writing. [1:12:43] So that you can continue. Although at that point, you don't need to do it for like you don't need people to buy. You're just doing it for fun. So it doesn't even matter if they're incredibly good at writing. [1:12:51] or art conservation, [1:12:53] Oh, man, what an episode of our conversation. What a wild time we're living in. [1:12:57] Karina, thank you so much for being here. Two final questions. Where can folks find you online if they want to reach out and follow up on anything? And how can listeners be useful to you? You can find me. I'm on Twitter. It's Karina Rien. You can also shoot me at email. [1:13:12] on my website and [1:13:15] My team is hiring, and so I'm looking for research engineers, research scientists, as well as [1:13:23] machine learning engineers, like people who come from like part engineers who want to like learn, like model training. I'm actually hiring for like my team. My team is called Frontier Product Research and the train models, we develop new [1:13:35] methods but for product-oriented outcomes. What a place to work, holy moly. What's the best way for people to apply for these very lucrative roles?

1:13:45-1:14:29

[1:13:45] I think you can shoot me a DM on Twitter or... [1:13:51] I'm yet to create a job description. - Okay, this is the job description. - Or you can apply into post-training team. [1:13:57] Okay, you're going to get a flood of DMs. I hope you're prepared. [1:14:00] Karina, thank you so much for being here. This was incredible. Thank you so much, Lenny. [1:14:04] Bye, everyone. [1:14:16] Also, please consider giving us a rating or leaving a review, as that really helps other listeners find the podcast. [1:14:22] You can find all past episodes or learn more about the show. [1:14:25] at lennyspodcast.com. [1:14:28] See you in the next episode.

Want to learn more?