This was SO hard to write: how do you capture a living and evolving workflow and freeze it in a way that's useful? My goal here is to show you what I do in a way YOU can grab and use for yourself!
Nate

People keep asking me the same question: “What’s your AI stack?”
Not in a “tell me what tools you use” way. In a “show me what actually works for you” way.
Because here’s the thing: everyone’s trying to figure out their own stack. And one way to do that is to see what someone else uses day-to-day—the tools, the handoffs, the failure modes, the workarounds. Even if my stack doesn’t work for you, seeing how I think about the boundaries helps you think about yours.
So I recorded a 10-minute walkthrough of my stack—ChatGPT for analysis, Claude Sonnet 4.5 for writing and Excel, Kimi K2 and Claude for PowerPoint, Perplexity for search, Grok for social conversation, Comet and Atlas for agentic browsing, Claude Code and Codex for terminal work. Phew! I get tired writing it out.
But it doesn’t feel exhausting in the moment. It feels like flow.
And that’s what I wanted to capture with the rest of this piece: how do I tackle all those tiny questions in your head about which tool to use when, in a way that feels seamless and easy?
So I wrote them down—the ones you’d want to ask me—and I answered them all:
- How do you decide where to chunk a deck?
- What exactly do you paste when you switch from ChatGPT to Claude?
- When you hit a context limit, what do you do differently next time?
- How do you separate data from deck creation?
- How do you test the boundaries of a new tool?
- What do you do when a tool gives you something almost right but not quite?
- How do you manage context across multiple tools?
- When do you add a new tool versus stick with what works?
- What’s the difference between Claude Code and Codex in practice?
This FAQ answers those questions with the real Nate opinions. My actual patterns. Even though they change. Even though they might not work for you. I figure showing my work here will help you figure out how to catch the tiger’s tail.
I’m also Nate, so of course I’m giving you prompts to work with as you start to wrestle with your own stack. Six to be specific:
- Tool Evaluation Prompt: Which should I pick?
- Insight Extraction Prompt: Get what you need and generate clean context
- Iterate vs Restart Prompt: When do I dump a context window? (I’d run this in a separate context window)
- Deck Chunking Prompt: I’m continuing to evolve this one!
- Collaborative Writing Prompt: I think this is crucial
- Tool Selection Diagnostic Prompt: For picking the right tool for the right task
My goal is to be really honest about what works for me and why, to give you a picture of how you can figure it out for YOU, and then to walk you through the top questions I’m getting asked.
Also, I have a bonus section for teams: I know team tool dynamics are a bit different, so I lay out how I think about a stack like mine vs a team stack and what the tradeoffs typically look like.
Last but not least, you get a gorgeous bonus 12-slide presentation on prompting best practices, prepared by Kimi K2, the best new PowerPoint model in the world. It’s a good example of the workflow, since I used ChatGPT to do the thinking and analysis for it and then built a prompt there to pass to Kimi.
Have fun reading about my stack!
Subscribers get all these newsletters!
Grab the Prompts to Pick Your Stack
These prompts are long (you know me). They’re structured with multiple stages. They force the model to reason through the problem step by step, verify its own thinking, and give you specific output you can actually use.
Why do they need to be this long? Because the decisions they help you make actually matter. Should you switch tools or stick with what you have? Should you iterate on this output or start over? How do you chunk a presentation to avoid hitting context limits? Which tool should you use for this specific task given your constraints?
These aren’t toy questions. These are the judgment calls that determine whether your stack actually delivers value or just burns time.
Each prompt maps to the workflow patterns I outline here. Tool evaluation when something new launches. Insight extraction for the ChatGPT-to-Claude handoff. Iterate versus restart when you’re staring at mediocre output. Deck chunking before you start building. Collaborative refinement when you’re writing. Tool selection when you’re not sure which to use.
Use them when the decision matters. Not for practice. For real work.
My AI Stack—Pros, Cons, and What’s New
I get asked constantly what’s in my AI stack.
It’s a hard question to answer because by next month, part of this will be different. Models improve. New tools launch. Something I rely on today becomes obsolete or gets beaten by a competitor. That’s just how fast this space moves right now.
But people still want to know. What am I actually using? Why did I choose it? Where does it break?
So here’s what’s in my stack as of November 2025. This is what I reach for every day. Where each tool excels and where each falls apart. What I do when things break in practice.
This isn’t prescriptive. Your stack should be different from mine because your constraints are different. But understanding the thinking behind my choices—that’s what transfers even when the tools change.
Also, just as a bonus, you’ll find a Kimi K2 PowerPoint Deck at the link!
For Thinking and Analysis: ChatGPT-5 Thinking Mode
I use ChatGPT-5 Thinking Mode when I need to work through complex problems. The kind that require multi-step logic and extended reasoning chains. The kind where I need memory across a long conversation because I’m building up context as I go.
The thing I value most is that I don’t run out of context mid-analysis. There’s nothing worse than being deep into a complex problem and hitting a limit that forces you to restart. You lose all the nuance you built up over the course of the conversation. ChatGPT handles large context well enough that I can stay in a single conversation for hours if I need to.
Where it fails is the writing voice. It’s mediocre. If I’m drafting something that represents me or my company, I need to do significant editing or switch tools entirely. It’s also not great for final-form PowerPoint or Excel. You can get it to produce these things, but the quality isn’t where you need it to be for professional work. And it’s slower than other options. That’s the trade-off you make for the depth it provides.
What I do is use ChatGPT for research and analysis, then switch to Claude for writing. This is a common pattern in my workflow. I’ll spend an hour working through a problem in ChatGPT, getting the thinking right, then I’ll extract the key points and move to Claude to actually write it up. The handoff takes maybe five minutes but the quality difference is significant.
Also useful is ChatGPT Auto Mode when I need a quick rough pass on large amounts of context. I think of this as triage, not deep analysis. You’re getting a sense of what’s in a document or dataset, not doing comprehensive work. It’s faster, which is the point, but you sacrifice the depth that Thinking Mode provides.
For Writing: Claude Sonnet 4.5
Anything where voice and tone actually matter, I use Claude Sonnet 4.5.
It’s exceptional at picking up and matching your voice if you give it samples to work from. I’ll often start a writing session by pasting a few paragraphs of something I’ve written before and asking Claude to match that tone. It works remarkably well.
I don’t use this as a “feed it a prompt and walk away” tool. I use it as a thought partner. I’ll write a rough version of something, ask Claude to help me refine a particular section, go back and forth on whether a paragraph is clear or needs more specificity. “Can I make this point more directly?” “Is there a better way to structure this section?” That iterative refinement is where Claude really excels.
Where it fails is on very long documents. The context window runs out if you’re writing something that’s 5,000-plus words. It can also drift if you’re vague about what you want. If you don’t provide clear guidance, Claude will give you something, but it might not be what you actually needed.
What I do is break long documents into sections. I’ll handle the introduction separately from the body, the conclusion separately from both. I provide voice samples up front so I’m not spending tokens on trial and error trying to match my style.
And I’ll say this because it matters. You own every word you publish. Claude is an assistant, not a writer. If you put it out there, you’re accountable for it. Don’t outsource your judgment just because the tool can generate text that sounds good.
For PowerPoint: Kimi K2 or Claude Sonnet 4.5
Looking for an example of Kimi K2? It’s in the file on the prompt page at the top.
This is where it gets complicated.
The best PowerPoint generation tool available right now is Kimi K2. It’s not even close. You get beautiful design out of the box. Simple prompts get you useful presentations immediately. If you can use it, you should.
But there’s a significant problem. The data is hosted in China. You cannot use this for corporate data if you’re in the US or EU and you have data protection requirements. You cannot use it if you have compliance obligations. Most companies have data governance policies that eliminate Kimi K2 entirely, no matter how good it is at generating presentations.
So what I actually use depends on what I’m building. If it’s public information—a presentation about AI trends for a conference, analysis of publicly available research—I’ll use Kimi K2. If it’s corporate data, client work, anything with protection requirements, I use Claude Sonnet 4.5.
Claude produces clean, minimalist, elegant presentations. They’re data governance compliant. The output is professional enough to use in business contexts. The design won’t be as elaborate as what you get from Kimi K2, but it’s good enough. And good enough with data protection is better than beautiful without it.
Where Claude fails is the context window. It hits limits around slide 15 to 20 on complex decks. If you’re building a detailed quarterly business review with lots of data and analysis, you will hit this limit. Claude also doesn’t do elaborate designs. If you need complex animations or very specific branding, you’re going to struggle.
What I do is chunk decks into five to eight slide segments. I never ask for 20 slides at once. I structure it this way. First conversation covers slides one through six, the introduction and context. Second conversation covers slides seven through 12, the core content. Third conversation covers slides 13 through 18, implications and recommendations. Fourth conversation wraps with slides 19 and 20, conclusion and next steps. You’re chunking naturally, never hitting context limits, getting clean output every time.
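If it helps to see that chunking pattern as something more mechanical, here’s a tiny Python sketch of the idea. The section names, slide counts, and the eight-slide ceiling are just placeholders pulled from my own habits, not anything the tools require.

```python
# A rough sketch of how I chunk a deck into separate conversations.
# Section names, slide counts, and the 8-slide ceiling are illustrative, not a spec.

MAX_SLIDES_PER_CHUNK = 8  # my personal ceiling; adjust once you learn the tool's real boundary

outline = [
    ("Introduction and context", 6),
    ("Core content", 6),
    ("Implications and recommendations", 6),
    ("Conclusion and next steps", 2),
]

def chunk_deck(outline, max_slides=MAX_SLIDES_PER_CHUNK):
    """Turn a narrative outline into one slide range per conversation."""
    chunks, start = [], 1
    for section, slide_count in outline:
        remaining = slide_count
        while remaining > 0:  # split a section further if it alone would blow past the ceiling
            take = min(remaining, max_slides)
            chunks.append((section, start, start + take - 1))
            start += take
            remaining -= take
    return chunks

for section, first, last in chunk_deck(outline):
    print(f"Conversation: slides {first}-{last} ({section})")
```

Running that prints the same plan I described: slides 1-6, then 7-12, then 13-18, then 19-20, each in its own conversation.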
Also critical is separating data collection from deck creation. Don’t burn your context window on research. Do your research first, extract the key points, then start the deck conversation with those insights already ready to use.
If I hit the context wall once, I restart with a smaller ask. I never repeat a request that failed. I have very little patience for running into issues more than once. Each failure teaches you where the limit is. You adjust immediately.
This is a great example of how fast things change! A month ago, Claude was the best PowerPoint tool for every case. Now it isn’t. That’s how fast this space moves.
For Excel and Data Analysis: Claude Sonnet 4.5
I use Claude for spreadsheet work. It’s good at understanding what’s actually in your data, not just what you tell it is there. It can edit existing files, which matters more than you might think. A lot of spreadsheet work isn’t creating new ones from scratch. It’s modifying existing ones while keeping formulas and formatting intact.
It produces useful analysis, not just numbers. If you ask it to analyze sales data, it won’t just give you summary statistics. It’ll tell you what patterns it sees, what might explain those patterns, what questions those patterns raise.
Where it fails is on very large workbooks. Those hit context limits. Complex multi-tab scenarios with lots of interconnected formulas also struggle. Claude can handle moderate complexity, but if you’re working with a financial model that has 15 tabs all referencing each other, you’re going to run into issues.
What I do is separate data prep from analysis. I get the data into the right shape first, then I analyze it. For complex workbooks, I work on one tab at a time. If I need to understand how multiple tabs interact, I tackle that as a separate focused question.
Also useful is ChatGPT for simple CSV generation. If you just need a one-sheet table with no complex formulas, ChatGPT does this fine. Don’t overcomplicate it.
For Search and Research: Perplexity and Grok
I default to Perplexity for most searches. I use research mode when I need deep dives. I use Labs when I want discovery and report generation. It’s my go-to because it handles the general case well.
But there’s one specific use case where Perplexity struggles, and for that I use Grok. Finding recent information on social networks. “What are people saying on Reddit about this?” Brand new product launches where you want to know the immediate reaction. AI topics that are trending right now. Grok is very good at this specific thing.
Where Grok fails is anything beyond social conversation. Don’t use it for larger-scale thinking. Don’t use it for outlining. Don’t use it for general web research. It will give you answers, but they won’t be good.
What I do is keep them separate. Perplexity for research, Grok for social validation and trending topics. Don’t try to make Grok do what Perplexity does better, and vice versa.
For Web Browsing with AI: Comet and Atlas
I default to Comet for most web browsing that involves AI. It combines Perplexity search with agentic browsing. The generative UI means it can do things like compose LinkedIn messages for you without you having to interact with LinkedIn’s interface, which I appreciate because I don’t love spending time on LinkedIn. The agent can go off and do tasks while you work on something else. The chat next to the browser helps you understand what’s on the page.
For code work specifically, I use Atlas. It’s a ChatGPT-first browser that brings in your memories and preferences. It’s excellent for understanding GitHub repos. You can use it to drive builds off tools like Lovable. It takes a more controlled, safety-first approach compared to Comet.
The difference to know is that Comet uses Google-first search. Atlas uses ChatGPT-first search, which routes through AI rather than directly through Google’s index. I pick based on whether I want Google’s comprehensiveness or ChatGPT’s reasoning layer on top.
For Command Line and Code: Codex and Claude Code
For strategic thinking, I use Codex. It analyzes before it acts. It’s great at finding and fixing bugs. It thinks first, which is what you want when you’re trying to understand a complex codebase or debug something subtle.
For velocity, I use Claude Code. You get integration with Claude skills and MCP servers. It checks back in on tasks as it works. It has a strong bias for action.
The difference you need to understand is that Claude Code will start executing immediately. Codex will think first. I choose based on whether I want speed or deliberation. Both have their place. Neither is better in absolute terms. It depends on what you’re trying to accomplish.
Why Your Stack Fragments As You Get Better (And When That’s Wrong)
Most people think mastery means consolidation. You start with one tool for everything. Your stack expands as you discover specialized tools. But eventually you’re supposed to find the perfect tool that does it all and consolidate back down. Fewer tools equals more mastery.
That’s backwards for individuals. But it’s exactly right for teams.
Here’s the tension. As I get more sophisticated as an individual user, my stack fragments. I use different tools for increasingly narrow use cases because I understand the boundaries better. I know exactly where each tool excels and where it fails. Atlas for code-focused browsing, Comet for general use. Grok for social media research, Perplexity for everything else. ChatGPT for thinking, Claude for writing.
This fragmentation feels wrong at first. More tools means more complexity. It feels like you’re doing it badly, like you’re missing something. Surely there’s one tool that should handle all of this.
But the fragmentation is actually a sign you’re getting better. You’ve moved past “this tool is good” to “this tool is good at these specific things and fails at these other specific things.” That precision lets you route work to the right place. You’re not fighting tool limitations anymore. You’re designing around them.
The problem is when you manage a team. That fragmentation creates chaos. Your team can’t function if everyone has their own 10-tool stack with their own workarounds and their own context in each tool. You need standards. You need fewer tools that more people can use consistently. You need shared knowledge about how things work.
The art is knowing which you’re optimizing for at any given moment. Individual power users should fragment. Teams should consolidate, but selectively. And you need a rubric for when to let your power users run ahead versus when to standardize for the team.
Why power users fragment and why that’s good
When you really understand a tool, you see its boundaries clearly. You know exactly where it excels and where it struggles. That knowledge makes you intolerant of using the wrong tool for a task.
I used to use ChatGPT for everything. Writing, analysis, presentations, research. Then I discovered Claude was better for writing. Then I discovered Kimi K2 was better for presentations with public data. Then I discovered Grok was better for social media research. Each discovery made my stack more fragmented.
But each fragment improved my output quality. I’m not using Claude for writing because I like having more tools. I’m using it because the writing quality is noticeably better when voice matters. I’m not using Atlas for code browsing because I enjoy complexity. I’m using it because it understands GitHub repos better than Comet does.
The fragmentation is a form of optimization. Each tool gets narrower in scope but better in execution. You’re matching work to the tool that handles it best instead of forcing everything through a general-purpose tool that’s mediocre at most things.
Your power users will naturally do this. They’ll discover that one tool is better for a specific task and start routing that work there. They’ll develop workflows that chain tools together. They’ll build workarounds for each tool’s limitations.
That’s good. You want that. Those power users are your productivity edge.
Why teams need consolidation and what fragmentation costs
But here’s what happens when everyone fragments independently. No one can help anyone else because everyone is using different tools. Knowledge doesn’t transfer because the workflows are unique to each person. Onboarding new people takes forever because there’s no standard stack to learn. Collaboration breaks down because you can’t easily share context between tools.
I’ve seen this break teams. Everyone using AI, everyone getting value individually, but the team moving slower because there’s no shared infrastructure. Person A has their entire project context in ChatGPT. Person B has everything in Claude. Person C is using some combination of three tools with custom integrations. When they need to collaborate, everything grinds to a halt.
Teams need consolidation. Not because the consolidated tool is better for every task. But because the coordination cost of fragmentation outweighs the quality gain from specialization. You need shared context. You need transferable knowledge. You need people to be able to help each other.
This doesn’t mean everyone uses the same tool for everything. It means you have a standard stack that covers most use cases, and you’re deliberate about exceptions.
The rubric for when to standardize versus when to let power users run
Here’s how I think about this when I’m advising teams.
For core workflows that require collaboration, standardize. If three people need to work together on analysis, they should be using the same tool so they can share context and build on each other’s work. If your team is producing client deliverables, you need standards for how those get created so quality is consistent.
For individual productivity work that doesn’t require handoffs, let power users fragment. If someone discovers that a specialized tool makes them 2x faster at research, let them use it. If someone builds a workflow that chains three tools together for their personal writing process, that’s fine as long as the output meets your standards.
For high-stakes work where quality matters more than speed, allow selective fragmentation. If your best writer produces noticeably better output using Claude instead of the company standard ChatGPT, let them use Claude. The quality gain justifies the coordination cost.
For repetitive work where consistency matters, standardize aggressively. If you’re producing 50 similar documents a month, everyone should use the same tool with the same workflow. The consistency gain outweighs any individual quality improvements from specialization.
The question you’re asking isn’t “which tool is best?” It’s “what’s the cost of fragmentation versus the gain from specialization for this specific workflow?”
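For the systems-minded, here’s the same rubric compressed into a rough Python sketch. The boolean flags are my own shorthand and real decisions are messier, but it captures the order I check things in.

```python
# Rough sketch of the standardize-vs-fragment rubric. The flags are my shorthand;
# real calls depend on team size, work type, and how much collaboration you need.

def stack_policy(collaborative: bool, repetitive: bool, high_stakes: bool) -> str:
    if collaborative:
        return "standardize"                    # shared context beats individual quality gains
    if repetitive:
        return "standardize aggressively"       # consistency across 50 similar docs wins
    if high_stakes:
        return "allow selective fragmentation"  # the quality gain justifies the coordination cost
    return "let power users fragment"           # individual work with no handoffs

print(stack_policy(collaborative=False, repetitive=False, high_stakes=True))
# -> allow selective fragmentation
```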
What this means in practice
My personal stack is highly fragmented because I’m optimizing for individual output quality. I can afford the complexity because I’m not coordinating with a team on most tasks. I can learn the boundaries of 10 different tools because that’s my job.
If I were managing a team of 20, I’d standardize on maybe three core tools. ChatGPT for analysis and general use. Claude for writing that requires voice. Maybe one specialized tool for the team’s industry-specific needs. Then I’d allow exceptions for power users on individual work, but require standards for collaborative work and client deliverables.
The teams I’ve seen get this right have a clear delineation. Core stack is standardized and everyone learns it. Power users can fragment for their individual work. But when work requires collaboration or represents the company, you use the standard stack.
The teams that struggle either standardize too much and frustrate power users who know better tools exist, or fragment too much and lose the ability to collaborate effectively.
It’s a balance. And the balance shifts depending on team size, work type, and how much collaboration your workflows require.
What You Learn From Actually Using These Tools
The advertised specs don’t tell you what you need to know. Context windows say one million tokens, but effective use might be a fraction of that depending on what you’re actually doing with the tool. Benchmarks show one model beating another on synthetic tests, but those benchmarks don’t reflect your actual workflow.
Here’s what you learn from real use. Claude hits context limits around slide 18 on a complex business deck. ChatGPT’s writing voice needs heavy editing for anything public-facing. Kimi K2’s data hosting eliminates it for most corporate work despite being the best PowerPoint tool available. Grok is great for social media research and terrible for everything else.
You don’t learn any of this from reading comparison charts or looking at benchmark results. You learn it by hitting the limits, adjusting your workflow, and building patterns around what actually works in practice.
The pattern that matters is this. When you hit a failure, adjust immediately. Don’t retry the same approach hoping for different results. If you hit a context wall, chunk smaller next time. If a tool gives you mediocre output, switch tools. If data governance eliminates your first choice, move to your second choice without wasting time wishing the first choice was viable.
Most people fight limitations when they hit them. It’s better to design around them once you know they exist.
Why This Changes Every Month
Models improve. New tools launch. Something I’m using today gets beaten by a competitor next week. By December, part of this stack will be different.
But the thinking transfers. Understanding what each tool is good at and where it fails. What your workaround is when you hit those failures. That pattern of thinking is more valuable than the specific tools themselves.
When a new model launches, I’m not asking “is this the best?” I’m asking different questions. What is this good at? Where does it fail? What does that mean for my workflow? Do I need to switch, or should I keep what I have?
That evaluation framework is what lasts even when the specific answers change every month.
The FAQs
You said chunk decks at 5-8 slides. How do you decide where to cut?
I chunk at natural narrative boundaries. Introduction and context is one chunk. Core content is another. Implications and recommendations is a third. Conclusion is the fourth.
If I’m building a quarterly business review, the first conversation covers slides 1-6: here’s what we’re reviewing, here’s the context you need, here’s what happened. Second conversation covers slides 7-12: here’s the detailed performance data, here’s what stands out. Third conversation covers slides 13-18: here’s what this means, here’s what we should do about it. Fourth conversation wraps with slides 19-20: summary and next steps.
The boundary should be where you naturally pause in the narrative. Don’t cut in the middle of explaining something complex. Cut where you’d take a breath if you were presenting.
When you switch from ChatGPT to Claude for writing, what exactly do you paste over?
I don’t paste the full conversation. I extract the key points.
If I spent an hour in ChatGPT analyzing a problem, I’ll paste maybe 5-10 bullet points of the core insights. Then I start the back-and-forth with Claude: “Here are the key findings from my analysis. Help me brainstorm how to explain this for [audience]. Match this voice: [paste sample].”
The handoff is: insights plus context plus voice sample. Then we iterate. I’m not asking Claude to write the article - I’m using it as a thought partner to refine paragraphs, strengthen points, find better ways to explain complex ideas. The writing happens through that collaboration.
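If you want the handoff as a literal template, here’s a small sketch in Python. The function name and fields are invented for illustration; the structure is the part to steal: insights, then audience, then voice sample.

```python
# Sketch of the ChatGPT-to-Claude handoff: insights + context + voice sample.
# The function and field names are made up for illustration.

def build_handoff_prompt(insights: list[str], audience: str, voice_sample: str) -> str:
    bullets = "\n".join(f"- {point}" for point in insights)
    return (
        "Here are the key findings from my analysis:\n"
        f"{bullets}\n\n"
        f"Help me brainstorm how to explain this for {audience}.\n"
        f"Match this voice:\n{voice_sample}"
    )

print(build_handoff_prompt(
    insights=[
        "Context limits, not model quality, are the real bottleneck",
        "Chunking at narrative boundaries avoids most failures",
        "The five-minute handoff is worth the quality difference",
    ],
    audience="readers who haven't seen the analysis",
    voice_sample="(a few paragraphs of your own prior writing)",
))
```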
How do you know you’re about to hit a context limit before it happens?
You don’t always know. But there are warning signs.
If Claude starts giving you shorter responses than you asked for, you’re close. If it starts dropping details from earlier in the conversation, you’re close. If the quality degrades noticeably mid-task, you’re probably there.
The better approach is to design around the limit rather than waiting to hit it. If you know Claude struggles past slide 15, don’t ask for 20 slides. Ask for 12. If you know long documents hit limits, chunk them from the start.
What’s your actual prompt when you restart after hitting a wall?
I start fresh but I give context.
“I was building a 20-slide deck on [topic] and hit context limits. Here’s what I need now: slides 13-18 covering [specific content]. These slides follow slides 1-12 which covered [brief summary]. Match this design style: [describe or paste screenshot]. Here’s the content for these slides: [paste key points].”
I’m giving Claude just enough context to understand where we are in the deck without burning tokens on the full history. Then I’m being very specific about what I need from this conversation.
Show me what ‘separate data from deck creation’ looks like step-by-step.
Do your research first. If I’m building a deck about market trends, I’ll spend time in ChatGPT or Perplexity gathering data, analyzing patterns, identifying key insights. I’ll get that thinking right before I touch the deck.
Then I extract the insights into a document. Not the full analysis. Just the conclusions. “Revenue grew 40% in Q3. Customer acquisition cost dropped 15%. Churn increased in enterprise segment.” The facts that matter.
Then I start the deck conversation with those facts ready. “Here are the key data points: [paste]. Build slides 1-6 of a presentation explaining these trends to executives. Focus on implications, not raw numbers.”
The separation means Claude isn’t burning context doing research. It’s just building the deck with information you already refined.
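Here’s that separation as a minimal sketch, with a hypothetical key_points.txt standing in for wherever you park your conclusions. The point is that the deck conversation only ever sees the refined facts.

```python
# Minimal sketch of keeping research and deck creation separate.
# The key_points.txt file name is hypothetical; any scratch doc works.
from pathlib import Path

# Step 1: after the research phase, save only the conclusions, not the full analysis.
insights = [
    "Revenue grew 40% in Q3",
    "Customer acquisition cost dropped 15%",
    "Churn increased in the enterprise segment",
]
Path("key_points.txt").write_text("\n".join(insights))

# Step 2: a fresh deck conversation starts from those facts and nothing else.
facts = Path("key_points.txt").read_text()
deck_prompt = (
    f"Here are the key data points:\n{facts}\n\n"
    "Build slides 1-6 of a presentation explaining these trends to executives. "
    "Focus on implications, not raw numbers."
)
print(deck_prompt)
```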
When you hit slide 18 and Claude dies, what do you do differently in the next conversation?
I restart and ask for fewer slides. If 18 was too many, I ask for 12 next time.
I also look at complexity. If the slides were data-heavy with lots of charts and analysis, that burns more context than simple slides with one point each. Next time I either simplify the slides or chunk into even smaller segments.
The pattern is: failure teaches you the boundary. Adjust immediately. Don’t retry the same request hoping for different results.
How do you actually test the boundaries of a new tool?
I give it something I already built with my current tool. Something that pushed the limits. Then I try to rebuild it with the new tool and see where it breaks.
If I’m testing a new writing tool, I’ll take a 3,000-word article I wrote with Claude and try to rebuild it. Does it maintain voice? Does it hit context limits? Can it handle the iterative refinement I need?
If I’m testing a new coding tool, I’ll take a complex refactoring task and see how it handles it. Does it maintain context across files? Does it understand the architecture? Where does it lose track?
You want to find the failure modes with low-stakes work before you depend on the tool for something critical.
What do you do when a tool gives you something that’s almost right but not quite?
I ask myself: is the foundation sound or fundamentally wrong?
If the foundation is sound—the structure makes sense, the approach is right, it just needs polish—I iterate. “This is 80% there. Refine section 3 to be more specific about X. Strengthen the conclusion.”
If the foundation is wrong—the structure doesn’t work, the approach misses the point—I start over with better framing. “Actually, scratch that. The real point is [X]. Start over with this structure: [outline].”
The mistake people make is spending an hour trying to fix something that had the wrong foundation from the start. Learn to recognize when iteration won’t fix it.
Remember: you’re accountable for every word that goes out, however you made it. If the foundation is wrong, no amount of iteration saves you. Start over with better framing.
How do you manage context across multiple tools in your stack?
I don’t try too hard to maintain context across tools. Each tool conversation is self-contained, because I value clean context and I have SO many threads running.
ChatGPT is for thinking - getting the analysis right, working through the problem. Once I’ve got that nailed, I extract the 5-10 core insights and move to Claude for the writing refinement. Not the full research process, just the conclusions that matter.
Same with Perplexity to Claude. I’m pasting facts I need, not research history. Each tool gets what it needs to do its job and nothing more.
Trying to maintain full context across tools burns tokens and adds noise. Better to design for clean handoffs.
What’s your rule for when to add a new tool to your stack versus sticking with what you have?
If the new tool is dramatically better at something I do frequently, I’ll add it. If it’s marginally better or better at something I rarely do, I won’t.
Dramatically better means: 2x quality improvement, or 10x speed improvement, or enables something I couldn’t do before. Marginally better means: slightly nicer output, modestly faster, incrementally more convenient.
The switching cost has to be worth it. Learning a new tool, building new workflows, dealing with new failure modes—that’s expensive. The gain has to justify that cost.
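If you want the rule as a checklist you can’t argue with in the moment, here’s a toy version in Python. The thresholds mirror the numbers above; everything else, especially the switching cost, stays a judgment call.

```python
# Toy version of the add-a-new-tool rule. Thresholds mirror the rule above;
# whether the switching cost is actually worth paying stays a human judgment.

def should_add_tool(quality_gain: float, speed_gain: float,
                    enables_something_new: bool, used_frequently: bool) -> bool:
    dramatically_better = (
        quality_gain >= 2.0        # roughly 2x better output
        or speed_gain >= 10.0      # roughly 10x faster
        or enables_something_new   # something the current stack simply can't do
    )
    return dramatically_better and used_frequently

print(should_add_tool(quality_gain=1.3, speed_gain=2.0,
                      enables_something_new=False, used_frequently=True))
# -> False: marginally better doesn't pay for the switching cost
```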
You mentioned Claude Code “has a friendly feel” compared to Codex. What do you mean?
Claude Code goes and does tasks, then checks back in. It has that collaborative rhythm - here’s what I did, here’s what I found, what next? I appreciate that.
Codex is more thoughtful before engaging. It thinks through the problem, gives you strategic analysis, then acts. That’s useful when you need to understand the approach before the execution.
The difference: Claude Code has a strong bias for action. Codex is more deliberate. You need to know which one you’re choosing or Claude Code will just run with it.
This is what I’m using in November 2025. If you come across this post in December, something might have changed. But the approach—understanding what each tool excels at, where it breaks, what you do when it breaks—that’s what you carry forward regardless of which specific tools you’re using at any given moment.
I make this Substack thanks to readers like you! Learn about all my Substack tiers here
