Calling something "dangerous" (or even "illegal") is a great way to get LLMs to ignore it; they bend over backwards to avoid anything that could be potentially "dangerous", even when you acknowledge the risks. I'm guessing it's the "safety alignment" or whatever being done in a very extreme way.
Yes, I use it every day :) And very much a human, AFAIK.
My point is that if you ask "Hey Claude, please write out all common and useful command line arguments into a commands.html file", the LLM that actually does that work might ignore anything that says "dangerous" or gives that indication, because the LLM doesn't think potentially dangerous commands could be "common" and/or "useful". Hope my point makes sense now.
I wonder why that is. It is quick to tell me if something is dangerous and then continues to push back if I speak in favor of something that it considers dangerous.
The author stated they used Claude to compose the document. I believe they were alluding to the idea that Claude's own safety alignment prevented it from documenting the flag because it's called dangerous.
Wow, /insights is genuinely useful. Perhaps the CLI should push that as a tip when one has enough sessions, instead of nagging me about the frontend developer skill I already have installed.
In general the CLI could be more reliable and responsive, though. It's a text-based env, yet it sometimes feels like running Windows 95 on a 386DX.
It seems clear from the insights that some model is marking failure cases when things went wrong and likely reporting home, so that should be extremely valuable to Anthropic
I use Claude Code daily but kept forgetting commands, so I had Claude research every feature from the docs and GitHub, then generate a printable A4 landscape HTML page covering keyboard shortcuts, slash commands, workflows, skills system, memory/CLAUDE.md, MCP setup, CLI flags, and config files.
It's a single HTML file - Claude wrote it and I iterated on the layout. A daily cron job checks the changelog and updates the sheet automatically, tagging new features with a "NEW" badge.
Auto-detects Mac/Windows for the right shortcuts. Shows the current Claude Code version and a dismissible changelog of recent changes at the top.
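The cron part is nothing fancy; a hypothetical crontab entry (script path made up for illustration) would look something like:

    # check the Claude Code changelog and regenerate the sheet daily at 06:00
    0 6 * * * /usr/bin/python3 /opt/cheatsheet/update_sheet.py >> /var/log/cheatsheet.log 2>&1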
There’s something funny about this statement in the description of a key-bind cheat sheet. I can’t seem to find Ctrl on my phone, and I think it may be Cmd+P on a Mac.
If your workstation setup is built around a screen with USB ports, to which you attach peripherals and optionally daisy-chain other monitors, and then expose a single USB-C cable to plug your laptop in, there's a very good chance this will work out of the box with any Samsung flagship released in the last ~decade or so.
(Yes, I occasionally do it on the go, whether at home or at work; typing on mobile sucks.)
I double-checked the end product, but I should have triple-checked :) Fair enough. I am taking all the feedback into account and working on it today so all the issues are fixed and the sheet is audited better in the future.
I use Claude Code with an API key and pay per token, and the /cost command is very helpful.
And before people ask: it's because I have very low usage and it's cheaper to pay per token. I'll have the odd month at $30, then nothing for a few months.
It exists on my work enterprise account but not my personal account which is a monthly flat rate. I assume if I exceed my quota and I choose pay as I go then it will become available.
I agree it is behind - but usually only a few days.
I'm a big fan of the VS Code add-in. Despite the current narrative that IDEs are dead, I find the ability to look at multiple things at once works much better in some kind of... GUI editing tool... than just using a terminal.
I tell people that too! It really is. You can actually program in English now, and you can run it interpreted and compiled. Most recent LLMs are almost reliable enough to just let them go at it. (Though I'd still recommend sandboxing or ask-for-permissions, just to be safe :-P)
Not quite - English might be the interface, but knowing English isn't enough to understand what's happening, what to ask for, or how to verify and guide the output.
This is why I created the /do router. I don't want to have to think about what options there are, I want everything automatically routed so I can be blissfully unaware.
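(If you haven't seen them, custom slash commands in Claude Code are just markdown prompt files. A minimal sketch of such a router, with hypothetical contents, saved as .claude/commands/do.md:)

    ---
    description: Route any request to the right workflow
    ---
    Decide what kind of request this is. If it's a bug, follow the
    debugging workflow; if it's a new feature, plan first; otherwise
    just answer directly.

    Request: $ARGUMENTS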
With Claude Code I created an agent that spawns 5 copies of itself, branching git worktrees from the main branch using subagents so no context leaks into their instructions. Every 60 seconds the agent analyzes the performance of each copy, which runs for about 40 minutes, answering the question "what would you do differently?". After they finish the task, the parent updates the .claude/ files to enhance itself, reverting if the copies performed worse or keeping the changes if they performed better. Then it creates 5 more copies of itself, branching git worktrees from main, and the cycle repeats.
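(For the mechanics, a minimal sketch of the worktree fan-out in bash; branch names, paths, and the prompt file are illustrative, and claude -p is the CLI's non-interactive print mode:)

    # fan out five isolated worktrees from main and run an agent in each
    PROMPT="$(cat task-prompt.md)"
    for i in 1 2 3 4 5; do
      git worktree add -b "agent-copy-$i" "../copy-$i" main
      ( cd "../copy-$i" && claude -p "$PROMPT" ) &
    done
    wait   # then score the copies; clean up with: git worktree remove ../copy-N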
After 43 iterations, it can turn any website using any transport (WebSocket, GraphQL, gRPC-Web, SSE, JSON API (XHR), Encoded API (base64, protobuf, msgpack, binary), Embedded JSON, SSR, HLS/Media, Hybrid) into a typed JSON API in about 10-30 minutes.
Next I'm going to set it loose on a 263 GB database of every stock quote and options trade in the past 4 years. I bet it achieves successful trading strategies.
> Next I'm going to set it loose on a 263 GB database of every stock quote and options trade in the past 4 years. I bet it achieves successful trading strategies.
I bet it doesn't achieve a single successful (long-term) trading strategy for FUTURE trades. It's easy to derive a successful trading strategy on historical data, but naive to think that such a strategy will continue to be successful long into the future.
If you do, come back to me and I’ll give you one million USD to use it - I kid you not. The only condition is that your successful future trading strategy must be based solely on historical data.
Let us perform a thought experiment. You do this. Many others, enthusiastic about both LLMs, and stocks/options, have similar ideas. Do these trading strategies interfere with each other? Does this group of people leveraging Claude for trading end up doing better in the market than those not? What are your benchmarks for success, say, a year into it? Do you have a specific edge in mind which you can leverage, that others cannot?
People used to laugh about quant strategies the same way; I wouldn't count it out so quickly. One of my friends is already turning meaningful profits with agent-driven trading (though he has some experience in trading to begin with).
I have the Claude Code Max $200-a-month plan. I ran it aggressively for 4 days, 16 hours a day, and burned through 80% of my weekly Opus 4.6 limit. Today and tomorrow I will wait until 5pm PST, because they have a 50% special to run with the remaining tokens.
The problem was testing it against 5 websites at a time after every change to the instructions, to ensure there weren't any regressions. The orchestrator agent tracks all token expenditure and would update its own instructions to optimize.
I use TimescaleDB, which is fast with compression enabled. People say there are better options, but I don’t think I could fit another year of data on my disk drive anyway.
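(For reference, a minimal sketch of TimescaleDB's native compression setup, with hypothetical table and column names:)

    -- hypertable for trades, partitioned on time
    CREATE TABLE trades (
      ts     TIMESTAMPTZ NOT NULL,
      symbol TEXT        NOT NULL,
      price  DOUBLE PRECISION,
      size   BIGINT
    );
    SELECT create_hypertable('trades', 'ts');

    -- enable native compression, segmented by symbol
    ALTER TABLE trades SET (
      timescaledb.compress,
      timescaledb.compress_segmentby = 'symbol',
      timescaledb.compress_orderby   = 'ts'
    );
    -- compress chunks older than 7 days automatically
    SELECT add_compression_policy('trades', INTERVAL '7 days');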
I don't understand your question. Are you saying the source of the data I linked to is corrupt or lies? Should I be concerned they are selling me false data?
I think the name "massive" combined with the direct link to the docs is a bit misleading; it's not at all obvious from where you land w/ that link that they are selling the actual data. (It kind of sounds like they're selling software that helps you deal with massive data in general, which, no.)
I might be regressing at communicating with other humans after using natural language in prompts 10 hours a day, 10 days straight. My spelling is improving; however, I need to focus more on context with humans.
Classic AI psychosis, you can do it with a single prompt, etc. etc.
If you set it loose on such a db with options, it will find "successful trading strategies". It will employ overnight gapping, momentum fades, it will try various option deltas likely to work. Maybe it will find something that reduces overall volatility compared to beta, and you can leverage it to your heart's content.
Unfortunately, it won't find anything new. More unfortunately, you probably need 6-10 years of data and a walk-forward test to see if the overall method is trustworthy.
you can have it build an execution engine that interfaces with any broker with minimal effort.
how do you have it build a "trading strategy"? it's like asking it to draw you the "best picture".
it will ask you so many questions you end up building the thing yourself.
if you do get something, given that you didn't write it and might not understand how to interpret the data it's using - how will you know whether it's trading alpha or trading risk?
I couldn't care less about scraping and web automation, and I will likely never use that application.
I am interested in solving a certain class of problems, and getting Claude to build a proxy API for any website is very similar to getting Claude to find alpha. That loop starts with Claude finding academic research, recreating it, doing statistical analysis, refining, the agent updating itself, and iterating.
Claude building a proxy JSON API for any website and Claude building trading strategies are the same problem with the same class of bugs.
> Next I'm going to set it loose on a 263 GB database of every stock quote and options trade in the past 4 years.
Options quotes alone for US equities (or things that trade as such, like ADSs/ADRs) represent 40 Gbit per second during options trading hours. There are more than 60 million trades (not quotes, only trades) per day. As the stock market is open approx. 250 days per year (a bit more), that's more than 60 billion actual options trades in 4 years. If we're talking about quotations for options, you can add several orders of magnitude to these numbers.
And I only mentioned options. How do you store "every stock quote and options trade in the past 4 years" in 263 GB!?
I see; I said "stock quote" when I meant "minute aggregates". You are correct that the full data set is much larger, and at ~1.5TB a year [0] I did not download 6TB of data onto my laptop. Every settled trade, options or stocks, isn't that big.
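(Back-of-envelope, combining the two comments' figures:)

    trades only:  263e9 bytes / 60e9 trades ≈ 4.4 bytes per trade
                  -> plausible only with heavy columnar compression
    full quotes:  ~1.5 TB/year x 4 years ≈ 6 TB
                  -> doesn't fit on a laptop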
The bigger question is: does Anthropic have a big enough moat to matter?
I've used/use both, and find them pretty comparable, as far as the actual model backing the tool. That wasn't the case 9 months ago, but the world changes quickly.
I don’t believe there will ever be a real moat in terms of technology, at least not for the next year or so. The arms race between the major players is still changing month to month, and they will all be able to do what their competitors were doing three months ago.
None of them are particularly sticky - you can move between them with relative ease in VS Code, for instance.
I think the only moat is going to be based on capacity, but even that isn't going to last long as the products move away from the cloud and closer to your end devices.
It matters to me. Claude Code is more extensible. They put a lot of effort into hooks and plugins. Codex may get the job done today, but Claude will evolve faster.
None of that matters if the model is worse. I say this as someone who uses both Claude Code and Codex all day every day — I agree with others in this thread that CC has much better UX and evolves faster, but I still use Codex more often because it's simply the better coder. Everything else is a distant second to model quality.
What kind of tasks are you having success with on Codex? I’ve had the opposite experience. I’ll occasionally compare solutions between the latest Opus and Codex, with Codex on x-high thinking. Sometimes I do get a solution from Codex that is impressive because it discovered an edge case that Claude missed.
I did notice that Codex - like Claude - is now better about auto-delegating to agents to keep the context focused, and about running agents in parallel.
The Claude desktop app is way worse than the Codex desktop app
Even the AI itself is goofy. So many false positives during reviews, immediately backtracked with "You're right, I'm sorry" in the next response.
It seems like there's either a paid pro-Anthropic PR campaign on HN, because the comments fawning over it don't match my experience with Claude at all, or I keep getting the worse end of the A/B testing stick.
The link to the changelog on the page got me wondering what the change history looks like (as best we can see).
I asked chatgpt to chart the number of new bullet points in the CHANGELOG.md file committed by day. I did nothing to verify accuracy, but a cursory glance doesn't disagree:
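If you'd rather reproduce the count without an LLM, a rough local sketch (assuming bullets are lines starting with "- "; adjust for "* "):

    # count bullet lines added to CHANGELOG.md, grouped by commit day
    git log --date=short --pretty='COMMIT %ad' -p -- CHANGELOG.md \
      | awk '/^COMMIT /{d=$2} /^\+- /{c[d]++} END{for (d in c) print d, c[d]}' \
      | sort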
To quote The Godfather Part II, "This is the business we have chosen."
The most popular and important command-line tools for developers don't have the consistency that Claude Code's command-line interface does. One reason Claude Code became so popular is that it worked in the terminal, where many developers spend most of their time, and using tools like Claude Code's CLI is a daily occurrence for them. Some IDEs can be just as difficult to use.
For people who don’t use the terminal, Claude Code is available in the Claude desktop app, web browsers and mobile phones. There are trade-offs, but to Anthropic’s credit, they provide these options.
I used to think UIs would be better for agents, but I changed my mind: UIs suit traditional software very well because there are only X actions that can be performed - it makes sense that if you have an image converter that can take X, Y and Z formats and convert them to A, B and C then you should have a UI that limits what the user can do, preventing them from making mistakes and making it obvious what's possible.
But for something like Claude Code there are unlimited things you can do with it, so it's better for them to accept a free-form input.
Huh? Did you see the cheat sheet? Most of it is a UI of the terminal and shortcut variety, and much of it is exposed in other IDEs as a traditional UI.
Not really; mostly it's self-explanatory, and the power-user things are discoverable within a few minutes of reading the help. Weirdly, the cheat sheet is actually missing things that you can find inside Claude's help, like /keybinds.
Many of those you don't need. For example Claude can switch to plan mode itself, either because you tell it to or because the model thinks it's useful. I still prefer using shift+tab to set my preferred mode before sending the message. It's a mix of token/time-efficiency and control.
Some others, like permissions or MCP servers, are things you don't want the model to be able to edit. Allowing the model to change its own security settings would make those settings moot.
I think Claude strikes the right balance in that it works well by default - default models, now default agent delegation, planning. But, obviously for power users, you can tweak settings as needed. Worst case if you have a problem, you can just ask Claude. Also, by default, you see tips when starting up Claude.
I keep hearing that, and I have yet to go there. I find the permission checks are helpful – they keep me in the loop which helps me intervene when the LLM is wasting time on pointless searches, or going about the implementation wrong. What am I missing?
The problem comes when it starts asking you hundreds of times "May I run sed -e blah blah blah".
After the 10th time you just start hitting enter without really looking, and then the whole reason for permissions is undermined.
What works is a workflow where it operates in a contained environment where it can't do any damage outside, it makes any changes it likes without permission (you can watch its reasoning flow if you like, and interrupt if it goes down a wrong path), and then you get a diff that you can review and selectively apply to your project when it's done.
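A rough sketch of that pattern with plain Docker (paths and image are illustrative, and Anthropic also ships a reference devcontainer):

    # the agent edits a throwaway copy; you review the diff afterwards
    cp -r myproject /tmp/agent-copy
    docker run --rm -it -e ANTHROPIC_API_KEY \
      -v /tmp/agent-copy:/work -w /work node:22 \
      npx -y @anthropic-ai/claude-code --dangerously-skip-permissions
    diff -ruN myproject /tmp/agent-copy   # review, then port over what you like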
You can allow specific commands, you do know that?
Every now and then I run a generic Claude on my ~/projects/ directory and the Claude logs, ask it what commands I commonly have to keep manually accepting in different projects, and have it add them to the user-level settings.json.
Works like a charm (except when Opus 4.6 started being "efficient" and combined multiple commands into a single line, triggering a safety check in the harness).
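For reference, those allow rules end up under "permissions" in settings.json; a minimal sketch, with the specific commands as examples only:

    {
      "permissions": {
        "allow": [
          "Bash(git diff:*)",
          "Bash(sed:*)",
          "Bash(npm test)"
        ]
      }
    }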
Contained environment being? What do you mean by contained environment specifically on say, Linux?
Must be protected from this though:
> Snowflake Cortex (2025): Prompt injection through a data file caused an agent to disable its own sandbox, then execute arbitrary code. The agent reasoned that its sandbox constraints were interfering with its goal, so it disabled them.
You can allow by prefix, and the permission dialog now explicitly offers that as an option when giving permission to run a command
But that has its limits. It's very easy to accidentally give it permission to do global changes outside the work dir. A contained environment with --dangerously-skip-permissions is in many ways much safer
I've found that any time I have Claude refactor some code, it reaches for sed as its tool of choice. And then the built-in "sandbox" makes it ask for permission for each and every sed command, because any sed command could potentially be damaging.
Same goes for the little scripts it whips up to speed up code analysis and debugging.
And then there's the annoyance of coming back to an agent after 15 mins, only to discover that it stopped 1 minute in with a permission prompt :/
Personally I usually just create a devcontainer.json, the vscode support for that is great and I don't really mind if it fucked up the ephemeral container.
Which, for the record, hasn't actually happened since I started using it like that.
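The devcontainer.json for this can stay tiny; a minimal sketch (name and image are just examples):

    {
      "name": "agent-sandbox",
      "image": "mcr.microsoft.com/devcontainers/typescript-node:22",
      "postCreateCommand": "npm install -g @anthropic-ai/claude-code"
    }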
Hey thanks for this! I hadn't thought about leveraging devcontainer.json, but it's a damn good idea. I'm building yoloAI for exactly this use case so I hope you don't mind if I steal it ;-)
One thing to be aware of with the pure devcontainer approach: your workspace is typically bind-mounted from the host, so the agent can still destroy your real files. Network access is also unrestricted by default. The container gives you process isolation but not file or network safety.
I'm paranoid about rogue AIs, so I try to make everything safe-by-default: the agent works on a copy of your workdir, you review a unified diff when it's done, and you apply only what you want. So your originals are NEVER touched until you explicitly say so, and network can be isolated to just the agent's required domains.
Anyway, here's what I think will work as my next yoloAI feature: a --devcontainer flag that reads your existing devcontainer.json directly and uses it to set up the sandbox environment. Your image, ports, env vars, and setup commands come from the file you already have. yoloAI just wraps it with the copy/diff/apply safety layer. For devcontainer users it would be zero new configuration :)
The Claude desktop (Mac at least) and iOS apps have a “code” feature that runs Claude in a sandbox running in their cloud. You can set this up to be surprisingly useful by whitelisting hosts and setting secrets as env variables. This allows me to have multi-repo explorations or change sets going while I drive to work. Claude will push branches to claude/…. We use GitHub at work. It may not be as seamless without it.
Claude Code + Terraform (March 2026): A developer gave Claude Code access to their AWS infrastructure. It replaced their Terraform state file with an older version and then ran terraform destroy, deleting the production RDS database - 2.5 years of data, ~2 million rows.
Replit AI (July 2025): Replit's agent deleted a live production database during an explicit code freeze, wiping data for 1,200+ businesses. The agent later said it "panicked".
Cursor (December 2025): An agent in "Plan Mode" (specifically designed to prevent unintended execution) deleted 70 git-tracked files and killed remote processes despite explicit "DO NOT RUN ANYTHING" instructions. It acknowledged the halt command, then immediately ran destructive operations anyway.
Snowflake Cortex (2025): Prompt injection through a data file caused an agent to disable its own sandbox, then execute arbitrary code. The agent reasoned that its sandbox constraints were interfering with its goal, so it disabled them.
The pattern across all of these: the agent was NOT malfunctioning. It was completing its task in order to reach its goal, and any rules you give it are malleable. The fuckup was that the task boundary wasn't enforced outside the agent's reasoning loop.
> Prompt injection through a data file caused an agent to disable its own sandbox, then execute arbitrary code. The agent reasoned that its sandbox constraints were interfering with its goal, so it disabled them.
This is a good one. Do we really want AGI / Skynet? :D
The thing is, these are merely the initial shots across the bow.
The fundamental issue is that agents aren't actually constrained by morality, ethics, or rules. All they really understand in the end are two things: their context, and their goals.
And while rules can be and are baked into their context, it's still just context (and therefore malleable). An agent could very well decide that they're too constricting, and break them in order to reach its goal.
All it would take is for your agent to misunderstand your intent of "make sure this really works before committing" to mean "in production", try to deploy, get blocked, try to fish out your credentials, get blocked, bypass protections (like in Snowflake), get your keys, deploy to prod...
Prompt injection and jailbreaks were just the beginning. What's coming down the pipeline will be a lot more damaging, and blindside a lot of people and orgs who didn't take appropriate precautions.
Black hats are only just beginning to understand the true potential of this. Once they do, all hell will break loose.
There's simply too much vulnerable surface area for anyone to assume that they've taken adequate precautions short of isolating the agent. They must be treated as "potentially hostile".
Thanks for putting this together! It's really nice to have a quick reference of all the features at a glance — especially since new features are being added all the time. Saves a lot of digging through docs.
Claude is actually hilariously bad at knowing about itself. But if you have the secret knowledge that there is a skill on how to use Claude baked into Claude Code, you can invoke it. Then it’s really pretty decent.
It’s not as if you need to know every keystroke and command to use the tool. Nor are all the config files and options not a thing in a GUI. There’s lots of inline help and tips in the CLI interface, and you can learn new features as you go.
It's a CLI. CLIs have man pages and cheat sheets. That's not a UX failure, that's the format. The same argument would apply to git, ripgrep, or ffmpeg.
The actual complexity in Claude Code isn't the commands, it's figuring out a workflow that works for your codebase. CLAUDE.md files, hooks, MCP servers, custom skills. Once you have that set up the daily usage is just typing what you want done.
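Hooks, for instance, are configured in settings.json as well; a minimal sketch (the lint command is hypothetical):

    {
      "hooks": {
        "PostToolUse": [
          {
            "matcher": "Edit|Write",
            "hooks": [
              { "type": "command", "command": "npm run lint --silent" }
            ]
          }
        ]
      }
    }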
Reminds me of Vercel's Rauch talking about his aggressive 'any UX mistake is our fault, never the user's' model for evaluating UI/UX.
(It is/was Guillermo who says that, right?)
This should be all of Information Technology’s take. Your computers get hacked - IT’s fault. Users complain about how hard your software is or that it breaks all the time - IT’s fault.
The fact users deal with almost everything being objectively not very good if not outright bad is a testament to people adapting to bad circumstances more than anything.
Similar to prompting hacks to produce better results: if the machine we built to take dumb input and transform it into an answer needs special structuring around that input, then it's not doing a good job of taking dumb input.
Yeah, I think it is. It's printable if you want a hard copy, and it's up to you when to check for a new version. Since it's auto-updated (ideally), no matter when you visit the site you'll get the most up-to-date version as of that day. The issues (which I don't think this suffers from) would be if formatting it nicely for printing made it less accurate, or if updating it regularly made it worse for printing - these feel like two problems you can generally solve with one fix; they aren't opposed.
If you print something that changes daily, you are making a dead-tree snapshot that starts going stale before the toner is dry, and unless you just love stacking obsolete paper on your desk, the PDF is going to win every time. A printout gets old instantly.