Dangers of Agentic LLM Development

Sanjeev Sarda
High Performance Developer

What are some of the dangers of developing systems using agentic LLMs?

Agentic LLMs

Resistance to change is natural. Technology changes and we have to adapt. When I first tested using LLMs for development, I wondered whether this is how people felt when the first IDEs came out.

Hrmm, what linter should I use for these punch cards?

I am personally still hesitant to use an agentic flow too much in a language I want to learn but am not entirely fluent in.

IMHO, it's better to take the time to understand the idiomatic way to do things in a language, framework or library first than to depend on LLM-generated code to teach you that.

Pros

Alignment with Best Practices

The agentic LLM approach promotes some good practices like small incremental changes, documentation and testing.

Learning and Development

It can also teach you the correct or better way of doing something in a language which may not be your main one.

Then it takes discipline to learn from what the LLM has done for you.

Cons

May Still Require Expert Skill

In the places where the LLM fails to do what's expected, you often need relatively expert-level skills to get the job done anyway, or at least a degree of experience that may not jibe with what you think you have from using only agentic LLMs.

Prompt Fatigue and Context Switch

In some places it's faster to make the change you want to the PR yourself than to make it through incremental prompting - you may end up context switching too much mentally.

In reality, you'll probably just end up doing everything through incremental prompting even though it's faster to make the change manually.

Review Fatigue

As your confidence with the output of LLMs increases, you'll probably start trying to use it to make larger changes in a one-shot style. This will inevitably lead to review fatigue - the size of the changeset produced becomes harder to review in a short amount of time, so you just accept the PR because (hopefully at least) you're testing the functional output.
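One way to notice when a one-shot change has outgrown a comfortable review is to measure the changeset before you start reading it. The following is only a minimal sketch: the function names and the 200-line threshold are assumptions made for illustration, not recommendations from any tool, and the diff parsing is deliberately simplistic.

```python
def changed_lines(diff_text: str) -> int:
    """Count added and removed lines in a unified diff.

    Simplification: lines starting with '+++' or '---' are treated as file
    headers, so a changed line whose own content begins with '--' or '++'
    would be skipped.
    """
    count = 0
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file header lines, not content changes
        if line.startswith(("+", "-")):
            count += 1
    return count


def too_big_to_review(diff_text: str, max_lines: int = 200) -> bool:
    """Flag a changeset whose size exceeds a (hypothetical) review budget."""
    return changed_lines(diff_text) > max_lines


# A tiny illustrative diff: one line removed, two lines added.
SAMPLE_DIFF = """\
--- a/app.py
+++ b/app.py
@@ -1,3 +1,4 @@
 import os
-print("hi")
+print("hello")
+print("world")
"""

if __name__ == "__main__":
    print(changed_lines(SAMPLE_DIFF), too_big_to_review(SAMPLE_DIFF))
```

A check like this does not make a large change any easier to review; it only makes the moment you are about to rubber-stamp one more visible.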

Reduced Skill Development

LLMs can prevent people from learning a language properly and give a false sense of bravado (is it like googling/Stack Overflowing it on steroids?) - take the tool away and you can't produce.

This is exacerbated in an agentic style compared to the more autocomplete-style LLM tools.

Subtlety and Long Term Memory

You prevent the code from going into your long-term memory, so you lose more and more appreciation of any subtleties between your code and the domain.

The death of subtlety

It may not increase your maximum or average rate of productivity in a meaningful way if your intent is not just to produce product, as you end up needing to take the time to review and understand the code change properly.

You risk "switching off" when you go too fast.

Gedanken - Disposable Code

Is code being built for reuse, maintainability or for disposal?

Disposable Only Code

You only care about the functionality, so you focus more on validating functional behavior. The role of functional testing increases, but that effort can also be reduced with tools like Skyvern, which deploy LLMs alongside computer vision techniques to automate UI testing.

Your focus in a disposable code environment is more architectural and functional.

In a reusable code environment your focus is architectural, functional and code quality (extensibility, reusability, maintainability etc).

Testing and Disposable Code

Does disposable only code lead to a greater focus on reusable unit, functional and performance tests?

As long as the neural net generates code that passes those tests, you don't care what the code is or looks like.
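To make the "what, not how" idea concrete, here is a minimal sketch of a reusable behavioral contract that any generated implementation must satisfy. Everything in it is hypothetical and chosen only for illustration: the slug-making task, the test cases, and the function names.

```python
import re
from typing import Callable, Sequence, Tuple

# Hypothetical behavioral contract: (input, expected output) pairs.
# These cases are the durable asset; the implementations are disposable.
SLUG_CASES: Sequence[Tuple[str, str]] = [
    ("Hello World", "hello-world"),
    ("  Agentic   LLMs ", "agentic-llms"),
    ("already-slugged", "already-slugged"),
]


def passes_contract(
    impl: Callable[[str], str],
    cases: Sequence[Tuple[str, str]] = SLUG_CASES,
) -> bool:
    """Return True if impl gives the expected output for every case."""
    return all(impl(raw) == expected for raw, expected in cases)


# Two interchangeable implementations, standing in for successive
# generations of disposable code. The contract does not care which is used.
def slugify_v1(text: str) -> str:
    return "-".join(text.lower().split())


def slugify_v2(text: str) -> str:
    return re.sub(r"\s+", "-", text.strip().lower())


if __name__ == "__main__":
    for impl in (slugify_v1, slugify_v2, str.lower):
        print(impl.__name__, passes_contract(impl))
```

Both versions satisfy the contract despite looking nothing alike, while `str.lower` fails it. The sketch only checks the cases listed, so the quality of the disposable code is bounded by the quality of the tests you keep.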

Do we only care about the what, or do we also care about the how?

Would we do this in, say, the pharma industry? We sure would - as an example, check out this briefing from Imperial College London.

Criticality vs Change Velocity

[Figure: criticality vs change velocity]

Process Development

Create project templates and process templates so you implement functionality in a consistent manner, which is easier for an agentic LLM to replicate and easier for you to review and validate.

Write enough of the functionality and establish enough of a style to make it easier to prevent “LLM fog”.

If you have your own templates or samples, it is perhaps also easier to tune the underlying LLM to your desired style using something like ReFT.