Comparing LLMs for Coding

· 5 min read
Sanjeev Sarda
High Performance Developer

I decided to interview 3 developers - one was a Llama, one was a Wizard and the other was just wind.

A stably diffused llama

Tell me about a time you've used an LLM to write a...

So, I've been playing around with LLMs recently, probably nothing new to many of you, but I decided to take a systematic approach to evaluating a few models and see how I could integrate them into my developer tooling.

After setting up an Ollama server, OpenWebGUI and Complete (in VSCode), I got down to it. My user story as a developer when it comes to LLMs is that I want them to help me do things faster and more efficiently. Secretly, I wanted them to do all the things which perhaps I don't like doing, like attaining 100% unit test coverage, writing documentation and making the coffee.
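For anyone who hasn't set this up: once the Ollama server is running locally, it's just an HTTP call away. Here's a minimal TypeScript sketch against Ollama's /api/generate endpoint (the model tag and prompt are placeholders, swap in whatever you've pulled):

// ollama-query.ts - minimal sketch of hitting a local Ollama server directly.
// Assumes Node 18+ (global fetch); model name and prompt are placeholders.
interface OllamaGenerateResponse {
  response: string;
}

async function askOllama(prompt: string, model = 'llama3:8b'): Promise<string> {
  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // stream: false returns a single JSON object instead of newline-delimited chunks
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  const data = (await res.json()) as OllamaGenerateResponse;
  return data.response;
}

askOllama('Write a React signup form in TypeScript using Material UI')
  .then(console.log)
  .catch(console.error);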

But what would I do with all that free time? I could only daydream...

Daydream: I fell asleep amongst the flowers for a couple of hours

What would someone who's recently been working on their React and JS/TS skills ask an LLM to do? To write a signup form, obviously!

First shot at writing a signup form and unit tests


// SignupForm.tsx
import React, { useState } from 'react';
import { Grid, Typography, TextField, Button } from '@material-ui/core';

interface SignupFormData {
  name: string;
  email: string;
  password: string;
  confirmPassword: string;
}

const SignupForm: React.FC = () => {
  const [formData, setFormData] = useState<SignupFormData>({
    name: '',
    email: '',
    password: '',
    confirmPassword: '',
  });

  const handleSubmit = (event: React.FormEvent<HTMLFormElement>) => {
    event.preventDefault();
    // TO DO: send the form data to your API or backend
  };

  return (
    <Grid container spacing={2}>
      <Grid item xs={12} sm={6}>
        <Typography variant="h4">Sign up for an account</Typography>
        <form onSubmit={handleSubmit}>
          <TextField
            label="Name"
            value={formData.name}
            onChange={(event) =>
              setFormData({ ...formData, name: event.target.value })
            }
            fullWidth
          />
          <TextField
            label="Email"
            value={formData.email}
            onChange={(event) =>
              setFormData({ ...formData, email: event.target.value })
            }
            type="email"
            fullWidth
          />
          <TextField
            label="Password"
            value={formData.password}
            onChange={(event) =>
              setFormData({ ...formData, password: event.target.value })
            }
            type="password"
            fullWidth
          />
          <TextField
            label="Confirm Password"
            value={formData.confirmPassword}
            onChange={(event) =>
              setFormData({ ...formData, confirmPassword: event.target.value })
            }
            type="password"
            fullWidth
          />
          <Button type="submit" variant="contained" color="primary">
            Sign up
          </Button>
        </form>
      </Grid>
    </Grid>
  );
};

export default SignupForm;

So, after some cajoling (those of us in the know call it incremental prompting), I was able to make both Wizard and Mistral produce pretty functional-looking code and unit tests for a signup form.

Changing Specifications

Changing the specifications of what you need also worked out quite nicely (unlike with your typical freelancer or consultant), though deep down I just knew all 3 LLMs were swearing at me behind my back:

I asked all 3 to change the way the password was validated.

const passwordRegex = /^[a-zA-Z0-9!@#$%^&*()_\-={}\[\]:;?~`+<>,./]+$|^[a-zA-Z]{8,}$/;

<TextField
  label="Confirm Password"
  value={formData.confirmPassword}
  onChange={(event) =>
    setFormData({ ...formData, confirmPassword: event.target.value })
  }
  type="password"
  fullWidth
/>
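The regex on its own doesn't show where it ends up, so here's a rough sketch of how it might slot into the SignupForm's handleSubmit. The validateSignup helper and its error messages are my additions for illustration, not the model's output:

// Sketch only: one way the generated regex might be wired into handleSubmit.
const passwordRegex = /^[a-zA-Z0-9!@#$%^&*()_\-={}\[\]:;?~`+<>,./]+$|^[a-zA-Z]{8,}$/;

const validateSignup = (data: SignupFormData): string | null => {
  if (data.password !== data.confirmPassword) {
    return 'Passwords do not match';
  }
  if (!passwordRegex.test(data.password)) {
    return 'Password contains unsupported characters';
  }
  return null;
};

// Inside SignupForm, handleSubmit then becomes:
const handleSubmit = (event: React.FormEvent<HTMLFormElement>) => {
  event.preventDefault();
  const validationError = validateSignup(formData);
  if (validationError) {
    // TO DO: surface the error, e.g. via a Snackbar
    return;
  }
  // TO DO: send the form data to your API or backend
};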

Unit test style

Let's take a look at the unit test style:


describe('SignUpForm', () => {
  beforeEach(() => {
    cy.visit('/sign-up'); // visit the sign up page
  });

  it('should render the form correctly', () => {
    cy.get('[data-test="username-field"]').should('be.visible');
    cy.get('[data-test="password-field"]').should('be.visible');
    cy.get('[data-test="confirm-password-field"]').should('be.visible');
    cy.get('[data-test="sign-up-button"]').should('be.visible');
  });

  it('should prevent form submission if passwords do not match', () => {
    cy.fillForm({
      username: 'test-user',
      password: 'password123',
      confirmPassword: 'wrong-password'
    });
    cy.get('[data-test="sign-up-button"]').click();
    cy.get('.MuiSnackbar-root')
      .should('be.visible')
      .contains('Passwords do not match');
  });
});
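One caveat the model glossed over: cy.fillForm isn't a built-in Cypress command, so this spec only runs if you register something like it yourself. A minimal sketch of such a custom command, assuming the data-test attributes used above:

// cypress/support/commands.ts - hypothetical fillForm custom command.
// The selectors assume the data-test attributes from the generated test.
declare global {
  namespace Cypress {
    interface Chainable {
      fillForm(fields: {
        username: string;
        password: string;
        confirmPassword: string;
      }): Chainable<void>;
    }
  }
}

Cypress.Commands.add('fillForm', (fields) => {
  cy.get('[data-test="username-field"]').type(fields.username);
  cy.get('[data-test="password-field"]').type(fields.password);
  cy.get('[data-test="confirm-password-field"]').type(fields.confirmPassword);
});

export {};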

What did you find out?

Overall, the best attempt and the easiest to work with was Llama3 (8B), which I was eventually able to prompt into writing code with a data-cy attribute for testing with Cypress. Sometimes the poor LLM seems to get tired and needs to be enticed into writing code (WizardCoder is particularly lazy), but you can get fairly decent-looking code within a few attempts.
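For context, the data-cy convention is just an attribute stamped onto each element so Cypress selectors don't depend on CSS classes or labels; with Material UI you pass it through inputProps. A rough sketch of what I was nudging Llama3 towards (the field name here is my own, not the model's):

// Sketch: passing a data-cy attribute through to the underlying <input>,
// which is where the Cypress selector will look for it.
import React from 'react';
import { TextField } from '@material-ui/core';

const EmailField: React.FC = () => (
  <TextField
    label="Email"
    type="email"
    fullWidth
    inputProps={{ 'data-cy': 'email-field' }}
  />
);

export default EmailField;

// ...and in the Cypress spec, the matching selector:
// cy.get('[data-cy="email-field"]').type('llama@example.com');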

It's also possible to convert incremental prompting into a one-shot prompt, but the placement or omission of a single word can produce very different-looking code.

Rightly or wrongly, I found myself deciding things based on style and how responsive I felt the different models were.

Stay tuned for the next part in the series, where we see whose job LLMs will replace next!

They took our jobs