Comparing LLMs for Coding
I decided to interview 3 developers - one was a Llama, one was a Wizard and the other was just wind.
Tell me about a time you've used an LLM to write a...
So, I've been playing around with LLMs recently. That's probably nothing new to many of you, but I decided to take a systematic approach to evaluating a few models to see how I could integrate them into my developer tooling.
After setting up an Ollama server, OpenWebGUI and Complete (in VSCode), I got down to it. My user story as a developer when it comes to LLMs is that I want them to help me do things faster and more efficiently. Secretly, I wanted them to do all the things I don't like doing, like attaining 100% unit test coverage, writing documentation and making the coffee.
But what would I do with all that free time? I could only daydream...
What would someone who's recently been working on their React and JS/TS skills ask an LLM to do? To write a signup form obviously!
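For anyone wanting to repeat this kind of side-by-side comparison outside a chat UI, here is a minimal sketch that sends the same prompt to each model through Ollama's REST API. It assumes Ollama is listening on its default port (11434) and that the models were pulled under the tags llama3, wizardcoder and mistral. The tags and the helper names are assumptions for illustration, not details of the setup described above.

```typescript
// Sketch only: model tags and helper names are assumptions.
const OLLAMA_GENERATE_URL = 'http://localhost:11434/api/generate'; // Ollama's default port

// Assumed Ollama tags for the three models being compared.
const MODELS = ['llama3', 'wizardcoder', 'mistral'];

interface GenerateRequest {
  model: string;
  prompt: string;
  stream: boolean;
}

// Build the JSON body for a single, non-streaming completion.
function buildRequest(model: string, prompt: string): GenerateRequest {
  return { model, prompt, stream: false };
}

// Send one prompt to one model and return the generated text.
async function askModel(model: string, prompt: string): Promise<string> {
  const res = await fetch(OLLAMA_GENERATE_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildRequest(model, prompt)),
  });
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status} for ${model}`);
  }
  const data = (await res.json()) as { response: string };
  return data.response;
}

// Ask every model the same question, one after another.
async function compareModels(prompt: string): Promise<Record<string, string>> {
  const answers: Record<string, string> = {};
  for (const model of MODELS) {
    answers[model] = await askModel(model, prompt);
  }
  return answers;
}
```

Calling compareModels with the signup form request would give back one answer per model tag, ready to be read side by side or saved to disk.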
First shot at writing a signup form and unit tests
- Llama3
- Wizard
- Mistral
Llama3:
// SignupForm.tsx
import React, {useState} from 'react';
import {Grid, Typography, TextField, Button} from '@material-ui/core';
interface SignupFormData {
  name: string;
  email: string;
  password: string;
  confirmPassword: string;
}
const SignupForm: React.FC = () => {
  const [formData, setFormData] = useState<SignupFormData>({
    name: '',
    email: '',
    password: '',
    confirmPassword: '',
  });
  const handleSubmit = (event: React.FormEvent<HTMLFormElement>) => {
    event.preventDefault();
    // TO DO: send the form data to your API or backend
  };
  return (
    <Grid container spacing={2}>
      <Grid item xs={12} sm={6}>
        <Typography variant="h4">Sign up for an account</Typography>
        <form onSubmit={handleSubmit}>
          <TextField
            label="Name"
            value={formData.name}
            onChange={(event) =>
              setFormData({ ...formData, name: event.target.value })
            }
            fullWidth
          />
          <TextField
            label="Email"
            value={formData.email}
            onChange={(event) =>
              setFormData({ ...formData, email: event.target.value })
            }
            type="email"
            fullWidth
          />
          <TextField
            label="Password"
            value={formData.password}
            onChange={(event) =>
              setFormData({ ...formData, password: event.target.value })
            }
            type="password"
            fullWidth
          />
          <TextField
            label="Confirm Password"
            value={formData.confirmPassword}
            onChange={(event) =>
              setFormData({ ...formData, confirmPassword: event.target.value })
            }
            type="password"
            fullWidth
          />
          <Button type="submit" variant="contained" color="primary">
            Sign up
          </Button>
        </form>
      </Grid>
    </Grid>
  );
};
export default SignupForm;
Wizard:
To create a signup form using React and Material UI, follow these steps:
- Install the necessary dependencies by running npm install @material-ui/core.
- Create a new component named SignupForm.tsx in a separate file.
- Import Material UI components such as Button, TextField, FormControl, and FormLabel from '@material-ui/core'.
Mistral:
"I cannot write the entire code for you in this response, but I can provide you with an outline of how to create a signup form using TypeScript, React, and Material-UI, along with some guidance on writing tests with Cypress. Firstly, let's create the SignupForm component:"
So, after some cajoling (those of us in the know call it incremental prompting), I was able to get both Wizard and Mistral to produce fairly functional-looking code and unit tests for a signup form.
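Incremental prompting boils down to keeping the conversation history and sending it back with each new request. Below is a minimal sketch of that loop against Ollama's chat endpoint, assuming a local server on the default port. The helper names are hypothetical and the prompts you feed in are up to you.

```typescript
// Sketch only: helper names are assumptions.
type Role = 'system' | 'user' | 'assistant';

interface ChatMessage {
  role: Role;
  content: string;
}

// Return a new history with one more turn appended (the input is not mutated).
function addTurn(history: ChatMessage[], role: Role, content: string): ChatMessage[] {
  return [...history, { role, content }];
}

// Send the whole conversation so far to Ollama's chat endpoint and
// return the history extended with the model's reply.
async function continueChat(model: string, history: ChatMessage[]): Promise<ChatMessage[]> {
  const res = await fetch('http://localhost:11434/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, messages: history, stream: false }),
  });
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  const data = (await res.json()) as { message: ChatMessage };
  return addTurn(history, data.message.role, data.message.content);
}

// Feed the prompts in one at a time, keeping every earlier answer in context.
async function promptIncrementally(model: string, prompts: string[]): Promise<ChatMessage[]> {
  let history: ChatMessage[] = [];
  for (const prompt of prompts) {
    history = await continueChat(model, addTurn(history, 'user', prompt));
  }
  return history;
}
```

Because each request carries the full history, a follow-up such as "now add the unit tests" is answered with the earlier code still in context.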
Changing Specifications
Changing the specifications of what you need also worked out quite nicely (unlike with your typical freelancer or consultant), though deep down I just knew all 3 LLMs were swearing at me behind my back:
I asked all 3 to change the way the password was validated.
- Llama3
- Wizard
- Mistral
Llama3:
const passwordRegex = /^[a-zA-Z0-9!@#$%^&*()_\-={ }\[\]:;?~`+<>,./]+$|^[a-zA-Z]{8,}$/;
<TextField
  label="Confirm Password"
  value={formData.confirmPassword}
  onChange={(event) =>
    setFormData({ ...formData, confirmPassword: event.target.value })
  }
  type="password"
  fullWidth
/>
Wizard:
const handleSubmit = (event: React.FormEvent<HTMLFormElement>) => {
  event.preventDefault();
  const {password} = formData;
  if (password.length < 8 || !/.*[!@#$%^&*()_+\-=\[\]{ };':"\\|,.<>\/?].*/.test(password)) {
    alert('Password must be over 8 characters and contain a special character');
    return;
  }
  onSubmit(formData);
};
Mistral:
<TextField
  label="Password"
  name="password"
  value={values.password}
  onChange={handleChange}
  type="password"
  error={!isValidPassword(values.password)} // Set error state based on validation result
  helperText={isValidPassword(values.password) ? '' : 'Password must be over 8 characters long and contain at least one special character'}
/>
const isValidPassword = (password: string) => {
  const regex = /^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/; // Password validation regex
  return regex.test(password);
};
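The three snippets above do not encode quite the same rule, so it helps to write the requirement down once and test against it. The following is a minimal sketch of one reading of the requirement: at least eight characters plus at least one special character. The helper names, the error messages and the definition of "special character" as anything that is not an ASCII letter, digit or whitespace are assumptions, not something any of the models produced.

```typescript
// Hypothetical helper: the rule below is one interpretation of the requirement.
const MIN_PASSWORD_LENGTH = 8;

// "Special" here means anything that is not an ASCII letter, digit or whitespace.
const SPECIAL_CHARACTER = /[^A-Za-z0-9\s]/;

// Return a list of human-readable problems; an empty list means the password passes.
function validatePassword(password: string): string[] {
  const problems: string[] = [];
  if (password.length < MIN_PASSWORD_LENGTH) {
    problems.push(`Password must be at least ${MIN_PASSWORD_LENGTH} characters long`);
  }
  if (!SPECIAL_CHARACTER.test(password)) {
    problems.push('Password must contain at least one special character');
  }
  return problems;
}

// Convenience wrapper for places that only need a yes/no answer.
function passwordIsAcceptable(password: string): boolean {
  return validatePassword(password).length === 0;
}
```

Returning a list of problems rather than a bare boolean makes it straightforward to show the first message in a TextField's helperText while still driving the error flag from passwordIsAcceptable.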
Unit test style
Let's take a look at the unit test style:
- Llama3
- Wizard
- Mistral
Llama3:
describe('SignUpForm', () => {
  beforeEach(() => {
    cy.visit('/sign-up'); // visit the sign up page
  });
  it('should render the form correctly', () => {
    cy.get('[data-test="username-field"]').should('be.visible');
    cy.get('[data-test="password-field"]').should('be.visible');
    cy.get('[data-test="confirm-password-field"]').should('be.visible');
    cy.get('[data-test="sign-up-button"]').should('be.visible');
  });
  it('should prevent form submission if passwords do not match', () => {
    cy.fillForm({
      username: 'test-user',
      password: 'password123',
      confirmPassword: 'wrong-password'
    });
    cy.get('[data-test="sign-up-button"]').click();
    cy.get('.MuiSnackbar-root')
      .should('be.visible')
      .contains('Passwords do not match');
  });
});
Wizard:
cy.get('#first-name').type('John');
cy.get('#last-name').type('Doe');
cy.get('#email').type('john.doe@example.com');
cy.get('#password').type('Password123!');
cy.get('form').submit();
expect(onSubmitSpy).to.have.been.calledWith({
  firstName: 'John',
  lastName: 'Doe',
  email: 'john.doe@example.com',
  password: 'Password123!'
});
Mistral:
it('should validate email and password fields', () => {
  cy.mount(<SignupForm />);
  cy.get('#email').type('test@email.com{enter}'); // enters email and triggers blur event
  cy.get('#email').should('have.class', 'Mui-error'); // check for error class
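The test snippets above mix data-test attributes, element ids and MUI class names as selectors. One way to keep selectors consistent is a small helper that builds them from a single naming scheme. This is a sketch only: the byCy helper and the field names are hypothetical, and it assumes the rendered inputs actually carry matching data-cy attributes.

```typescript
// Hypothetical helper: builds a CSS attribute selector for a data-cy value.
function byCy(name: string): string {
  return `[data-cy="${name}"]`;
}

// Assumed field names; they must match the data-cy attributes in the component.
const signupSelectors = {
  email: byCy('email-field'),
  password: byCy('password-field'),
  confirmPassword: byCy('confirm-password-field'),
  submit: byCy('sign-up-button'),
};

// In a Cypress spec the selectors would be used like this:
//   cy.get(signupSelectors.email).type('john.doe@example.com');
//   cy.get(signupSelectors.submit).click();
```

Keeping the selectors in one object means a renamed field only has to be changed in one place, whichever model wrote the test.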
What did you find out?
Overall, the best attempt, and the easiest model to work with, was Llama3 (8B), which I was eventually able to prompt into writing code with a data-cy attribute for testing with Cypress. Sometimes the poor LLM seems to get tired and needs to be enticed into writing code (WizardCoder is particularly lazy), but you can get fairly decent-looking code within a few attempts.
It's also possible to convert incremental prompting into a single one-shot prompt, but the presence or placement of a single word can produce very different-looking code.
Rightly or wrongly, I found myself deciding things based on style and how responsive I felt the different models were.
Stay tuned for the next part in the series where we see who else's job LLMs will replace!