Key Takeaways
Programming tools that use Large Language Models (LLMs) to generate and edit code have attracted the attention of major AI players.
But on Thursday, March 14, AI startup Cognition showcased Devin, an “AI software engineer” that it claims is far more powerful than existing LLM-powered coding assistants.
In one demo, Cognition CEO Scott Wu prompted Devin to benchmark the performance of Meta's open-source LLM Llama 2 using three different API providers.
Devin started by drawing up a step-by-step plan for the required program. After that, “it builds the whole project using the same tools that a human software engineer would,” Wu said.
Further product demos show Devin helping to build websites, fix bugs and set up LLM fine-tuning. Cognition even claimed that Devin has completed real jobs posted on the freelancing platform Upwork.
According to Cognition, Devin also outperformed OpenAI’s GPT-4, Anthropic's Claude and other popular LLMs on SWE-bench, a benchmark that challenges AI agents to resolve real-world GitHub issues.
However, with the AI software developer still unreleased, many are skeptical of Cognition’s claims.
Commenters online pointed out that the preview Cognition has shared publicly looks nothing like the interface demonstrated in its videos.
Many also observed that the demonstrations only show Devin replicating narrow, widely documented tasks in an unrealistic environment. Some Redditors even speculated that the project could be a scam.
Ultimately, whether or not Devin lives up to the hype, automatic programming tools could transform the software development process. In some ways, they already have.
Cognition markets Devin as an end-to-end solution that reduces the software development process to a matter of AI prompts.
Although the technology is still some way off from realizing that vision, AI code generators have already changed the way some software engineers work.
Tools built on custom LLMs, such as GitHub Copilot, a GPT-powered code completion tool, aim to boost productivity and are marketed as suitable for menial but time-consuming tasks like mapping data fields.
Devin promises to go one step further, pointing to a future in which someone with near-zero programming experience can build applications with AI.
Just as video generators have sparked fears that professional filmmakers could be replaced by AI, software developers are concerned about the implications of automatic code generators.
Looking to the future, these tools could boost productivity in many sectors. However, they will still need people who understand the intricacies of film, software, design or any other specialist field to write prompts, guide development and evaluate results. Bosses will therefore need to make sure they do not make experienced and talented creators redundant, only to find that AI cannot truly replicate human endeavour.