Devin AI Failure on One of Its Upwork Projects

The YouTuber Internet of Bugs broke down the Devin Upwork video frame by frame. He shows what Devin was supposed to do, what it actually did instead, and how poorly it did it.

The Cognition employee did not give Devin the right instructions. The job was to write instructions for setting up an outdated code repository on the Amazon cloud (AWS). Devin was instead told to set up the repository itself. It then wrote buggy code and corrected its own errors. Those errors were introduced by Devin; they were not in the code or libraries it was working with.

The repository's README contained a one-line command that would have done the setup Devin ended up working on.

Devin could not interpret the files or the request. The people at Cognition picked the wrong problem for Devin to work on and did not carry the request over into the starting prompt correctly.

My Analysis of What Happened

Devin seems like a system that can perform certain coding projects. It failed at this task because the human employee had trouble picking the right project and did not understand the requirements. Verifying and understanding requirements needs to improve. Devin also appears to create some errors for itself.

Microsoft Copilot is genuinely useful for helping to write code and solve problems. However, knowledgeable people have to verify the results. These systems require supervision.

Some claim that Devin is a glorified API wrapper.

Claims made by AI companies need a lot more testing and verification.

The Details of this Problem Project

The human professional who checked Devin’s work said:
1) Cognition lied in the video description about what Devin could do,
2) a lot of people uncritically parroted the lie all over the Internet, and
3) that caused a lot of non-technical people to believe that AI might replace programmers soon.

00:00 Intro
00:30 The claim and the problem


03:47 What the job actually would have required
05:50 Requirements that needed to be determined

07:32 How a human compensates for Upwork’s lack of RFP process
10:11 What Devin did instead, and how poorly
13:03 Devin seems to be fixing code from Github
14:19 But Devin is actually making up errors, and then fixing them
15:40 All Devin had to do was run the command from the README
16:23 But Devin couldn’t figure that out
16:26 So it created this nightmare ‘C’-style low-level buffer append loop in Python
17:25 My replication of what Devin tried to do
18:15 It took me about 36 minutes
20:16 It took Devin at least six hours, and maybe more than a day
20:48 more bad Devin code
22:08 List of useless things that make Devin look competent
23:48 Conclusion, and a Plea
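
The 16:26 chapter describes a "C"-style low-level buffer append loop written in Python. Devin's actual code is not reproduced in this post, so the following is only a hypothetical sketch of that kind of pattern next to the idiomatic equivalent. The function names, the chunk size, and the sample text are all assumptions made for illustration.

```python
import io


def read_all_manual(stream, chunk_size=4):
    """Hypothetical 'C-style' approach: read fixed-size chunks and
    append them to a buffer one character at a time."""
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty string means end of stream
            break
        for ch in chunk:
            buffer += ch  # manual append, as one would do in C
    return buffer


def read_all_idiomatic(stream):
    """Idiomatic Python: let the stream return everything at once."""
    return stream.read()


# Made-up sample input, used only to compare the two approaches.
sample = "line one\nline two\n"
manual = read_all_manual(io.StringIO(sample))
idiomatic = read_all_idiomatic(io.StringIO(sample))
print(manual == idiomatic)
```

Both functions return the same text. The manual version simply carries more bookkeeping, and therefore more places for a bug to hide.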

The client who posted the Upwork project reports that the work delivered was not what he requested.

Here is the Cognition video.

Details of the debunking engineer doing the actual requested work.

Devin Has Interesting And Useful Capabilities

Devin has successful demos showing interesting work.
However, the claims and the progress are exaggerated.
The chart comparing it to other AI systems is also not fair.

2 thoughts on “Devin AI Failure on One of Its Upwork Projects”

  1. It’s rubbish at the moment, but knowing AI field, it will probably be 10-100x better when they release GPT-5 or other next gen big model with coding capabilities.
