The YouTuber Internet of Bugs broke down the Devin Upwork video frame by frame. He shows what Devin was supposed to do, what it actually did instead, and how poorly it did it.

The Cognition employee did not give Devin the right instructions. The job was to write instructions for setting up an outdated code repository on the Amazon cloud (AWS). Devin was instead told to set up the repository itself. It then wrote buggy code and corrected its own errors. Those errors were introduced by Devin; they were not in the code or libraries it was working with.

The repository's README contained a one-line command that would have done the setup Devin ended up working on.

Devin could not interpret the files or what was being asked. The humans working at Cognition picked the wrong problem for Devin to work on and did not pass the starting prompt along correctly.

My Analysis of What Happened

Devin seems like a system that can perform certain coding projects. It failed at this task because the human employee had trouble picking the right project to work on and failed to understand the requirements. Verifying and understanding requirements needs to improve. The Devin system also appears to be creating errors for itself.

The Microsoft Copilot AI is genuinely useful for helping to write code and solve problems. However, knowledgeable people have to verify the results. These systems require supervision.

The Details of this Problem Project

The human professional who checked Devin’s work said:

1) Cognition lied about what Devin could do in the video description, and

2) a lot of people uncritically parroted the lie all over the Internet, and

3) that caused a lot of non-technical people to believe that AI might replace programmers soon.

00:00 Intro

00:30 The claim and the problem



03:47 What the job actually would have required

05:50 Requirements that needed to be determined

07:32 How a human compensates for Upwork’s lack of RFP process

10:11 What Devin did instead, and how poorly

13:03 Devin seems to be fixing code from GitHub

14:19 But Devin is actually making up errors, and then fixing them

15:40 All Devin had to do was run the command from the README

16:23 But Devin couldn’t figure that out

16:26 So it created this nightmare ‘C’-style low-level buffer append loop in Python

17:25 My replication of what Devin tried to do

18:15 It took me about 36 minutes

20:16 It took Devin at least six hours, and maybe more than a day

20:48 More bad Devin code

22:08 List of useless things that make Devin look competent

23:48 Conclusion, and a Plea
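
The 16:26 chapter refers to a 'C'-style low-level buffer append loop in Python. For readers unsure what that phrase usually means, here is a rough, hypothetical sketch. It is not the code Devin wrote, and the function names and chunk size are invented for illustration. The first function reads a stream in small chunks and appends each one to a buffer by hand, as you might in C. The second does the same job with the single call Python already provides.

```python
import io


def read_all_c_style(stream, chunk_size=4):
    """Hypothetical example: read a stream by manually appending small chunks."""
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read means end of stream
            break
        buffer += chunk  # manual append, one chunk at a time
    return buffer


def read_all_idiomatic(stream):
    """The usual Python approach: let read() return everything at once."""
    return stream.read()


if __name__ == "__main__":
    data = b"some bytes from a file or socket"
    # Both functions return the same bytes; only the amount of code differs.
    print(read_all_c_style(io.BytesIO(data)) == read_all_idiomatic(io.BytesIO(data)))
```

Both versions produce the same result. The manual loop simply adds more code to write, read, and get wrong, which is the kind of complaint the chapter title suggests.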

The actual client who posted the Upwork project reports that what was done was not what he requested.



Here is the Cognition video.



Details of the debunking engineer doing the actual requested work.



Devin Has Interesting And Useful Capabilities

Devin has successful demos showing interesting work.

However, the claims and progress are exaggerated.

The chart comparing it to other AI systems is also not a fair comparison.