Devin is an AI agent that was evaluated on SWE-bench, a challenging benchmark that asks agents to resolve real-world GitHub issues found in open source projects like Django and scikit-learn. Devin correctly resolves 13.86% of the issues end-to-end, far exceeding the previous state of the art of 1.96%. Even when given the exact files to edit, the best previous models resolve only 4.80% of issues.
Microsoft Copilot has also been used to improve programmer productivity. Senior developers in Silicon Valley have called Microsoft Copilot "scary good". They believe that IT projects could soon be completed with a staff 10% the size of pre-AI teams.
There is a list of 30 business productivity use cases for Copilot.
Global software and IT services are a $2.3 trillion industry.
There are also smaller use cases in customer service, call center support and paralegal work.
Brian Wang is a Futurist Thought Leader and a popular Science blogger with 1 million readers per month. His blog Nextbigfuture.com is ranked #1 Science News Blog. It covers many disruptive technologies and trends including Space, Robotics, Artificial Intelligence, Medicine, Anti-aging Biotechnology, and Nanotechnology.
Known for identifying cutting-edge technologies, he is currently a Co-Founder of a startup and a fundraiser for high-potential early-stage companies. He is the Head of Research for Allocations for deep technology investments and an Angel Investor at Space Angels.
A frequent speaker at corporations, he has been a TEDx speaker, a Singularity University speaker and a guest on numerous radio and podcast interviews. He is open to public speaking and advising engagements.
>Devin correctly resolves 13.86% of the issues end-to-end
So use it to help make Devin-2 which can resolve 14% of issues
Use Devin-2 to help make Devin-3 which can resolve 15% of issues
….
Soon Devin-X will far exceed any and all humans in resolving issues
I believe programming is the killer niche for the LLM. I’ve wanted to make a fuel shuffling/management GUI for years, but never had the patience to spend the time learning how to program radio buttons, drop-downs, displays, etc. Programmers by trade might scoff at such a low barrier (making a basic mouse-driven GUI) being prohibitive… Instead, I do the work rather manually, with scripts operating on text files and pasting arrays between Excel and ASCII as needed. I’ve spent a lot of time developing the skills of the trade and could write a specification for the interface I desire as a user. ChatGPT does reduce the height of the barrier quite a bit; a customer can describe the desired product conversationally. Of course, the customer would need to be proficient enough to debug and adapt the imperfect code given by the “AI”… I figure lots of professions have similar needs…
Think of teams of LMMs that are trained for specific tasks as drop-in replacements for most jobs that can be done on screens. They are disembodied robots whose general skills are language and understanding images on screens. Cloud-based teams of LMM robots will start replacing jobs before physical humanoid robots do.