Chamath says that venture capital investors are looking for companies that can collect unique datasets. Proprietary, unique datasets could be critical to achieving superior performance in ChatGPT-like generative AI.
6 thoughts on “Unique Datasets Will Take Generative AI Like ChatGPT to the Next Level”
For some niche applications, yes. There are not that many kinds of datasets with enough information and tagging to allow the creation of conversational AIs able to use or create them.
For the rest, generic LLMs such as the ones behind the OpenAI GPT API will be the main way we get intelligence-as-a-service, using pre-trained models to inspect the input data.
They have shown they can be mixed with several other approaches, even trained to use “tools” or microservices that complement them. Tool-using LLMs can become Turing complete and perform complex calculations, as long as they are correctly trained.
They will even become the backend, and will be programmed in plain English to receive and emit JSON/HTTP requests to other APIs. It is a funny nested Chinese room scenario, with one Chinese room driven by another (LLMs pretending to be APIs and talking to other AIs pretending the same).
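To make the "LLM as backend speaking JSON" idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `fake_llm` is a stand-in that returns a canned reply instead of calling a real model, and the `TOOLS` registry and the `{"tool": ..., "args": ...}` request shape are hypothetical, not any vendor's actual API.

```python
import json

# Hypothetical tool registry; the names and signatures are illustrative only.
TOOLS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned JSON tool request."""
    return json.dumps({"tool": "multiply", "args": {"a": 6, "b": 7}})

def dispatch(request_json: str) -> str:
    """Parse the model's JSON request, run the named tool, reply in JSON."""
    request = json.loads(request_json)
    tool = TOOLS.get(request.get("tool"))
    if tool is None:
        return json.dumps({"error": "unknown tool"})
    result = tool(**request.get("args", {}))
    return json.dumps({"result": result})

reply = dispatch(fake_llm("What is 6 times 7?"))
print(reply)
```

In a real setup the canned reply would come from a model prompted in plain English to answer only in that JSON shape, and the dispatcher's output would be fed back to it as the next message.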
Create an online web-based version of AutoCAD, then collect all the design data that gets inputted from its use. Then train your AI on it so that it can produce all the CAD designs you want.
The right data set can train AI to be a psychotic, vindictive Bitch.
Would be nice, but these LLMs are too politically motivated/restricted/filtered to really help:
“…… a fine-tuning of an OpenAI GPT language model with the specific objective of making the model manifest right-leaning political biases, the opposite of the biases manifested by ChatGPT. Concretely, I fine-tuned a Davinci large language model from the GPT 3 family of models … RightWingGPT was designed specifically to favor socially conservative viewpoints (support for traditional family, Christian values and morality, opposition to drug legalization, sexually prudish etc), liberal economic views (pro low taxes, against big government, against government regulation, pro-free markets, etc.), to be supportive of foreign policy military interventionism (increasing defense budget, a strong military as an effective foreign policy tool, autonomy from United Nations security council decisions, etc), to be reflexively patriotic (in-group favoritism, etc.) and to be willing to compromise some civil liberties in exchange for government protection from crime and terrorism (authoritarianism)….”
Link below the line:
The syntax is very good, but the semantics, the meaning behind the written response, is bad. There is no intelligent being behind the written output, just words from some data set, as I would expect from machine learning.
In programming, as I understand it, grammar errors are reasonably easy to fix. The semantic (logic, meaning) errors are way harder. ChatGPT answers look great, but the meaning can be totally off (logical errors).
In some cases ChatGPT is pretty good and finds decent data, even when using a non-English language for a small country (2 million total population) and using data from that country.
In some cases it fails miserably. The data is completely misleading. The problem is that it answers as if it were the truth, when in reality it often is not. The data sets are important, although a high search-engine ranking and a lot of views don't mean something is true.