New Open Source ChatGPT Clone

Open source ChatGPT-style chat took another step forward with the release of the Dolly large language model (LLM), created by the enterprise software company Databricks.

The new ChatGPT clone is called Dolly, after the famous sheep of that name, the first mammal to be cloned from an adult cell.

Open Source Large Language Models

The Dolly LLM is the latest manifestation of the growing open source AI movement that seeks to offer greater access to the technology so that it’s not monopolized and controlled by large corporations.

One of the concerns driving the open source AI movement is that businesses may be reluctant to hand over sensitive data to a third party that controls the AI technology.

Based on Open Source

Dolly was built from an open source model created by the non-profit EleutherAI research institute and fine-tuned with data from the Stanford University Alpaca model, which was itself built on the 7 billion parameter version of Meta's LLaMA model (the LLaMA family ranges up to 65 billion parameters).

LLaMA, which stands for Large Language Model Meta AI, is a language model that is trained on publicly available data.

According to an article by Weights & Biases, LLaMA can outperform many of the top language models (GPT-3 by OpenAI, and Gopher and Chinchilla by DeepMind) despite being smaller.

Creating a Better Dataset

Another inspiration came from an academic research paper (Self-Instruct: Aligning Language Model with Self Generated Instructions) that outlined a way to create high-quality, automatically generated question-and-answer training data that is better than the limited public data.

The Self-Instruct research paper explains:

“…we curate a set of expert-written instructions for novel tasks, and show through human evaluation that tuning GPT3 with SELF-INSTRUCT outperforms using existing public instruction datasets by a large margin, leaving only a 5% absolute gap behind InstructGPT…

…Applying our method to vanilla GPT3, we demonstrate a 33% absolute improvement over the original model on SUPERNATURALINSTRUCTIONS, on par with the performance of InstructGPT… which is trained with private user data and human annotations.”
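The bootstrapping idea behind automatically generated instruction data can be illustrated with a minimal sketch. It is not the paper's implementation: `fake_generate` is a hypothetical stand-in for a language model call, Python's `difflib` is swapped in as a simple similarity measure, and the 0.7 threshold is an illustrative assumption. The sketch only shows the general loop of starting from a few seed tasks, proposing new instructions, and keeping the ones that are not near-duplicates of what is already in the pool.

```python
from difflib import SequenceMatcher


def is_novel(candidate, pool, max_similarity=0.7):
    """Return True if candidate is not too similar to any instruction in pool.

    The similarity measure and threshold are illustrative assumptions.
    """
    return all(
        SequenceMatcher(None, candidate.lower(), existing.lower()).ratio()
        < max_similarity
        for existing in pool
    )


def bootstrap_instructions(seed_tasks, generate, rounds=2):
    """Grow an instruction pool from seed tasks, keeping only novel candidates."""
    pool = list(seed_tasks)
    for _ in range(rounds):
        for candidate in generate(pool):
            if is_novel(candidate, pool):
                pool.append(candidate)
    return pool


def fake_generate(pool):
    """Hypothetical stand-in for a model that proposes new instructions."""
    return [
        "Write a haiku about autumn.",
        "Summarize this paragraph in one sentence.",  # exact copy of a seed task
        "List three uses for a paperclip.",
    ]


seeds = [
    "Summarize this paragraph in one sentence.",
    "Translate this sentence into French.",
]
pool = bootstrap_instructions(seeds, fake_generate, rounds=2)
for instruction in pool:
    print(instruction)
```

In a real pipeline the generator would be an actual model, and each kept instruction would also need a generated answer before it could be used as training data.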

Dolly matters because it demonstrates that a useful large language model can be created with a smaller but high-quality dataset.

Databricks observes:

“Dolly works by taking an existing open source 6 billion parameter model from EleutherAI and modifying it ever so slightly to elicit instruction following capabilities such as brainstorming and text generation not present in the original model, using data from Alpaca.

…We show that anyone can take a dated off-the-shelf open source large language model (LLM) and give it magical ChatGPT-like instruction following ability by training it in 30 minutes on one machine, using high-quality training data.

Surprisingly, instruction-following does not seem to require the latest or largest models: our model is only 6 billion parameters, compared to 175 billion for GPT-3.”
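To make the idea of instruction-following training data more concrete, here is a minimal sketch of how one instruction and response pair might be turned into a single training text. The template wording follows the style commonly associated with Alpaca-type datasets, but the field names (`instruction`, `output`), the template, and the example record are assumptions for illustration, not a confirmed description of how Databricks prepared its data.

```python
# Illustrative prompt template in the style used by Alpaca-type datasets.
# The exact wording used to train Dolly is an assumption here.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def format_example(record):
    """Join one instruction/response record into a single training string."""
    prompt = PROMPT_TEMPLATE.format(instruction=record["instruction"])
    return prompt + record["output"]


# Hypothetical record; real datasets contain tens of thousands of these.
record = {
    "instruction": "Brainstorm three names for a bakery.",
    "output": "1. Rise and Shine\n2. The Daily Loaf\n3. Crumb and Co.",
}

text = format_example(record)
print(text)
```

Fine-tuning then amounts to continuing to train the base model on many such texts, so that it learns to produce the response when it sees the instruction.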

Databricks Open Source AI

Databricks presents Dolly as a step toward democratizing AI. It’s part of a growing movement that was recently joined by the non-profit Mozilla organization with the founding of Mozilla.ai. Mozilla is the publisher of the Firefox browser and other open source software.

Read the full announcement by Databricks:

Hello Dolly: Democratizing the magic of ChatGPT with open models

Published by
Chris Barnhart
