As mentioned in an earlier post, contributing to open-source data science projects and adding them to your portfolio is an excellent way to make yourself a more competitive candidate for data science roles. One thrilling area of data science is NLP (Natural Language Processing), and an open-source NLP project would be a wonderful addition to a resume and portfolio.
OpenAI has done it again, for the third time. After releasing GPT-2 the year before and causing a media frenzy around the world, they have now published the research behind their latest NLP model: GPT-3!
To put it simply, GPT-3 is the biggest NLP (Natural Language Processing) model of its kind. It contains over 175 billion parameters (yes, you read that correctly), which makes it MASSIVE (almost 350GB). GPT-3 is also one of the most expensive models in the history of the field: training it is estimated to have cost nearly $12 million.
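A quick back-of-the-envelope calculation shows where that ~350GB figure comes from, assuming the 175 billion weights are stored as 2-byte (half-precision) floats; this is an illustrative sketch, not an official spec of how OpenAI stores the model:

```python
# Rough memory footprint of GPT-3's weights.
# Assumption: each parameter is a 16-bit (2-byte) float.
params = 175_000_000_000  # 175 billion parameters
bytes_per_param = 2

size_gb = params * bytes_per_param / 1e9  # convert bytes to gigabytes
print(f"~{size_gb:.0f} GB")  # ~350 GB
```

At full 32-bit precision the same weights would need roughly double that, around 700GB, which is why the commonly quoted figure assumes half-precision storage.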
It’s no secret that language models need a ton of training data for tasks that humans can learn in seconds. Enter GPT-3. In the official paper describing how GPT-3 performs tasks, OpenAI shows that scaling up language models greatly improves task-agnostic, few-shot performance.
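The idea behind few-shot learning is that instead of fine-tuning the model on thousands of labeled examples, you show it just a handful of examples inside the prompt itself. Here is a minimal sketch of how such a prompt might be assembled; the task, examples, and helper function are all hypothetical illustrations, not OpenAI's actual interface:

```python
# Sketch of a few-shot prompt: a few in-context (input, output) pairs
# followed by the query we want the model to complete.

def build_few_shot_prompt(task_description, examples, query):
    """Assemble a few-shot prompt from (input, output) example pairs."""
    lines = [task_description, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")
    # The final input is left without an output for the model to fill in.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

examples = [
    ("The movie was fantastic.", "positive"),
    ("I wasted two hours of my life.", "negative"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each input as positive or negative.",
    examples,
    "An absolute delight from start to finish.",
)
print(prompt)
```

A model like GPT-3 would then be asked to continue this text, and with enough scale it completes the final "Output:" correctly without any task-specific training.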
Here’s the piece that may concern people in terms of ethics: GPT-3 can easily produce article samples that people have a difficult time identifying as machine-generated fake news. In today’s world, that could be disastrous. To be fair, OpenAI addresses this concern in the paper itself.