Introduction to OpenAI Codex

OpenAI Codex is an artificial-intelligence system created by the San Francisco-based AI research lab OpenAI. Codex, the model that powers GitHub Copilot, generates code from natural-language prompts. It is currently in a closed beta phase and can be accessed through a rate-limited API. Codex is based on GPT-3, which focused more on natural language than on code.


According to the OpenAI team, Codex is an advanced version of GPT-3 that focuses mainly on coding. While GPT-3 is a neural network trained primarily on natural language, Codex has undergone additional training on 159 GB of Python code drawn from more than 50 million publicly available GitHub repositories. OpenAI claims that, although Codex performs best with Python, it is well equipped to work in over a dozen programming languages, including:

  • Go
  • JavaScript
  • Perl
  • PHP
  • Ruby
  • Shell
  • Swift
  • TypeScript

OpenAI Codex can be used for a wide range of functions like:

  • Generation of code through comments
  • Code completion
  • Generation of comments for an existing code base
  • Refactoring code
  • Translation of code from one language to another

A complete list of examples can be found here: https://beta.openai.com/examples
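To illustrate the comment-to-code use case: a Codex prompt is often just a docstring or comment describing the desired behavior, since the model continues the text it is given. The helper below is my own illustration of that prompting pattern, not part of any official API:

```python
def build_codex_prompt(description: str, language: str = "Python") -> str:
    """Build a comment-style prompt asking Codex to generate code.

    Codex continues the text it is given, so phrasing the request as a
    docstring or comment nudges the model toward emitting code next.
    """
    if language == "Python":
        # A triple-quoted docstring works well as a Python prompt.
        return f'"""\n{description}\n"""\n'
    # For other languages, fall back to a line comment.
    return f"// {description}\n"

prompt = build_codex_prompt("Return the square of a number")
print(prompt)
```

Sending such a prompt to Codex typically yields a function definition that implements the described behavior.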

Available Models

There are two models currently available under OpenAI Codex engine:

  • davinci-codex – The most capable model, and the best at translating natural language into code. Supports up to 4,096 tokens in the beta phase.
  • cushman-codex – The fastest model in terms of request processing, though not as capable as davinci-codex. Supports up to 2,048 tokens in the current beta phase.
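Given the two engines and the beta token limits above, a client might pick an engine based on how many tokens a request needs. This is a hypothetical helper of my own, not part of the OpenAI SDK:

```python
# Beta-phase token limits per Codex engine, as stated above.
CODEX_TOKEN_LIMITS = {
    "davinci-codex": 4096,   # most capable; best NL-to-code translation
    "cushman-codex": 2048,   # fastest, but less capable
}

def choose_engine(prompt_tokens: int, completion_tokens: int,
                  prefer_speed: bool = True) -> str:
    """Pick an engine whose token limit fits prompt + completion."""
    needed = prompt_tokens + completion_tokens
    if prefer_speed and needed <= CODEX_TOKEN_LIMITS["cushman-codex"]:
        return "cushman-codex"
    if needed <= CODEX_TOKEN_LIMITS["davinci-codex"]:
        return "davinci-codex"
    raise ValueError(f"request needs {needed} tokens, over every engine limit")
```

A short request can go to the faster cushman-codex, while a long prompt forces the larger davinci-codex.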

Accessing Codex API

In the current beta phase, OpenAI Codex can be accessed through:

  • API – API keys can be generated and used from your development environment.
  • Playground – The OpenAI team provides a user interface in which you can set the required parameters and try out the API, which is very helpful for experimentation.

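When calling the API from code, a request body is a small JSON document. The sketch below builds such a payload with the standard library; the endpoint path and parameter names reflect the beta-era engines API and may change, and the actual HTTP call (with an Authorization: Bearer header carrying your API key) is left as a comment:

```python
import json

# Beta-era completions endpoint for the davinci-codex engine (may change).
API_URL = "https://api.openai.com/v1/engines/davinci-codex/completions"

payload = {
    "prompt": '"""\nWrite a function that reverses a string\n"""',
    "max_tokens": 64,      # cap on the number of generated tokens
    "temperature": 0,      # deterministic output suits code generation
    "stop": ['"""'],       # stop before Codex opens a new docstring
}

body = json.dumps(payload)
# This body would be POSTed to API_URL with an
# "Authorization: Bearer <API key>" header, e.g. via urllib.request
# or the openai Python package.
print(body)
```

The Playground exposes these same parameters as UI controls, so settings tuned there translate directly into an API payload like this one.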

Training

Codex provides a way to train the model further: you pass the expected output for a problem statement, thereby teaching the model. According to the OpenAI team, the more training examples you have, the better; having at least a couple of hundred samples is recommended. In general, doubling the dataset size leads to a linear increase in model quality.
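A training example is essentially a problem statement paired with its expected output. The snippet below shows one plausible way to serialize such pairs as JSONL (one JSON object per line), the format OpenAI's fine-tuning tooling accepts for prompt/completion pairs; the example pairs themselves are invented for illustration:

```python
import json

# Hypothetical training pairs: a problem statement (prompt) and its
# expected output (completion), serialized one JSON object per line.
examples = [
    {"prompt": "# Return the sum of a list\n",
     "completion": "def total(xs):\n    return sum(xs)\n"},
    {"prompt": "# Check whether a string is a palindrome\n",
     "completion": "def is_palindrome(s):\n    return s == s[::-1]\n"},
]

jsonl = "\n".join(json.dumps(ex) for ex in examples)
print(jsonl)
```

With a couple of hundred such pairs, the resulting file can be submitted as training data.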

Guidelines

OpenAI Codex is not open source, and every application built on it is evaluated individually by the OpenAI team; only applications that pass this review may consume the APIs for public use. You can read more about the guidelines here: https://beta.openai.com/docs/usage-guidelines/use-case-guidelines

What to expect

Although the Codex models have been trained on many millions of lines of code and continue to learn with the help of the community, the results are not 100% accurate. That said, OpenAI Codex currently has no competitor that comes close to matching its capabilities.

OpenAI has come a long way from its founding by Elon Musk, Sam Altman, and others in 2015 to receiving a billion-dollar investment from Microsoft in 2019. With the release of GitHub Copilot, the future of OpenAI and Codex looks very promising.


Looking forward to your comments.

Author Details

Roy Maria John

Roy M J is a Technology Architect with Infosys. He is a Digital Transformation Specialist associated with Digital Experience IP Platforms & Frameworks in Infosys. He helps in delivering digital transformation for large enterprises across the globe via Live Enterprise Interactions Suite and Digital Marketplace Platforms. He has rich experience in Web technologies primarily on the JavaScript stack. He is part of Cloud and Emerging Technologies track in the Digital Technology Council and is the Vice-Chairperson of TechCohere (Tech focus Group) in Infosys Thiruvananthapuram DC.
