OpenAI is moving fast and extending their lead.
OpenAI held their first DevDay, with the level of buzz and anticipation formerly reserved for Steve Jobs iPhone announcements.
Key notes from the keynote:
GPT-4 Turbo with major feature improvements and updates to the foundation model, at less than half the price of the previous top-of-the-line model.
API improvements including the Assistants API, which builds some of the most essential and popular capabilities of the LangChain and LlamaIndex frameworks directly into the OpenAI API.
GPTs: Anyone can build a custom chatbot with access to an external toolkit like plugins, its own knowledge base, and system prompts. Any product can come with its own chatbot for docs, question answering, and to perform API tasks via voice or natural language prompts.
Foundation Model Enhancements
- GPT-4 Turbo is now generally available in beta as gpt-4-1106-preview, and is driving the main ChatGPT chat UI.
- 128K context window, up from 32K
- Trained on knowledge up to an April 2023 cutoff
- Less than half the price of GPT-4 8K context, and one-quarter the price of GPT-4 32K token context: $0.01 per 1,000 input tokens and $0.03 per 1,000 output tokens, vs. previous $0.03 and $0.06 (8K context). It’s quite remarkable to get a 2-page summary of a 200-page document for about $1.
- Doubled tokens per minute rate limits (Check your rate limits).
- Multi-modal inputs: images.
- Updated Whisper V3 text-to-speech model. (An aside: The Sky voice in the ChatGPT mobile app is definitely Scarlett Johansson, and you can’t convince me otherwise.)
- Function calling update: One API call can return multiple function calls in response to a prompt, such as, ‘look up a track in Spotify’, ‘play the track’
- JSON mode will always return valid JSON. Previously telling GPT-4 to return JSON was hit-or-miss. But you could trick it into returning valid JSON by inventing a fictitious function signature for it to call.
- Reproducible outputs by specifying a random seed. The previous model was stochastic and you would typically get a different answer each time.
- Return probabilities for completions, which is useful for knowing how certain the model is of its answer, or for providing multiple alternatives for autocomplete.
- Fine-tuning GPT-4 using your own content and tasks is now available in experimental mode on request.
- Custom models - at a substantial cost (like seven figures), OpenAI will develop a custom GPT-4 based on data and training parameters you specify.
- Copyright Shield: OpenAI will indemnify enterprise users and app developers against liability for infringement claims based on the usage of OpenAI APIs.
There is an emerging stack for LLM applications. Beyond OpenAI’s question-answering API, developers may need additional functionalities like conversational memory, embedding storage for retrieval-enhanced generation, logging / observability, prompt template / library management, and chat UI implementations. LangChain and LlamaIndex are two important frameworks that provide these layers and facilitate integrations with other DBs, software layers, SaaS services, and APIs. These frameworks offer a great roadmap for how to implement complex LLM flows and if you don’t have a passing familiarity with them, you’re probably not a serious LLM developer.
But in most cases, the real magic is performed by the underlying calls to the LLM itself, e.g. OpenAI. And for simplicity and stability you are often better off writing directly to the OpenAI API, like you might be better off using a simple JAM (JS, APIs, Markup) web dev stack vs. a full-blown MERN framework (MongoDB, Express.js, React.js, Node.js).
With the latest update, OpenAI has borrowed a page from LangChain and LlamaIndex, and integrated higher-level functionality, further up the app stack / value chain, directly into the OpenAI API.
- Assistants API and playground.
- Persistent infinite threads - OpenAI handles conversational memory.
- Code interpreter - OpenAI handles calling functions in a sandbox. When you use the function calling API to describe functions and when to use them, and then a prompt returns a function signature to call, OpenAI can run the function in a sandbox (for $.03 per sandbox per hour).
- Retrieval - Add documents to the context for prompting, and OpenAI handles retrieval-augmented generation (RAG).
- An alternative way of viewing the Assistants API is that it exposes the capabilities of the ChatGPT Advanced Data Analysis dropdown to devs in the API.
- 4 new model APIs have been introduced:
- Image input is in beta in the gpt-4-vision-preview model and is expected to be in the final GPT-4 release.
- DALL•E 3 image generation is available via the dall-e-3 API.
- Text to speech is available via tts-1 and tts-1-hd models. ElevenLabs was previously one way to do this, but these models are cheaper and likely superior.
- Speech to text with the updated Whisper v3 model is “coming soon” to the API.
The previous ChatGPT dropdown menu, which required choosing between different GPT-4 toolkits, has been phased out. (It offered plugins, or web browsing, or advanced data analysis with the code interpreter and the ability to upload documents, or DALL•E 3). The default GPT-4 Turbo model now has access to all those functionalities.
But now, any developer can create a custom GPT chatbot with access to the data, toolkit, and system prompts of your choice, and publish it in the GPT store. The feature dropdown is dead, long live custom GPTs!
Going forward, we can probably expect many products to come with a GPT to teach you how to use it, answer questions from the documentation, and to talk to its API to perform tasks via natural language and voice commands.
- Impressive keynote. Didn’t announce GPT-5 or that AGI has been achieved internally, but shows rapid maturation of the product and of OpenAI as an organization.
- Changes the environment for a number of players: Anthropic, ElevenLabs, LangChain, LlamaIndex, and vertical RAG startups.
- API maturing rapidly.
- Expect an explosion in niche-specific GPTs.
- It’s a little weird that the battle right now is pretty much between OpenAI’s closed SaaS model and Meta’s open-source Llama 2 model, which is stuck around the GPT 3.5 level. Google is running out of time to be a fast follower. They could still be an effective competitor, but it’s coming up on a year since GPT 3.5, so no one can accurately describe them as ‘fast’. Still no general availability Bard API, although NLP transformers are pervasive in the various Google products and they recently added some nice LLM summaries to search results.
As an aside, I wrote this mostly in Obsidian, which is a note-taking app, similar to Notion or Evernote. It is Markdown-based, and functions well as a Markdown editor. It has a large community with tons of plugins, notably an OpenAI plugin which lets you use ChatGPT to copy edit, or send content to custom ChatGPT prompts. Or you can use other LLMs and even a local Llama instance.