I think it’s comparable in scale with the Industrial Revolution or electricity — or maybe the wheel. - Geoffrey Hinton

Any sufficiently advanced technology is indistinguishable from magic. - Arthur C. Clarke

GPT is a transformer so smart / That can write like a human or a bard / It can answer your queries / Or make stories so eerie / That you’ll wonder if it has a heart - GPT

Large Language Models (LLMs) like GPT-4, the latest in OpenAI’s Generative Pre-trained Transformer series and the technology behind Bing AI and ChatGPT, give computers the ability to understand and communicate in natural language at a near-human level. In most cases it’s hard to tell whether text was generated by ChatGPT or by a human. This will have far-reaching implications.

Here’s a primer on LLMs and the latest versions of GPT.

History and Background

A Short History of Natural Language Processing (NLP)

  • Statistical language models go back at least to this 1950 paper by Claude Shannon, who invented the ‘bit’ and information theory.
  • If you play Wordle and H is the second letter, what is the first letter most likely to be? I’m going to go with S (T and C might be other choices; Q probably not).
  • When I was a kid on the way to school in the subway I would see this ad for secretarial school to learn speedwriting shorthand:
  • A reader can fill in missing characters and words based on context. If all characters were equally likely, ignoring punctuation and spaces, the information content of English text would be log2(26) = 4.7 bits per character. In fact there is a lot of predictability and hence redundancy: Shannon estimated the true information rate at between 0.6 and 1.3 bits per character, so you should be able to compress English text to roughly a quarter of its size, which is close to what Zip achieves in practice (about a 73% reduction); a quick compression check is sketched after this list. The redundancy can be considered a form of ‘parity stripe’ that makes it easier to read text error-free.
  • At a far more complex level (that implicitly encodes a mind-blowing amount of human knowledge), what ChatGPT does is: statistically predict the probability of tokens (words or word fragments) and generate sequences of words based on the likelihood of being appropriate in context. This turns out to be unreasonably effective.
  • In the 1980s and 90s there was a lot of NLP work on parsing sentences into grammatical structures using hand-crafted rules, grammars, and linguistic knowledge.
  • In the 2010s, users of Google Translate were shocked at a sudden improvement. Translations were suddenly much more natural. Google had switched to an end-to-end machine learning model.
  • A key NLP concept is the embedding. We represent each token (a word or word fragment) as a vector of hundreds to thousands of floats (12,288 in GPT-3). The embedding represents a word by coordinates in a latent space that tell you something about how it relates to every other word. With an approach like word2vec, you train an algorithm to predict the context each word is found in, or what words are nearby. Simultaneously with training the prediction algo, you train the embedding that best represents each word. You discover embeddings that encode surprisingly deep meaning purely from context. For instance, if you take the vector for ‘France’ - ‘Italy’ + ‘Rome’, the closest word vector in the latent space is ‘Paris’. Or ‘dog’ - ‘puppy’ = ‘cat’ - ‘kitten’. (See the analogy sketch after this list.) It turns out that for an embedding to enable the best possible prediction of what context a word appears in, it has to capture all the word’s varied meanings, connotations, and grammatical properties.
  • Embeddings are learned in the context of a corpus of data, a model, and a task. If you train on technical documents, you might get a different embedding for ‘server’ than if you trained on restaurant industry publications; if you use self-attention, you will learn embeddings that know ‘cross’ should lead you to pay attention to the next noun, especially if it’s a word like ‘road’; and you might learn slightly different embeddings for a translation task that tries to capture everything vs. for sentiment analysis, which really wants to home in on positive vs. negative connotations.
  • After we train our embeddings, we can represent a sentence fragment like “Why did the chicken cross the” using a sequence of token embeddings, and our model trained in tandem will be able to predict that the next word might be ‘road’. In fact, we can now generate sentences. We can sample a distribution of the most common starting words. Then using our model, we predict the probability of each token as the next word, based on the sequence so far. We can sample a word from that distribution, and repeat until we reach an ‘end of sentence’ token.
  • Embedding or projecting an object into a latent space by training an embedding model is an extremely important concept in machine learning. If you are an investor, you may be familiar with factor models. If you calculate the beta of a stock relative to the S&P, you are projecting the stock into a 1-dimensional latent space. If you calculate the stock’s betas relative to 10 factors, you are projecting it into a 10-dimensional latent space. If you take any two stocks, their projections into these latent spaces concisely describe aspects of how their behavior relates to one another. Factor models are linear projections, but for deep learning applications we use nonlinear projections via neural networks. Some have applied deep embeddings successfully to financial markets; see, e.g., Cong et al.
  • Another example of the use of embeddings and latent spaces is image style transfer:
    • Input a photograph
    • Input an Impressionist painting
    • Train an algorithm to apply the style of the painting to the photograph
      • We project the macro features of the photograph into a latent space representing the fact that it shows a haystack in a green field under a blue sky.
      • We project the micro features of the painting into a latent space representing micro features like palette and brush strokes.
      • Then we tell the algorithm to find an image that is close to the photograph when projected into the ‘macro feature’ latent space, while also being close to the painting when projected into the ‘micro feature’ latent space. Finding the image that minimizes the sum of those two distances yields an Impressionist painting-like image of a haystack under a blue sky.
    • Similarly, when we tell a generative text-to-image program like Stable Diffusion to generate “Haystack in a green field under a blue sky in the style of Monet”, we project the sentence into a latent space and find the image that would have the closest projection in the same latent space.
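
As a quick check on the redundancy numbers above, here is a minimal Python sketch (the file path is a placeholder for any large plain-text English file). It prints the naive 4.7-bit bound and the zlib compression ratio; on typical prose zlib lands around 25-40% of the original size, i.e. roughly 2-3 bits per character, well below 4.7 but still above Shannon’s 0.6-1.3 estimate, since a general-purpose compressor is far from an optimal language model.

```python
# Quick check of the redundancy-of-English numbers discussed above.
import math
import zlib

print(round(math.log2(26), 2))  # 4.7 bits/char if all 26 letters were equally likely

# Placeholder path: use any large plain-text English file you have handy.
with open("tale_of_two_cities.txt", "rb") as f:
    text = f.read()

compressed = zlib.compress(text, level=9)
ratio = len(compressed) / len(text)
print(f"compressed to {ratio:.0%} of original size, ~{8 * ratio:.1f} bits per character")
```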
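
And here is a minimal sketch of the embedding-arithmetic examples (‘France’ - ‘Italy’ + ‘Rome’ ≈ ‘Paris’), using gensim’s pretrained GloVe vectors as a stand-in for word2vec; the exact nearest neighbors depend on which pretrained vectors you load.

```python
# Embedding arithmetic with pretrained GloVe vectors via gensim
# (pip install gensim; the vectors download ~130MB on first use).
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # 100-dimensional word embeddings

# positive vectors are added, negative vectors subtracted: france - italy + rome
print(vectors.most_similar(positive=["france", "rome"], negative=["italy"], topn=3))
# dog - puppy + kitten
print(vectors.most_similar(positive=["dog", "kitten"], negative=["puppy"], topn=3))
# 'paris' and 'cat' should appear at or near the top of the respective lists.
```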

A Short History Of Deep Learning

  • A lot of theory and empirical work on NLP was suddenly rendered moot by the far better-performing ‘statistical’ ML models. This is a recurring theme in machine learning research, the bitter lesson. Humans trying to make thoughtful models of how to perform complex tasks have been outperformed repeatedly by conceptually simple models backed by massive computation. Otherwise stated, “The algorithm knows best” and “GPU go brrrr.”
  • Neural networks had been around since the 40s and 50s, but by the early 1980s research had largely stalled. They were considered historical artifacts and no one could get funding to research them in the US. But in 1986 Geoffrey Hinton (who later moved to Canada) popularized the backpropagation algorithm, which trains a network efficiently by treating it as a chain of functions and differentiating its output and loss with respect to every parameter via the chain rule. Using these gradients, you can update each parameter in the direction that improves the output and optimize by stochastic gradient descent. Then computers got more powerful and datasets got bigger, and in 2012 Hinton’s student Alex Krizhevsky achieved state-of-the-art (SOTA) image recognition with AlexNet. Since then, deep learning using neural networks has gone supernova: better GPUs and better software architectures, scaled up to train on bigger data, lead to better results, driving more investment in hardware, software, and research, perpetuating the cycle.
  • Transformers are the current SOTA architecture for sequence-to-sequence tasks like translation and question answering. They were introduced by Google researchers in the 2017 paper ‘Attention Is All You Need’ (none of the authors still work at Google; one works at OpenAI). Key concepts:
    • A transformer typically takes a sequence of vectors as input and outputs a sequence of similar length.
    • A transformer doesn’t process tokens sequentially, it looks at e.g. a whole sentence or text fragment at a time.
    • An attention mechanism is a neural network architecture we can train to predict the importance of each token in generating an output.
    • Multi-headed attention learns several attention models in parallel, each capturing a different notion of importance: for instance, importance within the local context vs. in the whole sentence, or importance along different dimensions of the output.
    • Self-attention learns how much each token in the input relates to every other token in the input (a minimal sketch follows this list).
    • Transformers can be stacked. The first seq2seq transformer layer outputs a sequence that is an intermediate representation that goes to another layer, and so on.
    • Transformer architectures now dominate most language tasks as well as image and video processing. For sentiment analysis, for example, you might train one or more transformer layers, then run a feedforward neural network that maps the output sequence to a sentiment value. For translation, summarization, or a chatbot, you might train several transformer layers plus a feedforward network that encodes the input into a high-dimensional latent representation, then run a decoder that maps the encoding to a sequence of tokens that translates, summarizes, or responds to the input.
    • Typically we follow a transformer with a feedforward neural network. After we ‘encode’ or try to understand the input, we might follow with additional layers to output a sentiment score, or an output for some other task.
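
Here is a minimal sketch of single-head self-attention in plain NumPy, with random weights standing in for learned ones. Real transformers add multiple heads, positional encodings, masking, residual connections, and feedforward layers, but the core idea is just this: every token’s new representation is a weighted average of all tokens’ value vectors, with weights derived from query/key similarity.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) token embeddings; Wq/Wk/Wv: learned projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # token-to-token relevance, (seq_len, seq_len)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V, weights              # contextualized representations + attention map

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 6, 16, 8          # e.g. the 6 tokens of "Why did the chicken cross the"
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape, weights.shape)              # (6, 8) (6, 6)
```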

A Short History of LLMs

  • ELIZA is sometimes cited as the first successful chatbot and a precursor of LLM chatbots, but it has little in common with them. It was more of a Mad Lib template engine that would parse key words and try to put them into some sensible-looking response. It didn’t perform a task we would recognize as machine learning in the sense of trying to learn or minimize a loss function.
  • Evolution of LLMs:
    • GPT-1 (June 11, 2018) : 117m parameters (OpenAI)
    • BERT (October 2018): 340m parameters (Google; widely used throughout Google’s suite of products)
    • GPT-2 (February 14, 2019): 1.5b parameters
    • GPT-3 (May 28, 2020): 175b parameters (2 bytes per param trained as 16-bit float = 350GB just to store params).
    • LaMDA (March 2022): 137b parameters (Google)
    • GPT-3.5 (November 2022): Parameters: ??? A modest update to GPT-3 with a roughly 4K-token context window (the larger 8K and 32K context windows arrived with GPT-4). This was launched as ChatGPT and set off all the hullabaloo.
    • GPT-4 (2023): Powers Bing AI. Parameters: ??? OpenAI is now ClosedAI, no longer provides algo details. Asking Bing AI itself, guesses range from 300 billion to 1 trillion. Based on the long lag between GPT-3 and GPT-4, and the existence of other 1-trillion-parameter models like Megatron, I would initially guess at the higher end. OpenAI doesn’t mess around, maybe more likely a 10x increase than a 2x increase. On the other hand, an Nvidia A100 delivers 312 teraFLOPs, so would generate about 1 token per second on a 312-billion-parameter model. Empirically I see 200-word answers in ~20 seconds from Bing AI (GPT-4), 300-word answers from ChatGPT (GPT-3.5) in ~ 1 minute. I’m not sure if you can or would want to use a lot of A100s per end user (list price $8K plus 300 watts of power, although Microsoft buys both pretty cheap). So 1 A100 and 2s per token might put an upper bound around 624b parameters. Since they may be focusing more on go-to-market than major model improvements, plus GPT-3 seemed big enough to do a good job, that would argue for closer to 2x GPT-3.
    • Facebook LLaMA (2023): 65b (biggest model)
    • Alpaca (2023): 7B parameters, fine-tuned from the smallest LLaMA model using an open-source recipe you can reproduce on modest hardware in a few days (a fine-tune, not training from scratch).

ChatGPT Overview

  • Training
    • Uses large web scrapes (these numbers are from GPT-3 paper):

      Data set       Size (tokens)   Weight in training mix
      Common Crawl   410 billion     60%
      WebText2       19 billion      22%
      Books1         12 billion      8%
      Books2         55 billion      8%
      Wikipedia      3 billion       3%
    • The training data also includes GitHub and other code repos; the tech report just says ‘data licensed from third-party providers’.
    • Training procedure:
      1. First, tokenize our training corpus. GPT uses a vocabulary of about 50,000 subword tokens. In practice, small common words get their own token, and a word like ‘unpythonizable’ gets split into ‘un-python-izable’. This is a useful economy compared to the ~500,000 headwords in Webster’s, before even counting conjugations and other inflected forms. (A tokenization sketch follows this list.)
      2. Next, train our embeddings and our initial completion model. Initialize all embeddings randomly. Set up our deep learning network architecture and initialize its parameters randomly. Create batches of samples from the text and attempt to predict the next token. Initially the predictions will be random. After each batch, compute the gradient that tells us how to adjust each parameter to move the predicted next-token distribution closer to the actual next token. Do this a few billion times. At the end of this process, we have good embeddings that encapsulate the meanings and connotations of each token, and a neural network that can complete a sentence like ‘The author of A Tale Of Two Cities is’ with ‘Charles Dickens’. So effectively it builds a knowledge graph: low levels of the transformer network connect tokens, and higher levels connect higher-level concepts learned from the low-level connections. (A toy training-loop sketch follows this list.)
      3. Train our chatbot model to generate natural, complete, correct answers. Have a team create a corpus of training data in the form of good prompts and answers. Further train in a supervised learning mode to take the questions in this training corpus, and generate answers similar to the ideal answers provided by the expert team. This is called imitation learning, where a reinforcement learning algorithm learns to imitate experts.
      4. Run our algorithm on novel questions and have teams rate the answers. Using these ratings, we train a new scoring algorithm that can rate answers to prompts. Now we train GPT using our scoring algorithm as its objective. For scoring, we can leverage a similar architecture to GPT, but now it takes a question and an answer and outputs a score. So the algorithm is almost training itself, improving via ‘Reinforcement Learning from Human Feedback’ (RLHF). (A reward-model sketch follows this list.)
      5. As we continue to gather new answers, we can have humans rate those answers and continuously improve our scoring algorithm in tandem with GPT itself. This idea of training an objective-maximizing algo and an evaluation algo side by side, as in actor-critic reinforcement learning or generative adversarial networks, is a key to the recent successes of generative AI.
      6. We can use our training baselines, and try new architectures and hyperparameters, training for 1/10,000 of the full training time to generate a solid prediction of whether the new version will be an improvement. And of course we can automate this process to try many variations of architectures and hyperparameters. It’s models upon models all the way down.
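
A minimal sketch of step 1 using OpenAI’s tiktoken library. The ‘gpt2’ encoding is the ~50K-token BPE vocabulary used by GPT-2 and GPT-3; the exact subword split it produces for a rare word is illustrative, not guaranteed.

```python
# Subword tokenization with tiktoken (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("gpt2")    # ~50K-token BPE vocabulary
ids = enc.encode("unpythonizable")
print(ids)                             # a handful of token ids
print([enc.decode([i]) for i in ids])  # the subword pieces the word was split into
print(enc.n_vocab)                     # vocabulary size, ~50K
```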
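
A toy sketch of step 2, heavily simplified: a character-level vocabulary and a single linear layer stand in for GPT’s 50K-token vocabulary and deep transformer stack, but the essentials are the same, embeddings and a next-token predictor trained jointly by gradient descent on a cross-entropy loss.

```python
import torch
import torch.nn as nn

text = "the author of a tale of two cities is charles dickens. " * 50
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}
data = torch.tensor([stoi[ch] for ch in text])

context, d_model = 8, 32
model = nn.Sequential(
    nn.Embedding(len(vocab), d_model),         # token -> learned embedding (trained jointly)
    nn.Flatten(),                              # concatenate the context window
    nn.Linear(context * d_model, len(vocab)),  # logits over the vocabulary
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(500):
    ix = torch.randint(0, len(data) - context - 1, (64,))
    x = torch.stack([data[int(i):int(i) + context] for i in ix])  # input windows
    y = data[ix + context]                                        # true next tokens
    loss = nn.functional.cross_entropy(model(x), y)               # distance from the right answer
    opt.zero_grad()
    loss.backward()   # backprop: gradient of the loss with respect to every parameter
    opt.step()        # nudge parameters downhill

print(loss.item())    # should end up well below the initial ~log(vocab size)
```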
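
And a sketch of the scoring model in steps 4-5: it maps a (prompt, answer) pair to a scalar score and is trained on human comparisons so that preferred answers score higher, a pairwise ranking loss in the spirit of RLHF. The two-layer network and random tensors are placeholders; in practice the scorer is itself a transformer over the actual text.

```python
import torch
import torch.nn as nn

d = 64  # placeholder embedding size for (prompt, answer) encodings
reward_model = nn.Sequential(nn.Linear(2 * d, 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

def score(prompt_emb, answer_emb):
    return reward_model(torch.cat([prompt_emb, answer_emb], dim=-1)).squeeze(-1)

# Fake data standing in for human comparisons: (prompt, preferred answer, rejected answer)
prompt = torch.randn(256, d)
chosen, rejected = torch.randn(256, d), torch.randn(256, d)

for _ in range(100):
    margin = score(prompt, chosen) - score(prompt, rejected)
    loss = -nn.functional.logsigmoid(margin).mean()  # push preferred answers to score higher
    opt.zero_grad()
    loss.backward()
    opt.step()

# The trained scorer then serves as the objective (the "critic") when the chatbot
# (the "actor") is further tuned with reinforcement learning.
```
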
    • Image from OpenAI
    • LambdaLabs estimated it would take $4.6 million and 355 GPU-years to train GPT-3.
    • Industry sources (i.e., Twitter rumors) say GPT-5 is currently training on 25K Nvidia A100 GPUs; each one sells for around $10K, so that’s $250m worth of hardware.
  • Inference (running the model)

    1. Take the prompt and tokenize it into GPT’s native token vocabulary, mapping each token to its embedding.

    2. The first part of our neural network stack ‘understands’ the prompt and maps it into an encoding in a latent space.

    3. The second part of our stack outputs the probability that each vocabulary token is the next one in the answer sequence, conditioned on the question encoding created in the previous step and the tokens output so far. Sample from this probability distribution to pick the next token.

    4. Possibly (not sure) rerank candidate outputs for factuality, safety, and other criteria. It might decide at the beginning that the question is off-limits and short-circuit before even trying to answer it; or at the encoding stage or in the middle of decoding, if all the candidate answers look bad, it might bail out and give one of those “I can’t do that, Dave” answers.

    5. Map the sampled token back to its human-readable text and return it to the user.

    6. Go to 3. Keep generating tokens based on the prompt and the output so far, until an end token is generated. (A minimal decoding-loop sketch follows this list.)
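
Here is a minimal sketch of that decode loop, using the small open GPT-2 model from Hugging Face as a stand-in for GPT-3.5/4 (whose weights are not public). Production systems add temperature, top-p sampling, batching, and the safety/reranking layers mentioned above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("Why did the chicken cross the", return_tensors="pt").input_ids  # step 1: tokenize
for _ in range(20):                                    # step 6: loop
    logits = model(ids).logits[0, -1]                  # steps 2-3: distribution over the next token
    probs = torch.softmax(logits, dim=-1)
    next_id = torch.multinomial(probs, num_samples=1)  # step 3: sample the next token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
    if next_id.item() == tok.eos_token_id:             # stop at the end-of-sequence token
        break

print(tok.decode(ids[0]))                              # step 5: tokens back to human-readable text
```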

  • Performance
    • On one level, ChatGPT is just predicting a statistically likely sentence to appear in a context, like a response to a question. But in order to do this, it has to implicitly learn a great deal of knowledge. If you want to predict the next token from scratch, you must first predict the universe.
      • Rules of grammar of English and Python and other languages.
      • Idiomatic expressions. People say ‘my big fat Greek wedding’ but not ‘my Greek fat big wedding’ for reasons that aren’t always clear even to me as a native English speaker. Sometimes there may be rules for idiomatic language, sometimes you just have to learn n-grams a/k/a sequences that usually go together.
      • Elements of style, punctuation.
      • Facts like
        • Sun Tzu wrote The Art of War. I am not really sure how this knowledge is encoded. Digressive speculation: knowledge graphs and transformers are conceptually related. A token’s embedding tells you what other tokens you might want to pay attention to when you encounter it. You can presumably write a function that takes the embeddings for ‘Sun Tzu’ and ‘The Art of War’ and returns how they are related, like ‘author/written by’. You can just ask GPT to ‘describe in 10 words or less the relationship between Sun Tzu and The Art of War’ and get a good answer. Now it’s possible that the knowledge graph is well represented by ‘smart’ embeddings, and a ‘dumb’ function can take the embeddings and tell you that ‘Sun’ and ‘Tzu’ can form an entity which represents an author, that ‘The Art of War’ represents a book, and that the entities have an ‘author/written by’ relationship. And it’s also possible the knowledge graph is represented in a ‘smart’ function, which would have to do a lookup on ‘dumb’ embeddings using a data structure embedded in the function. The latter seems more likely based on the number of parameters in the embeddings vs. the whole model. But somehow, the initial training creates knowledge graphs relating tokens and higher-level concepts, and will complete ‘The Art of War was written by’ with ‘Sun Tzu’.
        • Ask it to write a limerick: it knows a limerick has 5 lines with an AABBA rhyme scheme, that lines 1, 2, and 5 have three anapests, and that lines 3 and 4 have two. It also knows how words are pronounced, what rhymes, and where syllables and stresses fall, so each line scans.
      • Patterns of building chains of reasoning.
      • Understanding questions and what makes a response relevant.
    • GPT-4 outperformed many or most humans on a variety of standardized tests. GPT-4 was not trained specifically for these tests, although some questions were seen in training. It performs well in many languages, including less widely spoken ones like Latvian, Welsh, and Swahili.
      • bar exams
      • SATs - 700 math, 710 verbal
      • GRE
      • USA Biology Olympiad
      • Scored 5 on AP Art History, AP Biology, AP Environmental Science, AP Macroeconomics, AP Microeconomics, AP Psychology, AP Statistics, AP US Government and AP US History
      • US medical licensing exams
      • Leetcode assessments
      • Sommelier written test
      • Failed the ethics portion of CFA level 3, per the FT
      • Got a 2 in AP English, and scored in the 54th percentile on GRE Writing, no improvement over GPT-3. Also not great on the AMC 10 math competition.
  • GPT-4 vs. GPT-3
    • Larger maximum context window: 32K tokens vs. 4K
    • Multi-modal, can take image inputs. For instance, you can ask it to describe an image (quite poorly in my experience)
    • You can ask it to summarize a web page and it will go fetch and summarize it, or ask it to ‘summarize today’s news’
    • Improved answers to many queries. OpenAI says users preferred GPT-4’s responses over ChatGPT-3.5’s 70% of the time.
    • Plugin architecture, integration so it can go out to WolframAlpha, get domain-specific info for math.
    • Cites sources.
    • Better blocking of harmful queries.
  • Limitations

    • Cost. The knowledge graph gets created in the initial training stage, so the model has to be retrained to learn a new set of facts. We can ask the model to summarize text contained in prompts, and potentially fine-tune the chatbot to respond in a desired form, without full retraining. However, it is not perfectly straightforward to apply the chatbot to a new body of text like internal documentation or help desk logs. Alternatives, in order of complexity:

      1) Retrieval-enhanced ML: query a search engine or vector index and send the best excerpts to the OpenAI API to summarize into a natural-language response (a minimal sketch follows this list). See for instance LlamaIndex, which orchestrates tools like Pinecone and LangChain into a natural language query solution.

      2) A fine-tuning API - train ChatGPT to answer prompts better - can combine with 1)

      3) Plugin API (currently needs invitation from OpenAI)

      4) Incrementally train an existing model, which may be possible with some LLMs like Google’s Flan-T5. Of course GPT-4 is the most advanced, but the source code to train it on our own data is not available from OpenAI.

      5) Train an LLM from scratch. Costly, only possible on smaller older models compared to GPT-4. There may be SaaS services that run LLMs released under open source licenses and enable you to train from scratch.
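
A minimal sketch of option 1, retrieval-enhanced generation: embed your documents, retrieve the excerpt closest to the question, and stuff it into the prompt. The embedding model name, the toy documents, and the ChatCompletion call (pre-1.0 openai Python library, OPENAI_API_KEY set in the environment) are illustrative assumptions; LlamaIndex and LangChain package up this same pattern with vector stores like Pinecone.

```python
import numpy as np
import openai  # pre-1.0 openai library; assumes OPENAI_API_KEY is set in the environment
from sentence_transformers import SentenceTransformer

# Stand-in internal documents; in practice these come from your wiki, help desk logs, etc.
docs = [
    "To reset your password, go to Settings > Security and click Reset.",
    "Invoices are emailed on the first business day of each month.",
    "The VPN requires the GlobalProtect client and your hardware token.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")           # small open embedding model
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

question = "How do I change my password?"
q_vec = embedder.encode([question], normalize_embeddings=True)[0]
best = docs[int(np.argmax(doc_vecs @ q_vec))]                # cosine similarity (vectors are normalized)

prompt = f"Answer using only this excerpt:\n{best}\n\nQuestion: {question}"
reply = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(reply.choices[0].message.content)
```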

    • Reasoning errors. I don’t think ChatGPT understands anything dynamic. It doesn’t build a causality model to predict real-world processes unfolding. It knows Python syntax, but it doesn’t know what a program will do. I’m skeptical it can solve Leetcode mediums unless it has seen the solution in training. If it has seen something similar in training, you’ll get a good answer; otherwise, caveat promptor. It doesn’t connect the dots on its own or generalize using high-level abstractions. See the math problem below: first leg, 20 miles at 60 mph, takes 20 minutes; second leg, 15 miles at 45 mph, takes 20 minutes; third leg, 10 miles at 30 mph, takes 20 minutes; total, 45 miles in 1 hour, so 45 mph (I don’t know how it comes up with 0.833 hours).

    • Hallucinations. If there are no high-probability answers, it will make up something plausible but fictitious. The longer you talk to it, the weirder things get.

    • Overconfidence. It does not seem noticeably aware of how certain it is of what it says; it always sounds confident, even when wrong. Bing AI gives source links, but that doesn’t amount to explainable AI that can give a chain of evidence.

    • Does not learn from experience, not an online learning system.

    • Harmful questions: how do I build a bomb, synthesize sarin, commit the perfect crime, beat my spouse and leave no marks. A red team was used to identify questions that should not be answered. There is a post-inference layer that short-circuits bad questions with “I cannot provide information on X”, or adds disclaimers: if you ask how to obtain cheap cigarettes, it will say smoking is bad but here are some options. People can sometimes jailbreak it with adversarial prompts, and there are many tricky and dual-use questions to block.

    • Bias - Sentences in Turkish don’t have gender for ‘doctor’ or ‘nurse’; when you translate into a gendered language, doctors come out male and nurses female. (ChatGPT seems to have recently fixed this; the example below is from Bing Translate.) Search tools have been known to get worse when you ask questions about women, Black people, or different nationalities. There is a lot of toxic material in the training set. There is an attempt to filter toxic output, but it’s imperfect and can be jailbroken using adversarial prompts like DAN. Sometimes it goes too far, filtering out simple questions that might generate controversial answers.

  • Prompt engineering, or how to ask a question
    • Zero-shot - just ask a question
    • Few-shot - ask a question and provide a few examples (see the sketch after this list)
    • Choice of prompt format, training examples, and the order of the examples can lead to dramatically different performance, from near random guess to near SOTA.
    • It gets stuff wrong, so for research, never ask a question where you can’t validate the answer (no sweat if you’re asking it to write a limerick or something unimportant)
    • Set the context
      • What type of info to use to answer a question
      • How you want it to solve the problem
      • What audience the answer is for
    • Be as specific as possible. Keep it simple.
    • Give examples with solutions
      • Tell it what to do, avoid telling it what not to do
      • You can describe output format, math format, tables, plots, links
    • Tell it to comment every line of code or explain every reasoning step. “Let’s think step by step” and similar self-reflection prompt chains magically help.
    • Check everything and follow up with questions
    • Ask multiple times; because it is a probabilistic system, you will get different answers. Ask more than one LLM. You could even query multiple LLMs automatically and then send their answers to an LLM to collate and summarize.
    • Follow common prompts and prompt assistants
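
To make the zero-shot vs. few-shot distinction concrete, here is a sketch against the OpenAI chat API (signatures from the pre-1.0 openai Python library; the model name, prompts, and labels are illustrative). Appending ‘Let’s think step by step’ to the user message is the self-reflection trick mentioned above.

```python
import openai  # pre-1.0 openai library; assumes OPENAI_API_KEY is set in the environment

def ask(messages):
    resp = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages, temperature=0)
    return resp.choices[0].message.content

# Zero-shot: just ask, but set the context, audience, and output format explicitly.
print(ask([
    {"role": "system", "content": "You are a financial analyst. Answer in one word: Positive, Negative, or Neutral."},
    {"role": "user", "content": "Classify the sentiment: 'Margins collapsed despite record revenue.'"},
]))

# Few-shot: show a couple of worked examples first, then the real question.
print(ask([
    {"role": "system", "content": "Classify ticket urgency as High, Medium, or Low."},
    {"role": "user", "content": "Site is down for all customers."},
    {"role": "assistant", "content": "High"},
    {"role": "user", "content": "Please update my billing address."},
    {"role": "assistant", "content": "Low"},
    {"role": "user", "content": "Checkout intermittently fails for some users."},
]))
```
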
  • A Universal AI?
    • Might not be Artificial General Intelligence (AGI) yet. But it’s trained on ALL THE THINGS: GitHub for all the programming languages, the Web for all the human languages. Well, a lot of them, anyway.
    • Any task can be described as an English question. Become good at answering questions = become good at all learning tasks = universal intelligence?
    • Sentiment analysis - ask it to complete a review with ‘it was ….’
    • Recommender system - can you recommend a movie or gift for someone who loves….
    • Translate anything (use chat.openai.com for this; Bing AI just says it’s calling Bing Translate)
    • Summarize anything
    • Write code: Copilot does pretty well on programming tasks, but probably only solves complex problems where the code or something close to it already exists on GitHub. It can fix problems with code and answer Stack Overflow-type questions.
    • Make all the collateral for a business website in a few minutes
    • Copilot for MS Office coming soon - what are the trends in this data, please make a PowerPoint of the highlights.
    • Security Copilot
    • Language teaching / tutoring
    • Listen in to customer support calls and pop up hints and suggestions and personalized script. If a customer is calling Chewy and their pet is named Darcy, sprinkle in Pride and Prejudice references.
    • Answer complicated medical questions - Dr. Google, MD is now Dr. ChatGPT, or veterinarian.
    • Personalized marketing copy, websites, and emails / spam.
    • Can easily build chatbots to answer plain English questions about any body of text or data like a company wiki.

Future Directions and Impact

  • GPT-5? Speculations:
    • Avatar like Siri on your phone, maybe photorealistic animation and voice of your favorite celebrity or man/woman of your dreams.
    • Will learn from experience, including your interests and communication likes and dislikes (the way TikTok does), and deliver personalized banter. Basically the movie Her.
    • Physical robots with natural language interactions, probably first on factory floors and commercial establishments.
    • Incorporate conversational responses in Google and Bing and other products without going to a specific bot product.
    • Better Copilot for Office, write/complete many emails automatically.
    • Plain language customized help, tooltips as you work, your software will talk back.
    • More integrations - Announced: Expedia, FiscalNote, Instacart, Kayak, Klarna, Milo, OpenTable, Shopify, Slack, Speak, Wolfram, and Zapier.
    • Maybe better resistance to hallucinations, training with a vetted knowledge graph, Bayesian methods that explicitly incorporate uncertainty when formulating a response, qualifying answers, pruning low-probability confabulations.
    • Every body of documentation will be queryable with plain English
    • Receive a deluge of personalized, human-like conversational comms on email and social and text message platforms. I might say, you have to Turing test / Voight-Kampff test everyone you talk to online or on the phone, but what would even be the point?
  • Impact and issues: there is a dichotomy; it helps humans, but there are always undesirable side effects and risks.
    • Economic impact. Will make a lot of creative tasks easier and democratize them: anyone can code, do graphic design, or edit video, and it removes language gaps. But:
      • Even more content gets generated, now slicker and more personalized, what will be even more scarce is attention.
      • Inboxes will get filled with highly personalized spam, social media will get flooded with even worse, more human-sounding trolling, and a lot of human-created content will get driven out of the public sphere. Like in the movie Minority Report, ads will squawk, “John Anderton, you could use a Guinness right about now!”
      • A lot of people will have a lot of their work automated, lose jobs.
      • Historically, tech advances didn’t hurt aggregate demand for labor. When cars emerged, buggy drivers became truck drivers. A few whip manufacturers and a lot of horses lost their jobs, but overall productivity and labor demand increased. However, if humans and robots are perfectly interchangeable in the labor force, economic theory doesn’t rule out human wages declining to what it costs to rent a robot, cheap robots building more cheap robots, and humans going the way of horses. Historically the elasticity of substitution between labor and capital has remained near 1 because humans always found higher-value stuff that machines couldn’t do. But that is an empirical observation, and there is no iron theoretical guarantee that humans can always adapt.
    • Intellectual property - It generates text based on online sources, some of which may be copyrighted. Unclear if paying for stock photos is still a thing if you can make AI create any photo (more a Stable Diffusion / Midjourney issue). Since these generative AI creations are based on material created by humans, this raises questions about derived works, fair use, copyright infringement. Which of these concepts apply to a robot? Does the 1st Amendment apply? Someone will undoubtedly argue that the 2nd amendment applies and they can have armed robots patrolling their property.

    • Monoculture - only huge companies may be able to create these AI models due to the brainpower, dev/training expense, infrastructure. If everyone is asking the same questions of the same models based on the same data, everyone will be learning and doing the same thing. Crowds go to the same places due to Instagram, similarly people will herd in financial markets, sciences, fashions, fads, moral panics, conspiracy theories.

    • Overreliance - if people don’t verify answers before spreading them, rumors and urban myths will propagate. People will train these models on chan boards and create conspiracy- and hate-driven chatbots. Adherents will say: the algo said it, it must be true.

    • Bad Actors - OpenAI may put in some guardrails but people are going to train models for all kinds of bad and stupid purposes. In some cases stupid things like entertainment bots that tell you how to kill yourself, in some cases propaganda and trolling and fraud bots to find out about you and exploit your weaknesses.

    • Privacy
      • Personal info in the training data: people might have leaked info they don’t want public, and ChatGPT might make it even easier to find and harder to get rid of.
      • People putting personal info or sensitive corporate questions into ChatGPT creates leakage risk. There was a serious bug that showed other users’ queries; imagine someone had asked about themselves and, say, an STD. If that happens with sensitive corporate data, or triggers a GDPR disclosure, it would be bad. If you code your firm’s secret sauce using Copilot, is any of that going to help improve competitors’ code? If you generate something using Copilot, could someone claim you copied it? If Copilot learned from GPL code, does your code fall under the GPL?
      • Is it smart enough to guess someone’s email address from public info, like their initials and where they work? Can you figure out the real identity behind a social media handle?
    • Security
      • Has potential to generate great personalized phishing emails and carry on long personalized conversations; generative AI can produce deepfakes of your boss calling you and asking you to wire money to a casino in the Philippines. TBH, companies should probably have a ‘word of the day’ that a bot wouldn’t know, to challenge anyone. Just don’t put it anywhere ChatGPT can see it, even on a post-it that’s in a picture on the Web. I once searched my email for a password and Google returned an email with an image where the password was visible, so now it’s in Google.
      • Code it generates might have security vulnerabilities and now everyone is cutting and pasting it.
      • Might be able to do ‘Google dorks’ to find security vulnerabilities
      • This will probably change what people put online: if you don’t want it to become part of the Borg of universal knowledge, don’t put it on the public web. If ChatGPT can answer all the programming questions, no one will go to Stack Overflow with their question. No one will put photos on Shutterstock. Over the last 10 years there have been more walled gardens, with conversations moving into Slacks, Substacks, and private spaces. The golden age of ‘information wants to be free’ may be over.
    • Misinformation. What is real anymore? It’s easy to create super-personalized content with deepfake photos, audio, and video. See the synthetic Steve Jobs interview and the Pope’s puffer jacket.

    • Other deceptive practices. Beyond outright fraud there are a lot of gray areas: if you are talking to a generative AI, it should be disclosed and you should be able to opt out, or such calls should be opt-in or banned outright. A bot pretending to be a real human, or disguising who it represents, should be considered fraud. There was an issue where bot assistants would call restaurants, and the restaurants weren’t always happy about it. Certainly receiving calls and texts from chatbots should be opt-in and clearly disclosed.

    • Acceleration. The race among Google, OpenAI, and other participants to win AI, including smaller companies, international entrants, and of course bad actors who are geopolitical rivals of the current leaders or just want to see the world burn, may result in cutting corners and putting out tech that poses big risks.

    • Is it really intelligence? I don’t think ChatGPT has a world model; I have tried examples like this and they don’t work. It seems like a Clever Hans, or a much smarter ELIZA that can sort of BS its way through a huge amount of processed text. It raises the Chinese Room issue. If we do come up with algos that act human, are they sentient? Where does the sentience reside if, for instance, we simulated the algo (slowly) with pencil and paper?

    • Winners and losers (haven’t thought about this too much)
      • Winners: Microsoft, Nvidia?
      • Losers: Shutterstock, Stack Overflow, maybe Adobe, maybe Google
      • Microsoft is looking like a big winner; they have an exclusive with OpenAI, a nimble startup running circles around everyone with ChatGPT. Initially I would have said it’s kind of BS (Google has all the AI papers and market power, and GPT-3 is not the most state of the art), but long experience says first movers are first movers for a reason and keep winning a lot of the time. Plus they have Microsoft’s market power in their corner, and they are going all-in with Copilot, Office Copilot, and Security Copilot; it’s not a side bet, it’s a major strategic bet. Google is in its flop era, similar to Ballmer’s at Microsoft. The Google+ pivot was a disaster, it’s hard to innovate at scale, and innovation is saddled with career risk; it’s easier to just find ways to goose ad revenue.

Conclusion

  • Most content from now on will be AI-assisted: writing, images, video, music/audio, and code.
  • Everything is going to be Siri/Clippy on steroids, your software will offer great suggestions and converse with you.
  • There will be crazy amounts of personalized spam, deepfakes etc.
  • Economic impact will be considerable.
  • Even Elon Musk, along with other eminent signatories, is calling for a pause, and this is someone who put Teslas on the road to teach themselves how to drive and got sued when the AI crashed cars. He might not be wrong, though. There are important public policies around AI to be decided.
  • It may not be AGI yet, but it’s more general than anything we’ve seen, it teaches itself, and it’s going to explode in unpredictable ways.

A famous talking dog joke:

A guy is driving around the back woods of Montana and he sees a sign in front of a broken down shanty-style house: ‘Talking Dog For Sale.’

He rings the bell and the owner appears and tells him the dog is in the backyard.

The guy goes into the backyard and sees a nice looking Labrador Retriever sitting there.

“You talk?” he asks.

“Yep” the Lab replies.

After the guy recovers from the shock of hearing a dog talk, he says, “So, what’s your story?”

The Lab looks up and says, “Well, I discovered that I could talk when I was pretty young. I wanted to help the government, so I told the CIA. In no time at all they had me jetting from country to country, sitting in rooms with spies and world leaders, because no one figured a dog would be eavesdropping, I was one of their most valuable spies for eight years running… but the jetting around really tired me out, and I knew I wasn’t getting any younger so I decided to settle down. I signed up for a job at the airport to do some undercover security, wandering near suspicious characters and listening in. I uncovered some incredible dealings and was awarded a batch of medals. I got married, had a mess of puppies, and now I’m just retired.”

The guy is amazed. He goes back in and asks the owner what he wants for the dog.

“Ten dollars” the guy says.

“Ten dollars? This dog is amazing! Why on Earth are you selling him so cheap?”

“Because he’s a liar. He’s never been out of the yard.”

Here’s another talking dog joke, which is maybe a caution about interpreting ChatGPT as just a Mad Lib parlor trick…who knows what a trillion-parameter GPT-5 with a billion dollars of training behind it might achieve? Quantitative change becomes qualitative change and maybe sufficiently complex Mad Libs start to reflect knowledge, reason and wisdom?

May the forces of humanity-wiping-out AI become confused on the way to your house.