Tiny Stories

Using small resources to build AI models

There is a lot to say about LLMs and AI-powered content. The sheer size of these models is one major factor keeping them expensive to run. The idea behind Tiny Stories is to ask how small a dataset can be before an LLM no longer writes coherent English, or coherent text in the language of your choice.

The datasets used were tiny compared to the training corpora behind commonly used large language models, such as those that undergird Anthropic's Claude and OpenAI's ChatGPT.

The new constrained dataset, Tiny Stories, is designed for analyzing core AI language capabilities. Researchers created this focused corpus of short, simple stories, generated by language models using just a basic children's vocabulary.

The goal is to better understand the fundamental abilities required for coherent text generation by limiting the scope of the task. Read on to learn more about this fascinating AI experiment!

The Challenge of Coherent AI Text Generation

Language models, especially smaller ones, still struggle significantly with producing coherent long-form text. When trained on massive, diverse datasets of real text, they seem to get overwhelmed by the breadth of the information.

This raises questions about the root causes of incoherent AI text: is it the intrinsic complexity of language itself, or the excessive diversity of training data, that hampers coherence? What are the minimal requirements for generating consistent narratives?

The Tiny Stories Approach

To investigate these questions, researchers created the Tiny Stories dataset using vocabulary that would be familiar to a 3-4-year-old child. The vocabulary is restricted but aims to capture the core elements of language like grammar, facts, and reasoning.
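As a rough illustration of what such a vocabulary restriction might look like in practice, here is a minimal sketch. The word list and the helper function are hypothetical, invented for this example; they are not the actual Tiny Stories vocabulary or tooling.

```python
import re

# A tiny illustrative sample of a child-level vocabulary
# (hypothetical; not the actual Tiny Stories word list).
CHILD_VOCAB = {
    "the", "a", "and", "to", "was", "is", "it", "in", "on",
    "cat", "dog", "ball", "park", "happy", "sad", "big", "little",
    "ran", "saw", "played", "said", "went", "day", "one", "they",
}

def uses_only_vocab(story: str, vocab: set) -> bool:
    """Return True if every word in the story appears in the vocabulary."""
    words = re.findall(r"[a-z]+", story.lower())
    return all(word in vocab for word in words)

print(uses_only_vocab("The little dog ran to the park.", CHILD_VOCAB))   # True
print(uses_only_vocab("The dog contemplated existentialism.", CHILD_VOCAB))  # False
```

A check like this could be used to filter or validate generated stories against the restricted vocabulary.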

Language models were prompted to generate a corpus of short stories of 2-3 paragraphs using only this children's vocabulary. By limiting the scope, researchers hoped to better isolate and evaluate fundamental language capabilities required for coherent storytelling.
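A generation prompt along these lines might be assembled as follows. This is a hedged sketch: the wording and the function are illustrative assumptions, not the actual prompts used by the researchers.

```python
def build_story_prompt(required_words, feature):
    """Assemble a hypothetical generation prompt in the spirit of
    Tiny Stories: ask for a 2-3 paragraph story in simple vocabulary
    that works in a few required words and a narrative feature."""
    return (
        "Write a short story of 2-3 paragraphs using only words a "
        "3-4 year old child would understand. "
        "The story must use the words: " + ", ".join(required_words) + ". "
        "The story should contain " + feature + "."
    )

prompt = build_story_prompt(["cat", "ball", "rain"], "a happy ending")
print(prompt)
```

Varying the required words and features from prompt to prompt is one simple way to keep a generated corpus from collapsing into near-duplicate stories.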

Initial experiments with models like GPT-3.5 uncovered flaws such as repetition and logical gaps in the generated stories. But the models also exhibited key successes, such as smoothly integrating prompts, reasoning about relationships, and maintaining a narrative arc.
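One of the flaws noted above, repetition, can be quantified crudely. The sketch below measures the fraction of word trigrams that occur more than once in a text; the metric is an illustrative assumption, not the evaluation method from the Tiny Stories work.

```python
from collections import Counter

def repeated_trigram_fraction(text):
    """Fraction of word trigrams occurring more than once: a crude
    proxy for repetitive generation (illustrative, not the paper's metric)."""
    words = text.lower().split()
    trigrams = [tuple(words[i:i + 3]) for i in range(len(words) - 2)]
    if not trigrams:
        return 0.0
    counts = Counter(trigrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(trigrams)

print(repeated_trigram_fraction("the cat sat. the cat sat. the cat sat."))  # 1.0
print(repeated_trigram_fraction("once there was a happy little dog"))       # 0.0
```

A high score flags stories that loop on the same phrases, which is one of the failure modes the researchers observed.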

Smaller models struggled more than large models, supporting the theory that model scale is crucial for coherent generation. The focused nature of the dataset allowed more precision in analyzing models' linguistic foundations.

The Tiny Stories method provides an interesting new avenue for unpacking language model abilities. You could potentially train a small language model on the writing of Stephen King, or your favorite author, or even on your own writings, and see how far it can get.

Researchers could also create similarly limited datasets targeted to specific domains like science, literature, or current events. This may better reveal the stages of acquiring domain knowledge and context.

The Tiny Stories experiment offers valuable insights into core language capabilities by providing AI models with a more defined storytelling task. While far from solved, pinpointing where coherence breaks down in this controlled setting moves us closer to building truly understandable AI systems.

What experiments would you like to see done with Tiny Stories or other constrained datasets? Let us know your thoughts!