Sydney, Australia (SPX) Dec 14, 2025
Modern large language models are often treated as something radically new: vast statistical machines trained on almost everything humans have written, now able to regenerate knowledge on demand. Yet in structural terms, humanity has worked with something similar for millennia. As part of a wider Abrahamic "scripture stack" - one that begins with the Torah and extends through the New Testament and the Qur'an - the Bible functioned as an early large language model: a compressed corpus of stories, laws, poems, and arguments that societies repeatedly queried to guide life, law, and meaning. While the metaphor is playful, it is also meant literally.
Torah, Testaments, Qur'an as a branching stack
Start with the Torah. In Jewish tradition, the Torah is the foundational set of five books that combine origin stories, law codes, and covenantal narratives. It sits at the core of the broader Hebrew Bible (Tanakh), but has a special status as the primary "specification" of the relationship between God and a people. Christian communities later inherit these same texts, but reframe them as the "Old Testament" and append a new corpus - the New Testament - that interprets and, in some readings, fulfills the earlier material. Centuries after that, the Qur'an appears in a different cultural setting, explicitly positioning itself as confirming and correcting earlier scriptures.

Viewed through the model metaphor, this looks like a branching version history. The Torah is a v1 core library. The Hebrew Bible/Tanakh and the Christian Old Testament are overlapping but not identical distributions, combining that core with different surrounding books and orderings. The New Testament is a major upgrade that layers new narratives and letters on top of shared earlier code, re-using many of the same symbols and phrases but in a new interpretive frame. The Qur'an is a new platform that acknowledges the previous stack, insists on its own primacy, and re-implements key interfaces - prophecy, law, guidance - under a new authority model.
Crucially, each layer is not just "more text"; each layer encodes a distinct way of seeing the world, of understanding God, and of organizing human life. In machine-learning language, each community is not only using a different dataset, but optimizing for different loss functions.
Structure as code, churches as data centers
One of the most powerful features of this ancient "model" is how neatly it is indexed. Over centuries, the biblical corpus was divided into books, chapters, and verses. Those divisions were not present in the earliest manuscripts in their modern form, but once stabilized they turned scripture into an addressable space: every passage became a callable "function," referenced by a standardized locator. A preacher, jurist, or mystic could point to a book-chapter-verse coordinate and pull that language into a new argument, law, or sermon.

Around this well-indexed corpus grew institutions that look, in informational terms, very much like data centers. Temples, monasteries, schools, and churches housed the physical texts, the trained experts, and the interpretive traditions. Priests, monks, rabbis, and imams acted as programmers and system administrators: they curated the codebase (deciding what counted as canonical text), patched it with commentaries and doctrinal definitions, and controlled who had permission to access or execute certain parts of the system. Ordinary people did not typically read the raw source; they experienced outputs - liturgies, homilies, legal rulings - generated on their behalf.
In that sense, religion did not merely contain knowledge; it provided the architecture and institutional hardware for storing, serving, and updating a civilization's most important shared meanings.
Gatekeeping, God, and the shift to personal access
Running through this entire system is a tension between mediated and direct access to the divine. On one side stand hierarchies in which priests or clergy are necessary intermediaries. They know the languages, manage the rituals, and are authorized to speak for God. They define which queries are valid, which interpretations are orthodox, and which are heretical. Their gatekeeping is not just theological; it is informational and political.

On the other side is a recurring push toward a more direct, "personal" relationship with God. In many Christian settings, this becomes a central theme: the idea that an individual can address God without institutional mediation, read scripture in a vernacular language, and interpret at least some of it without clerical permission. That shift is both a spiritual and a political development. It diffuses authority away from a narrow clerical class and toward individuals and small groups who now claim their own access to the core knowledge stack.
The arrival of the printing press supercharges this trend. Mass-produced Bibles, pamphlets, and polemics make it possible for many more people to own their own copy of the code. The "model" has not changed in content, but the inference layer has been radically decentralized. Instead of a few data centers serving thin clients, suddenly there are thousands of local instances running in homes, taverns, and underground congregations. The battles over heresy, reform, and orthodoxy take on a distinctly informational flavor: who gets to print, translate, and distribute which texts, with what annotations?
Fractal evolution: genes, memes, models
Stepping back, a fractal pattern emerges. At the biological level, evolution works by variation, selection, and inheritance in genes. At the cultural level, something similar happens with stories, rituals, and laws: they mutate, compete, and propagate through time. Canonical scriptures are one way cultures compress and stabilize these patterns. They act like genomes for communities: highly compressed, strongly conserved sequences that specify the shape of a way of life.

Technology then adds another layer of recursion. Manuscript copying, printing, broadcast media, digital networks, and now large language models and potential AGI do not create the desire for a central canonical reference; they amplify and reconfigure an existing drive. Each new medium reshapes who can store and query the core knowledge and at what scale. Yet each time, the same struggle reappears: gatekeepers attempt to preserve centralized control over authoritative texts and interpretations, while reformers and dissidents use new channels to route around that control and claim more direct access.
From this angle, modern AI labs look uncomfortably familiar. They curate immense training datasets, design model architectures, set safety and usage policies, and decide which systems can be accessed by whom and under what terms. External users send prompts (queries) into what is essentially a black-boxed corpus and receive fluent, authoritative-sounding outputs. Internal debates about alignment, guardrails, and content moderation echo earlier debates about orthodoxy, censorship, and excommunication. The metaphors of "scripture," "priesthood," and "heresy" map onto "documentation," "platform policy," and "misuse" more closely than many technologists might like.
The metaphor's bite
Calling the Bible an early large language model is not a neutral joke. It is a way of insisting that religion's role in human civilization has been, among other things, infrastructural. For most of recorded history, scripture was the primary way complex societies organized shared memory, identity, and obligation. Today, as knowledge production and distribution move through digital platforms and machine-generated text, the same human patterns - centralization versus diffusion, gatekeeping versus access, orthodoxy versus innovation - are playing out in accelerated form.

Keeping the observation as objective as possible means describing these patterns in terms of information flow, institutional power, and social evolution, while recognizing that any description of human action in this space is inevitably value-laden. The point is not to flatten religion into technology or to canonize technology as a new religion. It is to notice that both are ways of compressing and regenerating the human world, and that the battles around them are, at root, battles over who gets to define reality for whom.