

Get ready to unlock the power of data at the Big Historical Data Conference

This year’s sessions promise an exhilarating dive into the world of historical data, where the past meets cutting-edge technology.

AI for historical and archaeological Big Data analysis

Session organiser: Jochen Büttner


The advent of artificial intelligence (AI) and its potential applications have set the stage for a revolution in the field of historical and archaeological research. With the vast and complex nature of Big Data in historical and archaeological content, the application of AI technology allows researchers to explore new dimensions and make sense of massive datasets that have remained unexplored thus far. In this session, we aim to highlight the most current and innovative AI-driven methods, tools, and discoveries that have the potential to reshape the practice of historical and archaeological research.

Topics for this session include but are not limited to:

  • AI techniques for mining historical and archaeological data
  • Machine learning for pattern recognition, anomaly detection, and time series analysis in Big Data
  • Application of deep learning for image and video analysis in historical and archaeological contexts
  • Natural language processing and text mining for analysis of ancient languages and historical texts
  • Predictive modeling and simulation of historical and archaeological processes
  • AI-driven data visualization techniques for historical and archaeological research
  • Exploratory data analysis and data-driven storytelling in historical narratives
  • Ethics, biases, and challenges in AI-driven historical and archaeological research
  • Case studies showcasing the innovative use of AI technology in history and archaeology
  • Cross-disciplinary collaboration between AI researchers and historians/archaeologists

Big databases: How large-scale archives can boost archaeological research

Session organisers: Martina Farese, Giulia Formichella, and Noemi Mantile


In recent years, numerous databases containing historical and archaeological proxies have been developed and made accessible through Open Access. These databases store various types of data, ranging from isotopic measurements to land use patterns and the spatiotemporal distribution of animals, among others. The availability of such large-scale data in a readily accessible format enables academics to conduct comprehensive research and explore a wide array of scientific questions. For instance, several databases have been recently developed to store bioarchaeological data, such as isotopes, from Mediterranean and European countries. These databases also include archaeological and biological information about the sites and samples, such as sex, geographical coordinates and radiocarbon dates. As a result, researchers can now investigate human lifestyles, including diets and mobility, in historical and prehistoric settings across multiple archaeological sites rather than focusing solely on one or a few locations. These databases also facilitate the study of human-environment interactions, agricultural practices, paleoclimatic events and animal farming. Furthermore, they can help identify data gaps, both geographically and temporally, thereby suggesting areas for future research.

In this session, we welcome contributions that describe databases designed to store large-scale data, particularly bioarchaeological proxies such as DNA, isotopes, morphology, and histology, and how these resources can aid researchers in their work.

Environments of big cultural heritage data integration

Session organisers: Mike Fisher and Dovydas Jurkenas


Cultural heritage is the world’s most evenly distributed resource, and efforts to preserve it globally have, over the past several decades, adopted digital means of data collection, storage, and analysis to great effect. Digitalisation can democratise usage and help overcome physical barriers to access, and it can expand the means for documentation, prioritising non-invasive measures of investigation. By facilitating the generation of what are often massive digital datasets, it also enables large-scale integration through semantic interoperability, scaling the available body of palaeodata globally. The housing and usage of such large datasets intersect with environmental constraints, but also enable us to understand how humans interacted with palaeoenvironments and how anthropogenic environmental change affects cultural heritage.

Papers given in this session will explore the methodologies, outcomes, and general potential for integration of big datasets, interaction between cultural and natural heritage, and bidirectional impact between big palaeodata and the overall environment. Each paper should describe a large dataset, the system used to manage it, data collection methods and methodologies, techniques used for integration with other datasets, and how the data either describe, enable analysis of, or are generated in consideration of the natural environment. Discussion should revolve around the broader theme of transregional and transdisciplinary data integration through digital methods such as Linked Open Data, semantic meta-modeling, or open-access data repositories. Considerations of how the FAIR (Findable, Accessible, Interoperable, and Reusable) and CARE (Collective benefit, Authority to control, Responsibility, and Ethics) principles can support environmentally sound approaches to developing big cultural heritage data management systems are especially welcome.

Leveraging Big Data, GIS, and machine learning in remote sensing

Session organisers: Manuel Peters and Mina Jambajantsan


The field of remote sensing is undergoing a data revolution with the advent of techniques like LiDAR, platforms such as UAVs and satellites, and increased access to new and historical datasets. This has resulted in a wealth of high-resolution geospatial data, presenting both opportunities and challenges for archaeological and historical research. In the era of digital data analysis and widespread use of programming languages, the integration of Geographic Information Systems (GIS) and Machine Learning (ML) has emerged as a powerful approach for extracting insights from large remote sensing datasets.

This session aims to explore the synergistic relationship between GIS and ML in data extraction, feature detection, and analysis, with an emphasis on automated object detection and classification in large-scale datasets. It will foster discussions on methodologies, tools, and applications that leverage GIS and ML to enhance archaeological and historical research using remote sensing data.

Key topics to be addressed include:

  • GIS-based data pre-processing and integration: Techniques for georeferencing, data fusion, and pre-processing remote sensing data sources such as satellite imagery, LiDAR, multi/hyperspectral data, and aerial photographs for effective analysis and feature detection.
  • Automated object detection and classification using ML: Exploration of techniques like Convolutional Neural Networks (CNNs) to detect and classify spatial data such as surface features, land cover/land use, and vegetation types.
  • Spatial analysis and geospatial modelling: Integration of ML algorithms with GIS tools to analyse spatial patterns, identify networks, and develop models related to land suitability analysis and locational preference.

We invite submissions of research papers and case studies showcasing the integration of GIS and ML for data extraction, feature detection and classification, and spatial analysis in remote sensing for archaeological and historical research. Contributions focusing on innovative methodologies, practical applications, and real-world examples are particularly encouraged.

Data and people in interaction: Network analysis for everyone

Session organisers: Vera Klontza and Barbora Ruffíni


Migration, trade, roads, and other forms of contact and interaction are major archaeological issues. These spatiotemporal relationships among individuals, communities, and regions were previously handled rather intuitively by archaeology. Various forms of network analysis have been used in archaeology over the last 15 years. However, despite their excellent research potential, these methods remain relatively marginal compared with “static” databases and the models based on them. They are often applied to the Mediterranean region in late prehistory and antiquity, but much less to, for example, continental Europe. They also tend to be available mainly to specialists at prominent universities and institutes, usually in Western Europe or the USA.

Our session aims to gather case studies that address the potential of these methods, explain the methods, techniques, and tools applied, and present the latest views on network analysis. We especially want to challenge and encourage those who do not yet have access to these analyses, or who are not familiar with the tools but have gathered data from specific, lesser-known regions, to present their ideas, questions, problems, and issues. We want our session to result in a network of interaction and collaboration between those who have the experience, the tools, and the know-how and those who have detailed data, ideas, and inventiveness but lack the theoretical and practical means. We strive to make our session not just presentational but truly interactive, to contribute to the decentralization of knowledge and technology, and to sharing and partnership across archaeology.

Big Zooarchaeological Data: Challenges and potentials for multi-scalar analysis across millennia

Session organisers: Angela Trentacoste, Jesse Wolfhagen, and Sarah Whitcher Kansa


Zooarchaeological remains are one of the most common types of material recovered from archaeological sites, making them a promising source for big-data approaches. By their nature, zooarchaeological remains have established biological nomenclature and taxonomies that underlie analysis, providing a shared structure that can facilitate data aggregation, comparison and discussion at vast regional and temporal scales. Faunal data do not suffer from the same ontological challenges present in aggregating other archaeological materials (e.g. ceramics) over the longue durée. Zooarchaeological data from millions of remains are already collected and digitized in individual and institutional databases; however, these data are rarely integrated beyond a regional level or published reproducibly. Consequently, the potential of one of the most abundant, comparable and well documented archaeological resources has yet to be fully realized. Should current challenges be overcome, the potential for new insights via big-data zooarchaeology is immense. Animals are a plastic and responsive proxy to the dynamic interplay between culture, biology, and the environment, imprinted on animal production strategies and animal physiology. This makes faunal remains a key – but underutilized – resource for examining ‘big debates’ on migration, climate, anthropogenic impacts, biodiversity, and sustainability, particularly in historical times.

This session invites papers that engage with the potential and challenges of working towards big-data zooarchaeology, in the form of case studies or positioning pieces on:

  • Critical assessment of theoretical frameworks and methods underpinning zooarchaeological analysis, data aggregation, and its integration with other studies (observation error, quantification, biometry, etc.)
  • Challenges of multi-scalar analysis and working from context, to site, to region, and beyond (e.g. site formation processes)
  • Multi-proxy studies that integrate zooarchaeological data with other archaeological and paleo-environmental data
  • Strategies that facilitate the creation of large, open, intelligible, and reusable zooarchaeological datasets
  • Approaches and collaborations to support and sustain zooarchaeological data sharing and integration

Archaeological time-series: the quest for robust 14C-dated proxies for the intensity of prehistoric activity

Session organisers: John Meadows and Peer Kröger


In archaeology, researchers routinely compile large radiocarbon (14C) data sets from published literature, and search for patterns in the aggregated dates, which may correspond to diachronic fluctuations in the original population of dateable events. It is commonly assumed that the deposition of datable material varied over time in proportion to the human population, so that appropriately filtered calibrated 14C dates can reveal when, and how rapidly, human populations increased or decreased. Various methods have been developed to deal more realistically with random and systematic noise in aggregated 14C dates, but several challenges remain unresolved, including:

  • Data collection: how accurate and comprehensive are the large 14C data sets used in these studies? Are mistakes copied from one study or database to the next? Are results from academic research over-represented, in comparison to data generated in other sectors (or in other languages), and does this affect interpretations? Can data collection and validation be automated? Can a representative sample of available data be analysed more carefully to show where problems should be expected?
  • Data manipulation: how do perceived patterns in aggregated 14C dates change when supporting information is used to refine calibrated dates, e.g. using Bayesian chronological modelling?
  • Research questions: what diachronic patterns are already evident in other data types, and what chronological refinement of such patterns would be significant? What precision is expected, or useful, both in terms of temporal resolution and of the amplitude of fluctuations (and thus in rates of increase and decrease)? What precision is observable in simulated 14C data from statistically representative samples?

We invite researchers dealing with these topics and related issues to share their solutions and critiques, and hope to reach consensus over the types of questions which aggregated 14C data can now address.

Insights from reusing large prehistoric and interdisciplinary databases

Session organisers: Christian Sommer, Angela Bruch, Nicholas Conard, Christine Hertler, Miriam N. Haidle, Volker Hochschild, Zara Kanaeva, Andrew Kandel, and the ROCEEH Team


Recent trends in data science show ever-increasing amounts of digital data, the continued digitalization of legacy data, and a growing commitment to fair data sharing. Together, these factors open up new potential to study the biological and cultural evolution of early humans, including the environments in which they lived. The reuse of these various sources, however, remains challenging, because the data landscape appears fractured: various providers maintain databases of different spatial and temporal scopes, levels of detail, and customized ontologies tailored to their specific research questions. Nevertheless, new computational approaches and interdisciplinary analyses merging information from archaeology, anthropology, paleoenvironment and geography have generated spectacular new insights in recent years.

This session addresses researchers who use one or multiple established databases to solve their own research questions related to the cultural, biological, environmental, and geographical expansions of early humans. We encourage contributors to submit case studies that feature a wide variety of research questions, their choice of analytical methods, and the results obtained. We also provide room to discuss challenges and problems encountered in the handling of diverse and big datasets. These case studies will help researchers learn about the different solutions chosen by their colleagues; conversely, such case studies will also allow data providers to learn about the needs of their users. Finally, the discussion will help to identify current obstacles in the reuse of prehistoric big data.

Modelling expansions in South America: Integrating archaeology and linguistics

Session organisers: Fabrício Ferraz Gerardi and Bruno de Souza Barreto


In recent years, the interplay of archaeology and linguistics has significantly contributed to our understanding of population dispersals, contact patterns, and language dynamics in South America. This session seeks to bring together researchers from both fields to explore the synergistic relationship between archaeology and linguistics in modelling expansions across South America. The session aims to foster interdisciplinary discussions and enhance our comprehension of the complex processes underlying population movements, cultural interactions, and language shifts that have shaped the South American landscape. Topics will focus on the potential of combining data from archaeology and linguistics, highlighting their collective insights into the dynamics of human migrations, language contact, and cultural transformations. The archaeological component will explore methodologies for studying population expansions, including the analysis of settlement patterns, material culture, and chronological data. Emphasis will be placed on how archaeological data can inform us about the timing, routes, and impacts of past population movements in South America. The linguistic component will delve into language families, proto-languages, language dispersal and contact, and language phylogenies in South America. Comparative linguistics and language shift analysis will be utilized to uncover the linguistic dynamics associated with population expansions and cultural interactions. By integrating archaeological and linguistic perspectives, participants can offer a comprehensive understanding of the intricate processes underlying population movements. This integration will enable the analysis of expansion patterns, cultural interactions, and language dispersals by mapping data onto geographic and temporal frameworks.

Overall, this session provides a platform for researchers to exchange knowledge, explore collaboration, and advance our understanding of South American history. By leveraging the strengths of archaeology and linguistics, we can shed light on the rich tapestry of population movements, cultural exchanges, and linguistic developments that have shaped the region.

History unleashed: Harnessing the knowledge stored in historical documents

Session organisers: Adam Izdebski and Carlo Cocozza


Historical documents offer an abundance of information on various aspects of past societies and their environmental contexts. Research areas such as historical demography, historical geography, economic history, history of agricultural production, and records of extreme environmental and climatic events are just a few examples of potential applications. In our session, we aim to explore the diverse applications of historical databases, the techniques for generating and organizing these databases, and the utilization of innovative technologies, including machine learning, to assist in the data collection process. We welcome contributions from researchers working in any field of historical research.

Palaeoclimatic and palaeoenvironmental databases: Exploring the dynamics of human-environmental systems

Session organiser: Achim Brauer


Palaeoclimatic and palaeoenvironmental databases are essential for elucidating the Earth’s climatic history, its environmental transformations, and their associations with the evolution of human populations and societies. These invaluable data sources offer insights into historical and prehistoric climate fluctuations, thus enabling scientists to improve predictions of potential consequences for ecosystems and human communities. Proxies for palaeoclimatic and palaeoenvironmental information are drawn from a wide array of natural archives, such as ice cores, tree rings, ocean sediments, coral reefs, speleothems, and lake sediments.

This session will concentrate on the development, management, and implementation of palaeoclimatic and palaeoenvironmental databases, and on the integration of diverse data resources to further our understanding of the Earth’s climatic and environmental history across various spatial and temporal scales. Contributions that specifically investigate interactions within human-environmental systems are especially encouraged.