Interactive economic history workshop: usable systems for diverse data

Online systems make it possible to take on new kinds of historical projects and make new kinds of findings. This workshop is designed to engage WEHC participants with approaches for using heterogeneous data types or sources. The data can include images, maps, category systems, and relations in networks. Demonstrations will show interactivity, visualizations, and the data-science mode of cultivating the data rather than testing a single hypothesis.

For example, an online platform with international occupation information can support unified category systems and support imputations and inferences across countries (Hoekstra et al., 2016). On a wiki, historians and enthusiasts can gradually discuss and infer in an open-source auditable way whether nineteenth-century patents in different countries were by the same inventor (Meyer, 2016). An enriched timeline system can draw from dated events in Wikipedia, potentially enabling discoveries, hypothesis testing, and error correction (Stauber’s historiography.io). These approaches support large multidisciplinary historical projects by tolerating some ambiguity and uncertainty (Guldi and Armitage, 2014).

The session will begin with short talks in which presenters introduce their systems, projects, methods, or tools. Then the presenters and audience will break out into groups organized around areas of interest. The groups will rejoin to share observations at the end. This session format draws from interactive and inclusive practices used in the “unconferences” at THATCamp (The Humanities and Technology Camp), Wiki conferences, Open and User Innovation conferences, and the Digital Humanities Summer Institute (DHSI). Those venues generate lively discussion, rapid informal feedback, practical lessons on sources and methods, and new recognition and support for ongoing long-term projects. Presenters are invited to offer some small exercise or task to the people in the audience, such as querying linked data or editing a wiki, since new technologies and approaches may be easier to show and understand from demonstrations than from a written paper.

We have an open call (still open as of May 2018) for presenters who develop or use online infrastructures for cooperative research. We expect participation from digital geographers, for example, and experts on platforms such as the Dataverse Project (dataverse.org); Wikimedia’s Wikidata site; repositories of images, documents, digitization, and transcriptions such as Wikisource; and linked-data platforms such as Dive+ (http://diveplus.frontwise.com) and grlc (http://grlc.io). Economic historians can draw from these rapidly growing databases and contribute back to them, indirectly collaborating with computer scientists, historians, archivists, and other social scientists. We welcome relevant experts at the edges of economic history, including data scientists, geographers, and graphic designers, as well as new or emerging scholars. The organizers welcome submissions of abstracts by email with the subject “Interactive economic history workshop”.

References

Guldi, Jo, and David Armitage. 2014. The History Manifesto. Cambridge: Cambridge University Press.

Hoekstra, Rinke, Albert Meroño-Peñuela, Kathrin Dentler, Auke Rijpma, Richard Zijdeman, and Ivo Zandhuis. 2016. An Ecosystem for Linked Humanities Data. Proceedings of the 1st Workshop on Humanities in the Semantic Web, co-located with the 13th ESWC Conference 2016, 85–96. http://ceur-ws.org/Vol-1608/#paper-11

Meyer, Peter B. 2016. Linking records of early aeronautical experimenters across data sets. Social Science History Conference, November 2016. http://aero.referata.com/w/images/Record_linking_aero_inventors_1d.pdf

Organizer(s)

  • Peter B Meyer, U.S. Bureau of Labor Statistics, meyer.peter@bls.gov, USA
  • Ellan F Spero, École polytechnique fédérale de Lausanne (EPFL), efs8_at_mit.edu, Switzerland/USA
  • Richard L Zijdeman, International Institute of Social History and University of Stirling, richard.zijdeman@iisg.nl, Netherlands

Session members

  • Guillaume Daudin, Université Paris-Dauphine
  • Paul Girard, Sciences Po médialab
  • Béatrice Dedinger, Sciences Po, Centre d’histoire
  • Ruben Schalk, Utrecht University
  • Keti Lelo, Università Roma Tre
  • Aleksandra Dul, Jagiellonian University
  • Auke Rijpma, Utrecht University
  • Peter Meyer, U.S. Bureau of Labor Statistics
  • Richard Zijdeman, International Institute of Social History
  • Patrick Manning, University of Pittsburgh

Discussant(s)

  • Ellan F Spero, École polytechnique fédérale de Lausanne (EPFL)

This panel has Call for Papers open.
If you are interested in participating, please contact the panel organizer(s) to submit a proposal.

Papers

Panel abstract

This workshop will show approaches to heterogeneous data types or sources. The data can include images, maps, category systems, and relations in networks. Demonstrations will show interactivity, visualizations, and the data-science mode of curating data, not testing a hypothesis. Following brief presentations, we will join breakout groups clustered thematically (GIS, networks, wikis etc.) for focused discussion.

1st half

The TOFLIT18 datascape of French international trade

Paul Girard, Guillaume Daudin

To study the transformations of the French economy in the long eighteenth century, we created both a dataset and an interactive data visualisation tool. To turn the transcriptions of 18th-century French trade archives into a research tool, we built an information system comprising a data versioning system, a graph database, and a web application, which together allow researchers to widen their understanding of 18th-century French international trade through both quantitative and qualitative analysis (http://toflit18.medialab.sciences-po.fr/). We will present the main concepts and visualization methods of the TOFLIT18 datascape, which participants can then use in a hands-on session. Participants will explore the large TOFLIT18 database of French trade flows between 1714 and 1821 by product and partner to gain new insights on issues such as the economic life and representations of eighteenth-century French consumers, producers, and administrators and how they were transformed throughout the century.

The RICardo Project on Trade between Nations from c. 1800 to 1938

Paul Girard, Béatrice Dedinger

The RICardo website (http://ricardo.medialab.sciences-po.fr) provides interactive data visualizations to explore 19th-century international trade worldwide. This exploratory data analysis tool aims to let scholars discover the richness and complexity of this dataset by providing: (1) documentation in the form of an interactive data visualization tool that reveals the heterogeneity of the dataset, which compiles archives from different sources across a century; (2) a progressive exploration path from the most aggregated to the most precise view: world total trade, the bilateral trade of a specific country, and discrepancies between the mirror flows of a pair of trade partners; (3) a custom graphic semiology that emphasizes the uncertainty of the data. RICardo is meant for studying and discovering the history of trade and trade globalization at three levels of detail, with the possibility of focusing on specific countries or areas, using only a web browser.
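
A mirror-flow discrepancy compares two reports of the same trade flow: the exporter's figure and the importer's figure. The following Python is a minimal sketch of one way such a gap could be expressed as a ratio. The formula, the function name, and the figures are illustrative assumptions and are not taken from the RICardo project.

```python
def mirror_discrepancy(exports_reported, imports_reported):
    """Relative gap between the importer's and the exporter's figure
    for the same flow (assumed definition, for illustration only)."""
    if exports_reported == 0:
        raise ValueError("exporter reported no trade; the ratio is undefined")
    return (imports_reported - exports_reported) / exports_reported


# Made-up figures: country A reports exports of 200 to country B,
# while country B reports imports of 230 from country A.
gap = mirror_discrepancy(200.0, 230.0)
print(f"importer's figure exceeds exporter's by {gap:.0%}")
```

A positive value means the importer recorded more than the exporter did, which can arise, for example, when the two sides value goods differently.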

A Quick Network Approach to Historical Data

Aleksandra Dul

This paper presents a technical approach to creating network data from historical observations of relationships, using user-friendly techniques and giving visual results. The data cover witnesses to marriages and associated relationships within the male population of a 19th-century Polish parish, based on numerous sources. To build a database suited to further processing with graph modeling software, we constructed a linked double database in MS Excel: one part holds the raw source material, and the other combines the data in real time. The data is then processed with Gephi, a free, open-source program for network analysis and visualization. The output can be combined with other kinds of data, e.g., GIS maps, allowing more detailed analysis and further opportunities for visualization.
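
As a minimal sketch of the step from source records to network data, the following Python builds a weighted edge list in the CSV layout that Gephi's spreadsheet import reads (Source, Target, and Weight columns). The record layout, the names, and the rule of tying each witness to the groom are illustrative assumptions, not the author's actual database design, which is built in MS Excel.

```python
import csv
import io
from collections import Counter

# Hypothetical marriage records: a groom and the male witnesses at his wedding.
records = [
    {"groom": "Jan Kowalski", "witnesses": ["Piotr Nowak", "Adam Wojcik"]},
    {"groom": "Piotr Nowak", "witnesses": ["Jan Kowalski"]},
    {"groom": "Adam Wojcik", "witnesses": ["Piotr Nowak", "Jan Kowalski"]},
]


def build_edges(records):
    """Count undirected groom-witness ties; each pair is stored in sorted order."""
    edges = Counter()
    for rec in records:
        for witness in rec["witnesses"]:
            pair = tuple(sorted((rec["groom"], witness)))
            edges[pair] += 1
    return edges


def to_gephi_csv(edges):
    """Serialize the edges under a Source,Target,Weight header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, weight])
    return out.getvalue()


edges = build_edges(records)
edge_csv = to_gephi_csv(edges)
print(edge_csv)
```

Saving `edge_csv` to a file gives an edge table that can be imported as an undirected graph; the weight counts how often two men appear together in a marriage record.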

Datalegend: converting and linking statistical datasets to a cloud of interconnected historical datasets

Auke Rijpma, Ruben Schalk, Richard Zijdeman

Curation and publication of socio-economic historical datasets tend to focus on large collections, sometimes with nonstandard or insufficient metadata. Effective integration of datasets requires expertise, and many smaller datasets remain isolated. In this session we will present the datalegend platform, which addresses the long tail of research data by catering to the needs of individual scholars. Datalegend enables researchers to publish their (small) datasets, link them to existing vocabularies and other datasets, and thereby contribute to a growing collection of interlinked historical datasets. By linking datasets, vocabularies, and queries, datalegend makes it easier to replicate and compare findings, and to address new research questions. Researchers can augment their data with georeferences, census records, occupational stratification schemes, and national statistics. We will present the architecture of datalegend; its core vocabularies and data; and COW, an interactive, user-supportive mapping generator and RDF converter.
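
To make the idea of converting a small tabular dataset into linked data concrete, here is a minimal Python sketch that turns CSV rows into RDF statements in N-Triples syntax. It does not use or reproduce COW's interface. The namespace, column names, property names, and occupation codes are all hypothetical placeholders.

```python
import csv
import io

BASE = "https://example.org/"  # hypothetical namespace, not a datalegend URI

# Hypothetical input: one person per row, with a placeholder occupation code.
SAMPLE_CSV = """person_id,name,occupation_code
p1,Anna de Vries,c1
p2,Jan Bakker,c2
"""


def literal(value):
    """Quote a string as an N-Triples literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def rows_to_ntriples(text):
    """Emit two triples per row: a name literal and a link to an occupation URI."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}person/{row['person_id']}>"
        triples.append(f"{subject} <{BASE}vocab/name> {literal(row['name'])} .")
        occupation = f"<{BASE}occupation/{row['occupation_code']}>"
        triples.append(f"{subject} <{BASE}vocab/occupation> {occupation} .")
    return triples


triples = rows_to_ntriples(SAMPLE_CSV)
print("\n".join(triples))
```

The second triple in each pair is where linking happens: because the occupation is a URI rather than a string, any other dataset that uses the same URI becomes connected to this one.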

The Web GIS of Rome in the 18th and 19th centuries

Keti Lelo

This presentation will illustrate the ongoing project “The Web GIS of Rome in the 18th and 19th centuries”, focusing on methodological and efficiency aspects. Geographical information systems (GIS) enable researchers to analyze, compare, and share huge amounts of information about cities, making a “spatial” approach to urban history possible. This online Web GIS platform, “Rome in the 18th Century” (http://croma.uniroma3.it/?contenuto=accedi-allhgis), is based on the cartographic work by G.B. Nolli of 1748. It includes descriptive and iconographic information from different sources. One can query the system about noteworthy places (palaces, public buildings, churches, villas, streets and squares, archaeology) and visualize the historical map, the current situation in that location (Google Maps), the site description, and sometimes an 18th-century view. Detailed descriptions are provided for demolished buildings and for archaeological sites. The system interacts with the GIS System of the Archaeological Sopraintendenza, making it possible to upload additional layers. I’ll show slides about the ongoing effort to integrate 19th-century documentary sources into the GIS. Cartographic, census, fiscal, and other administrative documents are available, but they vary in coverage and are imperfectly reliable. I would particularly like to receive feedback about the feasibility of some processes of simplification, information standardisation, and data analysis, for synthesis and cartographic representation.

CHIA’s World-Historical Dataverse: A Historical Repository

Patrick Manning

The World-Historical Dataverse, maintained by the Collaborative for Historical Information and Analysis (CHIA) at the University of Pittsburgh’s World History Center, is a general repository for digital historical documents. The repository, held at the Harvard Dataverse Network, was created to hold a wide range of historical data, in the hope of assembling and aggregating data, to permit analysis of global patterns since 1500. Data are open-source and may be consulted freely; additional datasets may be contributed via the CHIA Open Data Submission link. Datasets must meet requirements for accuracy and historical metadata. Topics of datasets held include regional population data, commodity flows, health statistics, religion, taxation, transportation, and more. Spatial data may be linked to the World-Historical Gazetteer project in process. CHIA maintains information on the construction, operation, and maintenance of a historical repository.

Making Social Science More Reproducible by Encapsulating Access to Linked Data

Richard Zijdeman, Albert Meroño-Peñuela, Ashkan Ashkpour, Rinke Hoekstra

Published social science history papers generally do not contain enough information for a complete replication of the study, although replication is called for by a primary principle of the scientific method. Typically, a study does not offer a link to the actual data on which its findings are based. A link may cite a dataset but not point to the exact data needed for reliable replication. Data citations are challenging to create and use when the data is volatile or draws on multiple sources. Referring to “just the data” is not enough: it is necessary to point to the sources and to the queries over these sources that generated the data. Without this information, regressions, correlations, and visualizations are not reproducible. In this paper, we propose grlc, a method that enables the curation, versioning, publishing, sharing, and replication of queries over collections of research data. grlc makes the results that answer historical questions actionable via a single, unique web address (URL). Sharing these queries, along with their provenance and the metadata needed for their execution, enables reuse and facilitates the reproducibility of studies. We illustrate with three use cases of grlc: integrating diverse access methods and sources of the Dutch historical censuses (1795-1971); generating data-actionable links over social history data in the CLARIAH/Datalegend project; and the systematic retrieval of classification systems on the Web, reusing codes of the Linked International Classification of Religions (LICR).
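
The idea of reaching a query's results through one web address can be sketched with the standard SPARQL Protocol, in which a query travels as the `query` parameter of a GET request. The Python below is only an illustration of that general idea. It does not use or reproduce grlc's own interface, and the endpoint address, vocabulary URIs, and fingerprint helper are hypothetical.

```python
import hashlib
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "https://example.org/sparql"  # hypothetical endpoint

# Hypothetical query over made-up vocabulary terms.
QUERY = """SELECT ?municipality ?population
WHERE { ?obs <https://example.org/vocab/municipality> ?municipality ;
             <https://example.org/vocab/population> ?population . }
LIMIT 10"""


def query_url(endpoint, query):
    """Encode the query text into a single shareable GET URL."""
    return f"{endpoint}?{urlencode({'query': query})}"


def query_fingerprint(query):
    """Short hash of the exact query text, usable as a citation check."""
    return hashlib.sha256(query.encode("utf-8")).hexdigest()[:12]


url = query_url(ENDPOINT, QUERY)
# Decoding the URL recovers the exact query that produced the data.
recovered = parse_qs(urlsplit(url).query)["query"][0]
fingerprint = query_fingerprint(QUERY)
print(url)
```

Citing the URL together with the fingerprint lets a reader confirm that the query they run is character for character the one behind the published figures.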

Linking records of early aeronautics and aviation across data sets

Peter B. Meyer

I'll show a wiki which acts as a database system. It has links between data records that cover patents, inventors, clubs, firms, exhibitions, and conferences related to aeronautics and aviation globally from 1800 to 1916. Each wiki page has text with links, footnotes, and perhaps images to discuss its topic. Most pages also have structured data: a row from a larger table that can be queried to create reports and charts on other wiki pages. That is, a page about a patent will discuss the patent in historical language and also display a row from a patents table, which can be edited on the spot. The period under study extends from a time when a flying machine was a dreamy idea, through its transition into a hobby, a new science, and a string of inventions, to a startup industry. The data are kept on a wiki at http://aero.referata.com, where any number of authorized users can edit them; their edits are tracked, and new changes are visible immediately to the others. Over 14,000 patents have a page, and tens of thousands of other records on publications, clubs, firms, experimenters, authors, exhibitions, conferences, letters between experimenters, and other information are on the wiki. The wiki approach supports both detailed contextual historical accounts and statistical data summaries, without losing track of other perspectives. Historical ambiguities, such as uncertainty about names and identities and overlapping, changing patent classifications, can be managed well on such a system, and the system offers statistical measures from the data as it evolves and grows. A wiki is a web site that shows data records and allows users to edit the records directly from their web browser. Such systems are auditable in the sense that each past edit to a record is remembered, and it is straightforward to check who made an edit and when, and to compare the versions before and after the change.
In a wiki, the records are easily hyperlinked to one another, and properties of these links are also usable data. For example, a link might go from a record about a patent to one of several category systems that classify what the patent is about. Each item can be classified in a number of ways, and its network relations to other items can be recorded and developed with hyperlinks. This helps ground historical conclusions in evidence.
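
The auditability described above can be illustrated with a small Python sketch of a record that remembers every edit, who made it, and when, and can compare two versions. This is an assumed toy model for illustration, not the storage design of the wiki software behind aero.referata.com. The page title, editor names, and field values are invented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Revision:
    editor: str
    timestamp: datetime
    fields: dict  # full snapshot of the structured row after this edit


@dataclass
class Record:
    title: str
    revisions: list = field(default_factory=list)

    def edit(self, editor, changes, when=None):
        """Append a new revision; earlier revisions are never modified."""
        current = dict(self.revisions[-1].fields) if self.revisions else {}
        current.update(changes)
        stamp = when or datetime.now(timezone.utc)
        self.revisions.append(Revision(editor, stamp, current))

    def diff(self, old, new):
        """Map each changed field to its (before, after) values."""
        before = self.revisions[old].fields
        after = self.revisions[new].fields
        return {
            key: (before.get(key), after.get(key))
            for key in sorted(set(before) | set(after))
            if before.get(key) != after.get(key)
        }


# Invented example: a second editor expands a name and adds a year.
patent = Record("Hypothetical patent page")
patent.edit("editor_a", {"inventor": "J. Smith", "country": "US"})
patent.edit("editor_b", {"inventor": "John Smith", "year": 1890})
changes = patent.diff(0, 1)
print(changes)
```

Because every revision keeps a full snapshot, a disputed identification such as whether two name spellings refer to the same inventor can be traced back to the edit that introduced it.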

2nd half

Second half of the interactive economic history workshop

Presenters and audience together

Having heard short presentations in the first half, in the second half presenters and audience will break out into groups organized around areas of interest. We may also have presenters who were identified after the program was completed. The groups will rejoin to share observations at the end.
