Eight International Research Funders Announce the Winners of the 2011 Digging into Data Challenge


OTTAWA, ONTARIO--(Marketwire - Jan. 3, 2012) - Eight international research funders from four countries jointly announced the winners of the second Digging into Data Challenge, a competition to promote innovative humanities and social science research using large-scale data analysis.

Fourteen teams representing Canada, the Netherlands, the United Kingdom and the United States have been awarded grants to investigate how computational techniques can be applied to "big data" to change the nature of humanities and social sciences research. Each team represents a collaboration among scholars, scientists and librarians from leading universities worldwide.

"The Digging into Data Challenge is an international initiative that enables Canadian researchers to take advantage of the huge digital resources now available and to develop close partnerships with overseas universities," said Chad Gaffield, president of the Social Sciences and Humanities Research Council of Canada (SSHRC). "These exciting projects cross both disciplines and national borders; they lead to new insights into human thought and behaviour."

The first round of the Digging into Data Challenge, held in 2009, was sponsored by four international funders and led to breakthrough projects that received coverage in The New York Times, Nature, The Globe and Mail, and Times Higher Education. For the current round, there are eight sponsoring funders and a total of fourteen funded projects.

These projects cover a wide variety of topics that include using information retrieval techniques to investigate changes in Western music; using high resolution imaging to study the ancient Egyptian mummification process; using data-mining technology to shed light on the impacts of economic opportunity and spatial mobility on social structure; and using natural language processing to analyze large text archives in the study of human rights abuses.

"Initiatives and analysis of this sort were unimaginable before having access to today's information and communications technologies. Today, scholars can data mine millions of digital documents, gaining new insights into our world and culture," said Gisèle Yasmeen, SSHRC's vice-president, Research. "This research is truly international in its scope and is supported by eight funding councils."

The eight research funders are the Arts and Humanities Research Council (United Kingdom), the Economic and Social Research Council (United Kingdom), the Institute of Museum and Library Services (United States), the Joint Information Systems Committee (United Kingdom), the National Endowment for the Humanities (United States), the National Science Foundation (United States), the Netherlands Organisation for Scientific Research (Netherlands), and SSHRC.

Total project funding is approximately US$4.8 million. SSHRC's contribution of CAN$869,117 will support Canadian researchers from eight of the fourteen teams.

Additional information about the competition can be found at www.diggingintodata.org.

Digging into Data Challenge

Round Two (2011) Winners

Cascades, Islands, or Streams? Time, Topic, and Scholarly Activities in Humanities and Social Science Research

(Principal Investigators: Cassidy R. Sugimoto, Ying Ding, Staša Milojević, Indiana University, Bloomington, NSF; Mike Thelwall, University of Wolverhampton, AHRC/ESRC/JISC; Vincent Larivière, Université de Montréal, SSHRC.)

This project will examine topic lifecycles across heterogeneous corpora, including not only scholarly and scientific literature, but also social networks, blogs and other materials. While the growth of large-scale datasets has enabled examination within scientific datasets, there is little research that looks across datasets. The team will analyze the importance of various scholarly activities for creating, sustaining and propelling new knowledge; compare and triangulate the results of topic analysis methods; and develop transparent and accessible tools. This work should identify which scholarly activities are indicative of emerging areas and identify datasets that should no longer be marginalized, but built into understandings and measurements of scholarship.

ChartEx

(Principal Investigators: Robert C. Stacey, University of Washington, IMLS; Arno Knobbe, Leiden University, NWO; Sarah Rees Jones, University of York, AHRC/ESRC/JISC; Michael Gervers, University of Toronto, SSHRC. Additional participating institutions: University of Brighton, Columbia University.)

This project will develop new ways of exploring the full text content of digital historical records. The project will demonstrate its approach using medieval charters, which survive in abundance from the 12th to the 16th centuries and are one of the richest sources for studying the lives of people in the past.

Digging into Connected Repositories (DiggiCORE)

(Principal Investigators: Andreas Juffinger, The European Library Office, NWO; Zdenek Zdrahal, The Open University, AHRC/ESRC/JISC.)

This project will analyze a vast set of Open Access research publications using natural language processing and social network analysis methods to identify patterns in the behaviour of research communities, to recognize trends in research disciplines, to acquire new insights about the citation behaviours of researchers and to discover features that distinguish papers with high impact. This will enable the development of better methods for exploratory search and browsing in digital collections or new ways of evaluating research or the researcher's impact.

Digging by Debating

(Principal Investigators: Colin Allen and Katy Börner, Indiana University, Bloomington, NEH; Andrew Ravenscroft, University of East London; Chris Reed, University of Dundee; David Bourget, University of London, AHRC/ESRC/JISC.)

This project will develop and implement a multi-scale workbench called "InterDebates", with the goal of digging into data provided by hundreds of thousands, and eventually millions, of digitized books, bibliographic databases of journal articles, and comprehensive reference works written by experts. The team's hypotheses are: that detailed and identifiable arguments drive many aspects of research in the sciences and the humanities; that argumentative structures can be extracted from large datasets using a mixture of automated and social computing techniques; and that the availability of such analyses will enable innovative interdisciplinary research, and may also play a role in supporting better informed, critical debates among students and the general public.

Digging into Human Rights Violations: Anaphora Resolution and Emergent Witnesses

(Principal Investigators: Ben Miller, Georgia State University, NSF; Lu Xiao, The University of Western Ontario, SSHRC. Additional participating institutions: University of North Florida.)

This project will develop an automated reader for large text archives of human rights abuses that will reconstruct stories from fragments scattered across a collection, and an interface for navigating those stories. By improving on anaphora resolution techniques in natural language processing for the connection of pronouns to specific nouns, this system will help researchers and courts reveal witnesses and patterns contained in their own collections.

Digging into Metadata: Enhancing Social Science and Humanities Research

(Principal Investigators: Mick Khoo, Drexel University, IMLS; Diana Massam, University of Manchester, AHRC/ESRC/JISC. Additional participating institutions: University of Glamorgan.)

The project will automatically generate new forms of metadata tags from existing metadata records and associated resources that will support discovery across multiple repositories. The project will utilize four repositories that vary in size; domain; metadata creation method and workflow; and quality. PERTAINS, a tool developed by one of the partner schools, will be used to analyze the metadata records in each repository and then to generate Dewey Decimal Classification-based tags. Clustering algorithms will be used to generate an index of similarity and match between resources in different repositories. After conducting a search, the user will retrieve a list of resources from the different collections that have been tagged in similar ways. Visualization techniques will be used to display the results in ways that enhance the research process.

Electronic Locator of Vertical Interval Successions (ELVIS): The First Large Data-Driven Research Project on Musical Style

(Principal Investigators: Michael Scott Cuthbert, Massachusetts Institute of Technology, NEH; Frauke Jürgensen, University of Aberdeen, AHRC/ESRC/JISC; Julie E. Cumming, McGill University, SSHRC. Additional participating institutions: Yale University.)

This project studies changes in Western musical style from 1300 to 1900 using the digitized collections of several large music repositories. The team notes that understanding style change in Western polyphonic music requires describing the acceptable vertical sonorities (chords) and melodic motions in each period, and how they change over time. The project aims to do this for European polyphony from 1300 to 1900, using advanced music information retrieval techniques to study highly contrasting kinds of music that are nevertheless unified by common concepts of tonality, consonance versus dissonance, and voice leading.

An Epidemiology of Information: Data Mining the 1918 Influenza Pandemic

(Principal Investigators: Edward T. Ewing, Bernice L. Hausman, Bruce Pencek, and Narendran Ramakrishnan, Virginia Polytechnic Institute and State University, NEH; Gunther Eysenbach, University of Toronto, SSHRC.)

This project seeks to combine the power of data mining techniques with the interpretive analytics of the humanities and social sciences, in order to understand how newspapers shaped public opinion and represented authoritative knowledge during the deadly 1918 influenza pandemic. This project makes use of the more than 100 newspaper titles from 1918 found in the Chronicling America collection at the United States Library of Congress and the Peel's Prairie Provinces collection at the University of Alberta Library. The application of algorithmic techniques enables the domain expert to systematically explore a broad repository of data and identify qualitative features of the pandemic at the small scale, as well as the genealogy of information flow at the large scale. This research can provide methods for understanding the spread of information and the flow of disease in other societies facing the threat of pandemics.

Imagery Lenses for Visualizing Text Corpora

(Principal Investigators: Katharine Coles, University of Utah, NEH; Min Chen, University of Oxford, AHRC/ESRC/JISC.)

This project explores new visualization techniques for use in large-scale linguistic and literary corpora using the collections of the British National Corpus and various smaller archives of poetry. The team will investigate whether or not advanced visualization techniques can provide an interface that enables humanities researchers to use their domain knowledge dynamically, while using the computational capability of computers. In particular, can data visualization help users make new observations and generate new hypotheses? The aim of this project is to answer the above methodological research question, and to create a set of new visualization tools for future scholarly research.

IMPACT Radiological Mummy Database

(Principal Investigators: Randall Thompson, Saint Luke's Mid America Heart Institute, NEH; Andrew Nelson, The University of Western Ontario, SSHRC. Additional participating institutions: Al Azhar Medical School, Cairo, Quinnipiac University, Canadian Museum of Civilization, University of Southern California, University of California, San Diego, Mount Sinai School of Medicine, South Coast Radiological Medical Group, Newport Diagnostic Center, University of California, Irvine, Wisconsin Heart Hospital.)

This project is designed to provide mummy and medical researchers with a large-scale comparative database of medical imaging of mummified human remains. This departure from a case-study model for mummy studies will drive the field towards a large-scale comparative and epidemiological paradigm. The Canadian team will be investigating the evisceration and excerebration components of the Egyptian mummification tradition, and the US teams will apply the database to a greatly expanded study of atherosclerosis in ancient Egyptian mummies, as part of the IMPACT (Internet-based Mummy Picture Archive Communication Technology) Ancient Health Research Group, and to the refinement of a novel system of diagnosis by consensus for mummified remains.

Integrated Social History Environment for Research (ISHER): Digging into Social Unrest

(Principal Investigators: Dan Roth, University of Illinois, Urbana-Champaign, NSF; Antal van den Bosch, Tilburg University, NWO; Sophia Ananiadou, The University of Manchester, AHRC/ESRC/JISC. Additional participating institutions: International Institute of Social History.)

This project will develop an integrated environment using sophisticated text mining tools to facilitate knowledge discovery in social history research. It will provide social historians and social scientists with the means to detect and associate events, trends, people, organizations and other entities of specific interest to social historians.

Integrating Data Mining and Data Management Technologies for Scholarly Inquiry

(Principal Investigators: Ray R. Larson, University of California, Berkeley; Richard Marciano, University of North Carolina at Chapel Hill, IMLS; Paul B. Watry, University of Liverpool, AHRC/ESRC/JISC. Additional participating institutions: Internet Archive, JSTOR [Journal Storage].)

This project will integrate large-scale collections, including JSTOR and the books collections of the Internet Archive, stored and managed in a distributed preservation environment. It will also incorporate text mining and natural language processing software capable of generating dynamic links to related resources discussing the same persons, places and events. This 17-month project goes beyond basic analysis by providing a prototype system that offers expert system support to scholars in their work.

Mining Microdata: Economic Opportunity and Spatial Mobility in Britain, Canada and the United States, 1850-1911

(Principal Investigators: Evan Roberts, University of Minnesota, NSF; Kevin Schürer, University of Leicester, AHRC/ESRC/JISC; Kris E. Inwood, University of Guelph, SSHRC. Additional participating institutions: University of Alberta, Université de Montréal, University of Essex.)

This project will make use of novel data-mining technology to exploit one of the largest population databases in the world, a vast collection of harmonized 19th and early-20th century census microdata from Britain, Canada and the United States originally digitized for genealogical research. The goal is to shed light on the impact of economic opportunity and spatial mobility on social structure in Europe and North America.

Trading Consequences

(Principal Investigators: Ewan Klein, University of Edinburgh, AHRC/ESRC/JISC; Colin M. Coates, York University, SSHRC. Additional participating institutions: University of St Andrews.)

This project will examine the economic and environmental consequences of commodity trading during the 19th century. The project team will be using information extraction techniques to study large corpora of digitized documents from the 19th century. This innovative digital resource will allow historians to discover novel patterns and to explore new hypotheses, through both structured query and a variety of visualization tools.

The Eight Research Funders

Created in 1965 as an independent federal agency, the National Endowment for the Humanities supports learning in history, literature, philosophy, and other areas of the humanities. NEH grants enrich classroom learning, create and preserve knowledge, and bring ideas to life through public television, radio, new technologies, museum exhibitions, and programs in libraries and other community places. Additional information about the National Endowment for the Humanities and its grant programs is available on the Internet at www.neh.gov.

The Arts and Humanities Research Council (AHRC): Each year the AHRC provides approximately £112 million from the Government to support research and postgraduate study in the arts and humanities, from languages and law, archaeology and English literature to design and creative and performing arts. In any one year, the AHRC makes approximately 700 research awards and around 1,300 postgraduate awards. Awards are made after a rigorous peer review process, to ensure that only applications of the highest quality are funded. The quality and range of research supported by this investment of public funds not only provides social and cultural benefits but also contributes to the economic success of the UK.

The Economic and Social Research Council (ESRC) is the UK's largest organisation for funding research on economic and social issues. It supports independent, high quality research which has an impact on business, the public sector and the third sector. The ESRC's total budget for 2011/12 is £203 million. At any one time the ESRC supports over 4,000 researchers and postgraduate students in academic institutions and independent research institutes. More information is available at www.esrc.ac.uk.

The Institute of Museum and Library Services (IMLS) is the primary source of federal support for the nation's 123,000 libraries and 17,500 museums. The Institute's mission is to create strong libraries and museums that connect people to information and ideas. The Institute works at the national level and in coordination with state and local organizations to sustain heritage, culture, and knowledge; enhance learning and innovation; and support professional development. To learn more about the Institute, please visit www.imls.gov.

The Joint Information Systems Committee (JISC) is a joint committee of the U.K. further and higher education funding bodies and is responsible for supporting the innovative use of information and communication technology (ICT) to support learning, teaching, and research. It is best known for providing a U.K. national infrastructure network, a range of support, content, and advisory services, and a portfolio of high-quality resources. Information about JISC, its services, and programs can be found at www.jisc.ac.uk.

The National Science Foundation (NSF) is an independent federal agency that supports fundamental research and education across all fields of science and engineering. In fiscal year (FY) 2009, its budget was $9.5 billion, which included $3.0 billion provided through the American Recovery and Reinvestment Act. NSF funds reach all 50 states through grants to over 1,900 universities and institutions. Each year, NSF receives about 44,400 competitive requests for funding, and makes over 11,500 new funding awards. NSF also awards over $400 million in professional and service contracts yearly. More information about NSF is available on the Internet at www.nsf.gov/.

The Netherlands Organisation for Scientific Research (NWO) funds thousands of top researchers at universities and institutes and steers the course of Dutch science by means of subsidies and research programmes.

The Social Sciences and Humanities Research Council of Canada (SSHRC) is an independent federal government agency that funds university-based research and graduate training through national peer-review competitions. SSHRC also partners with public and private sector organizations to focus research and aid the development of better policies and practices in key areas of Canada's social, cultural and economic life. More information about SSHRC is available on the Internet at www.sshrc-crsh.gc.ca/.

Contact Information:

Gail Zboch
Partnerships Portfolio, SSHRC
gail.zboch@sshrc-crsh.gc.ca
613-943-1148

Michael Adams
Communications, SSHRC
michael.adams@sshrc-crsh.gc.ca
613-944-1758