Tuesday, May 23
Workshop attendees will be sent advance instructions on preparing for their registered session(s). Requirements differ by workshop and instructor. In general, be prepared to bring a fully functional laptop (i.e., not a tablet or Chromebook).
Collecting and Storing Data from Internet-Based Sources
Peter Smyth, UK Data Service
Many websites allow researchers and developers to download data through an Application Programming Interface (API). This data is often in formats that social scientists are unfamiliar with (e.g. JSON). Downloaded data can be processed immediately or stored in a database for later processing in a package like R or Stata. Data can also be collected at regular intervals over a period of time using the built-in scheduling tools of the Windows or Linux operating systems.
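As a rough illustration of the collect-and-store pattern described above, the sketch below parses a JSON payload and writes selected fields to a SQLite table, from which the data could later be exported for analysis. It assumes a hypothetical response shape (a `results` list with `id`, `title`, and `views` fields) and a hypothetical table name `items`; a real API defines its own structure. The download step itself (e.g. with `urllib.request`) is omitted so the example runs offline.

```python
import json
import sqlite3

# Hypothetical API response; real APIs define their own JSON structure.
SAMPLE_RESPONSE = """
{
  "results": [
    {"id": 1, "title": "First item", "views": 12},
    {"id": 2, "title": "Second item", "views": 7}
  ]
}
"""


def store_records(raw_json: str, conn: sqlite3.Connection) -> int:
    """Parse a JSON payload and insert its records into a SQLite table.

    Returns the number of records processed.
    """
    payload = json.loads(raw_json)
    records = payload.get("results", [])

    # Create the (hypothetical) table on first use.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "id INTEGER PRIMARY KEY, title TEXT, views INTEGER)"
    )
    # INSERT OR REPLACE keeps repeated scheduled runs from duplicating rows.
    conn.executemany(
        "INSERT OR REPLACE INTO items (id, title, views) VALUES (?, ?, ?)",
        [(r["id"], r["title"], r["views"]) for r in records],
    )
    conn.commit()
    return len(records)


if __name__ == "__main__":
    # ":memory:" keeps the demo self-contained; pass a file path to persist data.
    conn = sqlite3.connect(":memory:")
    print(store_records(SAMPLE_RESPONSE, conn))
    for row in conn.execute("SELECT id, title, views FROM items ORDER BY id"):
        print(row)
```

To collect data at regular intervals, a script like this could be run by Windows Task Scheduler or a cron job on Linux, writing to a database file rather than an in-memory database.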
This introductory workshop is aimed at anyone interested in collecting data from the internet via APIs.
Curation, Collaboration, and Coding – The Secret Sauce for Scholarship Support
Megan Potterbusch, Association of Research Libraries
Cynthia Hudson-Vitale, Washington University in St. Louis Libraries
This half-day workshop offers an overview of, and hands-on introduction to, the Open Science Framework and the SHARE data set, two tools that form a powerful combination for supporting scholarship and research locally, improving scientific integrity, and enabling new forms of meta-research.
Developed by the Center for Open Science, the Open Science Framework (OSF; http://osf.io) is a free, open source tool that works within the research workflow to allow for better management, curation, streamlining, and sharing of scholarly outputs. SHARE builds its free, open data set (https://share.osf.io/) by gathering, cleaning, linking, and enhancing metadata that describe research activities and outputs, ranging from data management plans and grant proposals to research data, code, preprints, presentations, and journal articles.
In this workshop, participants will learn to use the OSF to develop embedded data stewardship and research management services for faculty. Attendees will also learn how to leverage and enhance SHARE data to improve their institutions’ understanding of the full scholarly ecosystem on their campuses.
This workshop will be divided into two parts. First, attendees will learn strategies for embedding curation and research services in the faculty workflow using the OSF. Practical approaches to faculty collaboration and curation assistance throughout the research life cycle will be discussed. The second part will focus on harnessing the SHARE data set to discover and act upon the research outputs of an institution or organization. This hands-on portion of the workshop will use IPython/Jupyter Notebooks to access the SHARE API, search across 129+ providers, and export and clean the resulting metadata.
Participants are encouraged to bring laptops in order to follow along. No previous programming experience is necessary.
Dueling CAQDAS – Using Atlas.ti and NVivo for Qualitative Data Analysis
Mandy Swygart-Hobaugh, Georgia State University
Florio Arguillas, Cornell Institute for Social and Economic Research (CISER)
Many social scientists like to “get their hands dirty” by delving into deep analysis of qualitative data – be it discourse analysis, in-depth interviews, ethnographic observations, or visual and textual media analysis. Manually coding these data sources can become cumbersome and cluttered – and may even hinder drawing out the rich content in the data. Consequently, qualitative researchers are increasingly turning to computer-assisted qualitative data analysis software (CAQDAS) to facilitate their analyses. Through hands-on work with provided data, participants will explore ways to organize, analyze, and present qualitative research using both NVivo and Atlas.ti. The workshop will cover the following topics:
- Coding of text and multimedia sources
- Using Queries to explore and code data
- Organizing and classifying sources to facilitate comparative analyses across data characteristics (e.g. demographics)
- Data visualizations and reports
Note that workshop attendees will need to provide their own laptop running Windows (Mac users will need a Windows virtual machine). The workshop leaders will contact attendees with instructions for downloading and installing free trial versions of Atlas.ti and NVivo before the workshop date.
This workshop is sponsored by the Qualitative Social Science and Humanities Data Interest Group (QSSHDIG).
Introduction to Mapping & QGIS
Megan Gall, DataScribe Consultants
We’re going to make some maps. Historically, there were substantial barriers to incorporating geographic information systems (GIS) into the social sciences. Originally used in the physical sciences, GIS is now well entrenched as a useful suite of analytic tools across all branches of the social sciences, and the list of relevant applications continues to grow. Additionally, new, open source software removes many of the monetary barriers. This workshop will delve into QGIS, a powerful and free desktop GIS. We’ll cover topics designed to get new users comfortable with the technology and able to make maps on their own.
We will cover basic and intermediate GIS topics. Basic topics include general mapping concepts, data requirements, useful GIS data repositories, and how to load those data into QGIS. Intermediate topics will cover types of GIS data visualizations, data manipulation techniques, and basic analyses.
Preparing Qualitative Data for Sharing and Re-use
Louise Corti, UK Data Archive, University of Essex
Libby Bishop, UK Data Archive, University of Essex
Sebastian Karcher, Qualitative Data Repository, Syracuse University
This workshop is for researchers interested or actively engaged in the creation and management of qualitative research data, and looks at the steps required to prepare data for sharing and reuse. We will cover existing best practices and tools for data preparation, ensuring that non-proprietary formats are used and that raw data are documented to capture as much context as possible. We also address the design of consent forms and methods of anonymisation and access control, highlighting strategies that researchers can use to share as much research information as possible, ethically and legally.
Finally, we show examples from the UK Data Service and the Qualitative Data Repository of how data can be published and the levels of access control required, and we look at the impact of sharing data as a valued research output and, of course, a great long-lasting asset!
We track examples of successfully archived qualitative data as they make their way through the data assessment, review, processing, curation, and publishing pipeline.
Curating for Reproducibility: Why and How to Review Data and Code
Florio Arguillas, Cornell University
Thu-Mai Christian, Odum Institute, UNC Chapel Hill
Sophia Lafferty-Hess, Odum Institute, UNC Chapel Hill
Limor Peer, Yale University
Developments in digital scholarship, advances in computational science, mandates for open data, and the reproducibility crisis require more attention to code as a research object. We refer to activities that ensure that statistical and analytic claims about given data can be reproduced with those data as curating for reproducibility (CURE). This 3-hour workshop will teach participants practical strategies for curating research materials for reproducibility. The workshop will be based on the data quality review, a framework for helping ensure that research data are well documented and usable and that code executes properly and reproduces analytic results. The workshop will introduce three models for putting this framework into practice: those of the Institution for Social and Policy Studies (ISPS) at Yale University, the Cornell Institute for Social and Economic Research (CISER), and the Odum Institute for Research in Social Science at the University of North Carolina at Chapel Hill. Participants will learn about the basic components of the CURE workflow through examples and hands-on activities. The workshop will also demonstrate a tool that structures the CURE workflow.
Data Literacy for All, with R
Ryan Womack, Rutgers University
Efforts to introduce general audiences to their first hands-on data work often face formidable barriers. New users typically must spend their time installing, configuring, and learning the programming conventions of specific software environments that may themselves present barriers of cost and compatibility. Importing and wrangling data into a form suitable for use is another barrier.
As data professionals, we can apply our skills to develop relatively painless introductions to data that focus on understanding the data itself and analytical concepts, instead of the mechanics of a program. We can customize and tailor our presentations to the needs of particular audiences by developing wrappers around data and functions that simplify their use, and we can develop techniques and interfaces that allow easy data exploration.
Using R, this workshop will explore 1) building packages for distributing data and functions; 2) using sample data and functions to illustrate basic data literacy concepts such as descriptive statistics, modeling, and visualization, while keeping the focus on meaning, not mechanics; and 3) building tools for interactive exploratory data analysis by end users. As open source software, R is easily available and can be locally distributed where internet access and computing resources are scarce.
International Activities in Research Data Management Education: Tools and Approaches
Helen Tibbo, School of Information and Library Science, UNC Chapel Hill
Nancy McGovern, MIT
Thu-Mai Christian, Odum Institute, UNC Chapel Hill
Jacob Carlson, University of Michigan
Merce Crosas, Harvard University
Robin Rice, University of Edinburgh
This workshop will present brief overviews of key international RDM education efforts, along with a synthesizing overview of progress in this area. Tibbo and Christian will report on “Research Data Management and Sharing,” the MOOC (Massive Open Online Course; https://www.coursera.org/learn/data-management) produced by the CRADLE project (cradle.web.unc.edu) and the University of Edinburgh’s MANTRA (datalib.edina.ac.uk/mantra) program. The MOOC is relevant to librarians, archivists, and other information professionals tasked with research data management and preservation, as well as to researchers themselves. Rice will provide an update on MANTRA and RDM efforts at the University of Edinburgh, reflect on her experience with the Coursera MOOC, and discuss how this tool might be enhanced for librarians and especially for researchers. McGovern will discuss the Digital Preservation Management Workshop series, of which she has been a driving force for over a decade, and the lessons from digital preservation that apply to RDM activities and training efforts. Crosas will discuss RDM work at Harvard University, and Carlson will talk about how Data Curation Profiles can help with data management education.
These presentations will provide the audience with a starting point for breakout session topics that may include but are not limited to:
- How do you handle data training at your institution?
- What are your professional needs in RDM (education for librarians/archivists)?
- What lessons have you learned from working with researchers on their RDM needs?
You Can Too! Running a Successful Data Bootcamp for Novices
Ryan Clement, Middlebury College
Outreach on topics such as working with and managing research data can be challenging when the audience is novice users. Participants in this workshop will learn about versions 1.0 (2015) and 2.0 (2016) of a multi-day Data Bootcamp for novice users in the humanities and humanistic social sciences held at Middlebury College. The bootcamp covered topics such as managing, cleaning, and documenting data, as well as data visualization, mapping, and working with textual data. In addition to discussing what worked for Middlebury, participants will work together to determine audience needs, learning objectives, and tools. Potential workshop plans will focus on active learning methods and on free and/or open source tools and data to increase accessibility. Participants will also be able to access and share workshop materials from an Open Science Framework project.