Wednesday, May 24
17:00-18:30 (5:00-6:30 pm)
The Commons, Spooner Hall
Archive of Data on Disability to Enable Policy and Research: Creating a Common Resource for Disability and Rehabilitation Stakeholders
Jai Holt, Alison Stroud
The Archive of Data on Disability to Enable Policy and Research (ADDEP) is a new ICPSR initiative to build a central repository of quantitative and qualitative data about disability that have been dispersed across disciplines. The mission of ADDEP is to improve and enable further research on disability for researchers, policymakers, and practitioners by acquiring, enhancing, preserving, and sharing data. This poster will display ADDEP’s newly launched website and available resources. The poster also describes ways to discover and download data from ADDEP and how the data can be used to better understand and inform the implementation of major disability-related policies such as the Americans with Disabilities Act. It will also highlight how user-friendly data exploration tools and other resources on the ADDEP website help break down barriers to research within the cross-disciplinary disability and rehabilitation research community.
Ernie Boyko, Simon Hodson
CODATA, an interdisciplinary scientific committee of the International Council for Science (ICSU), has much in common with IASSIST. Established in 1966, CODATA promotes and encourages the compilation, evaluation, and dissemination of reliable data of importance to science and technology on a worldwide basis. This poster will outline the scope of CODATA activities with the aim of identifying areas of mutual interest with IASSIST and exploring possible areas of collaboration.
A Complex Use Case – Documenting the Consumer Expenditure Survey at BLS
Daniel Gillman, Reginald Noel, Taylor Wilson, Lucilla Tan, Evan Hubener, Bryan Rigg, Arcenis Rojas
The Consumer Expenditure Survey (CE) is a Bureau of Labor Statistics (BLS) program that measures how US families spend their money. These data also serve as input to the Consumer Price Index (CPI). BLS selected DDI 3.2 to document CE, including the entire life cycle.
CE is conducted as two separate surveys, Interview and Diary. The data are combined during processing and packaged in two ways, one for CE dissemination and the other for CPI. Changes in design occur every odd-numbered year. Yearly estimates are created every 6 months, PUMS are issued yearly, and data are sent to CPI monthly. CE processing is divided into four phases: 1) sample selection and collection; 2) initial edit subsystem; 3) estimation and edit subsystem, with data sent to CPI; and 4) final edits, tables, and microdata. Data are processed in packages by expenditure type.
A documentation system needs to handle all these features. For development, BLS is conducting a phased approach, adding complexity from phase to phase. The incremental systems are designed to establish that DDI and the Colectica system are sufficiently sophisticated to account for each feature of CE. This paper will go into detail about the particulars of the CE survey, describe progress made, and outline plans for the future.
Creating Data Citations in LaTeX for Economists
Courtney Butler, Brett Currier
The Federal Reserve Bank of Kansas City built a LaTeX file for acquired data so that economists could copy and paste data citations into their preferred writing environment, LaTeX. Copy-and-paste citations exist for traditional academic articles and books through services like Citation Machine or Google Scholar, which provide researchers with the code they need for various citation styles and word processing programs, including LaTeX. We could not identify a similar plugin for datasets. We reviewed all active contracts and open data sources for publication permissions, restrictions or limitations, post-termination rights, and specific data citation guidelines. This information was then compiled and made available on a private intranet site to avoid violating non-disclosure agreements. Citation information was translated into LaTeX scripting in a modified Chicago style when specific citation requirements were not indicated by the licensor. This poster will explain that process, provide a template for data citations and their LaTeX scripting, and present a LaTeX file for publicly available data.
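As an illustration only, a copy-and-paste data citation entry in a Chicago-like style might look as follows; the dataset, dates, and URL here are placeholders, not entries from the Bank’s actual citation file.

```latex
% Illustrative data citation (placeholder details), formatted so an
% economist can paste it directly into a thebibliography environment.
\begin{thebibliography}{9}
\bibitem{bls-ce-2016}
U.S. Bureau of Labor Statistics. ``Consumer Expenditure Survey, 2016
Public-Use Microdata.'' Accessed May 1, 2017.
\texttt{https://www.bls.gov/cex/}.
\end{thebibliography}
```

Keeping such entries in a shared `.tex` file lets researchers reuse a vetted citation without consulting the underlying license each time.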
The Curating for Reproducibility (CURE) Consortium
Florio Arguillas, Thu-Mai Christian, Sophia Lafferty-Hess, Limor Peer
In July 2016, the Institution for Social and Policy Studies (ISPS) at Yale University, the Cornell Institute for Social and Economic Research (CISER), and the Odum Institute for Research in Social Science at the University of North Carolina at Chapel Hill formed the Curating for Reproducibility (CURE) Consortium. These academic institutions all maintain data archives that have been involved in implementing workflows that put into practice data quality review, a framework that includes research data curation and code review. This framework helps to ensure that research data are well documented and usable and that code executes properly and reproduces analytic results. The proposed poster will outline the goals of the consortium as well as provide examples of how these institutions have integrated data quality review into workflows, tools, and protocols.
Data Data Data! But Little to Work With
The poster presents a model of data (in)accessibility in institutions where large amounts of data are produced and (not) stored. It explains the difficulty of accessing available data due to factors such as a lack of expertise on the part of both the data professionals in charge and the data users. A solution to the “data glut” is proffered through proper planning and management of data in institutions that generate or gather data, as well as adequate capacity building both for staff who handle data and for users of the data that has been generated. A case study of adequate data management and training is illustrated in the poster.
DDI Standardised Data Using Stata and Nesstar Publisher in Malawi
Jolly Prince Gondwe, Chifundo Kanjala
MEIRU conducts research across several studies, including cooking and pneumonia, demographic surveillance, HIV, TB, nutrition, and a bio-repository for samples. It is the first public health organisation in Malawi to standardise its data to the DDI standard using Stata and Nesstar. In this presentation I report on how data querying and documentation work is being achieved and the challenges we encounter in working with Stata and Nesstar Publisher in the process. We are able to produce good-quality metadata for our studies, compliant with the DDI Codebook format, using Stata and Nesstar Publisher, thus making the data accessible to other users. I highlight our current data documentation process and identify areas within it that need improvement. I also give feedback to the Nesstar Publisher developers on aspects that we are finding easy to work with, and those that are challenging and need to be reconsidered. The perspectives shared in this presentation will be of value to organisations considering starting work on their metadata to make it structured and compliant with DDI or other standards, as well as to the developers of Stata and Nesstar.
Developing Research Data Life Cycle Strategy: A Collaborative Approach at Fed Chicago
Research data are critical for researchers at the Federal Reserve System conducting empirical analysis for monetary policy related work and long-term projects. However, the management of research data at Fed Chicago had largely been handled on an ad hoc basis, lacking a systematic and consistent approach to planning, acquiring, processing, publishing, storing, and preserving the data. In 2016, a Research Data Life Cycle Strategy (RDLCS) was developed collaboratively among data librarians, IT staff, and researchers at Fed Chicago. This poster will diagram the six stages required for successful management of research data, tailored to the Fed environment, and highlight the key elements undertaken by all stakeholders to address existing issues and challenges and optimize users’ experience with research data.
Documentation in the Middle: Active Phase Project Documentation for Inclusive and Effective Team-Based Research
Hailey Mooney, Jacob Carlson, Karen Downing, Lori Tschirhart
Documentation is an essential component of good data management and yet data service providers often struggle to provide effective support to researchers. There are materials available for creating or assisting researchers with documentation at the beginning and end of a project; from data management plans to documenting data for archival purposes. However, we don’t yet have a solid understanding of how research teams incorporate (or not) documentation into their everyday work. This poster reports on a project to investigate, analyze, and synthesize real and ideal documentation practices within research teams in order to develop a universal project manual documentation template. It is our contention that a ‘lab manual’ or ‘project organization protocol’ will enhance the effectiveness and efficiency of research teams, while creating an inclusive environment by making local practices and expectations clear to all team members regardless of previous research experience and disciplinary background. The goal of this project is to identify the basic considerations that any researcher from any discipline should consider for their local documentation in support of team-based research projects.
Finding a Data Sharing Solution: Connecting Journals to Harvard’s Dataverse
The Harvard Dataverse Repository offers journals several workflow options to enhance their data sharing and preservation experience: 1. journals can create a customized dataverse that allows use of the journal publishing workflow; 2. journals can pair option 1 with reproducibility verification provided by the Odum Institute archives; 3. journal systems can make use of our integration API, currently used by OJS and OSF, for seamless data deposits; and 4. journals can recommend that authors deposit data into Harvard’s repository. Journal-specific features include a private URL for dataset review and, coming soon, data file widgets that can be embedded within the published journal article.
5 Minute Metadata: Informative videos to meet the metadata novice in the middle
When searching online for information about metadata, such as what it is and why it is useful, people new to the concept find it hard to locate accurate information. Metadata-related video results are often heavily technical or business-oriented in nature, such as narrated PowerPoint slides or videos heavy with text. Occasionally videos contain factual errors, confusing descriptive and structural metadata concepts, or present too much information too quickly for people to understand well.
The “5 Minute Metadata” videos are a new take on how to introduce metadata to people in a non-confronting way, whether they are seasoned data experts or completely new to the concept. These videos are a way to improve metadata literacy by meeting people halfway: they might have heard about metadata but be unsure of what it is, such as the differences between descriptive and structural metadata. The videos introduce metadata in a fun and light-hearted way, helping to convey information about metadata and expand communication surrounding it.
Improving discoverability of Statistics Canada microdata: a DDI case study
Increasingly, modern policy and academic research requires access not only to aggregate-level data but also to data at the level of the individual person, business, or household. Statistics Canada has taken great strides in eliminating barriers to accessing data. Nevertheless, data not used to their fullest possible potential represent a failure of relevance. The Data Liberation Initiative (DLI) is a partnership between post-secondary institutions and Statistics Canada for improving access to Canadian data resources. For many years, the program has been tagging its microdata files in the Data Documentation Initiative (DDI) format. An advantage of DDI is that it is machine readable and interoperable with other platforms. To enhance the discoverability of the program’s microdata collection, the DLI has developed a search platform using open source technologies, including Solr and Drupal, that capitalizes on its rich metadata holdings. This presentation will examine the evolving needs of the DLI community and describe the challenges and opportunities that were met in developing the search platform. The capability for academic researchers to search survey metadata at this level of detail has increased the discoverability of Statistics Canada survey data and reduced barriers to accessing Statistics Canada microdata.
Let’s meet in the middle: facilitating access to administrative data in the UK
The Administrative Data Research Network (ADRN) facilitates access to de-identified administrative data for researchers. Under a complex and dynamic data sharing legal framework in the UK, the Network is a partnership of UK Universities, government departments, national statistical authorities, funders and research centres and it aims to deliver a service enabling secure and lawful access to de-identified linked administrative data to researchers.
As one of the ‘front doors’ to the ADRN, the Administrative Data Service liaises with data owners, researchers, and experts in data linkage and data governance to facilitate access to administrative data. It also provides guidance on processes and an infrastructure that addresses some of the concerns around information governance and data security through dedicated ‘secure environments’ as points of access. Quite often, we find ourselves in the ‘middle’ of these discussions, as we negotiate access, translate requirements, and repurpose documentation to ensure the project resonates with a variety of agendas and priorities.
The presentation will provide an overview of recent work in the area and how we have dealt with challenges so far. We will summarise work done in trying to streamline application processes for different data providers in different data domains in the UK (e.g. education, health, crime, benefits, and the labour market). We will describe how ADRN has been working alongside government departments to design and implement streamlined approaches to administrative data access in the UK, and how we are supporting researchers applying to access administrative data for their research in the areas of ethics, consent, legal pathways to access, methodology, and data availability. Finally, we will discuss how it is not just about data meeting in the middle; it is primarily about people.
Metadata in the Middle
To participate in centralized data search and access, Illinois Data Bank (the research data repository for the University of Illinois at Urbana-Champaign) contributes and maintains records for datasets in the DataCite Metadata Store, using the EZID API through Purdue University. The organizational and technical hand-offs through the various layers can be complex to navigate.
On the way from research data producers to consumers, Illinois Data Bank metadata is formatted, stored, subsetted, reformatted, and passed through several organizations.
University Library <=> EZID at Purdue <=> DataCite <=> International DOI Foundation
If a researcher needs to correct metadata or adjust a publication delay, or a curator needs to suppress a dataset while reviewing concerns, Illinois Data Bank propagates the adjustments along the chain. Supporting researcher self-deposit, along with a slate of curator controls, exploits the breadth of the API and requires an understanding of the connections among components. The goal of this poster is to present these intricacies, along with some of our technical strategies for using the EZID API to implement features of the Illinois Data Bank.
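The shape of such an update can be sketched as follows. This is a minimal illustration, not Illinois Data Bank code: the EZID API accepts metadata as ANVL (name–value) text over HTTP, so an update amounts to percent-encoding the reserved characters and POSTing the body to the identifier’s URL. The helper names and the example field values here are ours.

```python
def anvl_escape(text: str) -> str:
    """Percent-encode the characters reserved in ANVL names and values."""
    return (text.replace("%", "%25")
                .replace("\r", "%0D")
                .replace("\n", "%0A"))

def to_anvl(metadata: dict) -> str:
    """Render a metadata dict as the ANVL body of an EZID request."""
    return "\n".join(f"{anvl_escape(k)}: {anvl_escape(v)}"
                     for k, v in metadata.items())

# A hypothetical adjustment: a curator suppressing a dataset under review
# might flip the identifier's status while correcting a title.
body = to_anvl({
    "_status": "unavailable | under curator review",
    "datacite.title": "Example Dataset\n(Corrected Title)",
})
# This body would then be POSTed to the identifier's EZID URL with
# Content-Type: text/plain; charset=UTF-8 and HTTP Basic authentication.
```

Because every layer in the chain re-serializes this metadata, keeping the escaping logic in one tested helper is what makes propagation along the chain reliable.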
New approaches to facilitate responsible access to sensitive urban data
Andrew Gordon, Rebecca Rosen, Daniel Castellani, Daniela Hochfellner, Julia Lane
Improving government programs requires analysis of government administrative data. Providing access to these data to academic and public sector researchers is an important first step to robust analysis. At the same time, these data contain Personally Identifiable Information and so great care must be taken in obtaining, storing, and providing access to these data. The Data Facility at NYU’s Center for Urban Science and Progress (CUSP) is building on a long history of research on how to facilitate data curation, ingestion, storage, and controlled access in a safe and trustworthy environment. The poster describes how CUSP combines computer science, information science, and social science approaches which include (i) building a data model that accommodates sharing research data across disciplines, (ii) employing data curation and ingestion services so that data providers can confidently share their data with authorized researchers, (iii) converting data restrictions into concise, easy to understand, and searchable metadata to help researchers find appropriate data for their research, and (iv) capturing activity around datasets as contextual metadata so researchers can discover new data to complement their analyses.
Promoting Data Usage in SSJDA: Introducing Our Secondary Analysis Workshops
Izumi Mori, Natsuho Tomabechi, Satoshi Miwa
Social Science Japan Data Archive (SSJDA) has released microdata since 1998. While we initially held no more than 200 datasets being used by a maximum of 10 users per year, we currently hold over 1900 deposited datasets, with a data usage count of approximately 2900 per year. One of our major initiatives in promoting such data use is our Secondary Analysis Workshops, which are held to encourage researchers and graduate students in the social sciences to make the best use of the survey data kept in our archive. We seek participants and research themes from all over Japan and analyze the target datasets every year. Researchers from depository institutions who are knowledgeable about the data serve as advisors for the workshops. SSJDA staff also support the workshops by providing expertise in social research and quantitative data analysis. Through these efforts, participants are able to work together to pursue their own research agendas, as they receive advice on the characteristics of the data as well as on choosing methodologies. The numbers of participants and research themes for the workshops have been increasing each year, suggesting that such initiatives are highly regarded by Japanese researchers and graduate students in the social sciences.
Research data management and academic institutions: a scoping review
Leanne Trimble, Dylanne Dearborn, Laure Perrier, Ana Patricia Ayala, Erik Blondel, Tim Kenny, David Lightfoot, Heather MacDonald, Mindy Thuna
This poster will describe the results of a scoping review undertaken at the University of Toronto, Carleton University, and the University of North Texas Health Science Center in 2016-17. The purpose of this study is to describe the volume, topics, and methodological nature of the existing literature on research data management in academic institutions. The specific objectives of the scoping review include:
- to complete a systematic search of the literature to identify studies on research data management across all disciplines in academic institutions;
- to identify what research questions and topic areas have been studied in research data management related to academic institutions; and
- to document what research designs have been used to study these topics.
This poster will outline the analysis of the identified literature, and describe the results obtained from the scoping review.
Social science data archive business models: A historical analysis of change over time
Kristin Eschenfelder, Kalpana Shankar, Allison Langham, Rachel Williams
The sustainability of data archives is of growing concern, and recent reports have raised questions about possible alternative business models for data archives. This study will provide a clearer understanding of how and why data archives changed business models from the 1970s to the early 2000s in response to evolving conditions. Business models encompass financial structures such as revenue streams and costs, but also relationships (contractual, partnerships, etc.), mission decisions about whom to serve, and collections decisions about what to maintain.
This poster is part of a larger project about how social science data archives have adapted over long periods of time and to a variety of challenges.
We will include data on changes in business models at four prominent and long-lived social science data archives: ICPSR at the University of Michigan, the UKDA (part of the UK Data Service at the University of Essex), the LIS Cross-National Data Center in Luxembourg, and EDINA at the University of Edinburgh. Our data include historical institutional documents and interviews with current and past staff.
The State of Data Curation in ARL Libraries
Cynthia Hudson-Vitale, Lisa Johnston, Wendy Kozlowski, Heidi Imker, Jacob Carlson, Robert Olendorf, Claire Stewart
The Data Curation Network surveyed members of the Association of Research Libraries (ARL) on their data curation activities and infrastructure as part of the ARL SPEC Kit program in January 2017. The openly accessible results of the study (link forthcoming) demonstrate the current state of data curation services in ARL institutions by addressing the current policy and technical infrastructure at ARL member institutions for data curation, treatment activities, the current level of demand for data curation services, and how often specialized curatorial actions are taken. This poster dives deeper into the qualitative responses and analyzes the trends and challenges that institutions currently face when providing data curation services. As libraries seek to define their mission and service levels in support of data curation activities, understanding the challenges that other institutions face in supporting this effort will be essential. Finally, the poster will describe how the current partner institutions of the Data Curation Network will use the results of this survey to gain a more extensive understanding of the curation ecosystem beyond ARL institutions.
Store it in a cool dry place – processing and long-term preservation of research data
Tuomas J. Alaterä
One crucial component of making research data accessible and reusable is preserving it properly. We know that the recipe is heavy on metadata. But there are other ingredients too, and even the ripest data need to be carefully prepared for preservation. Furthermore, without the right tools and a cool dry place for storage the mission is in jeopardy from the beginning.
This poster highlights the recent work done at the Finnish Social Science Data Archive. In addition to our institutional repository we have been preparing our data collection for long-term preservation in a systematic and sustainable way by taking advantage of an emerging national long-term preservation solution.
There are four major, partly parallel, areas for development:
1) Choosing sustainable file formats, migrating existing content, and updating data processing policies and software accordingly.
2) Producing software for harvesting rich metadata and wrapping it with the data and contextual files into a METS container for transfer.
3) Influencing the adoption of national standards and services.
4) Training staff and handling administrative tasks.
Better data management, increased trustworthiness, and automated processes should allow us to allocate more human resources to other critical software development and data services. Our effort focuses on traditional social science data: data matrices, code, and textual materials. However, the principles adopted can be applied by other disciplines as well, given that the formats are the same. The work has been carried out in collaboration with the National Digital Library Initiative.
Switching from field work using ODK powered electronic data collection to data documentation in DDI: A junior data documentation officer’s initial impressions of DDI codebook, Malawi Epidemiology Interventions and Research Unit (MEIRU)
Themba Masangulusko Chirwa, Chifundo Kanjala, Dominic Nzundah
In this paper, I give my perspective on how our organisation started using metadata standards to support data management and data sharing. MEIRU runs a health research programme encompassing a rural site in northern Malawi and an urban site in Lilongwe, the capital city of Malawi. It has a rich collection of longitudinal data dating back as early as 1979. Work is now underway to convert the vast unstructured documentation into DDI Codebook format for data sharing with researchers outside the project. I relate my educational background and prior work experience to my current work as a metadata officer. I identify parallels and differences between my current and former jobs and highlight areas where training and closer supervision are required to strengthen my capacity. I finally attempt to identify opportunities for capturing metadata during the fieldwork phase to reduce confusion down the line when the data are being prepared for sharing. The perspectives shared here could be of use to researchers working on projects similar to MEIRU and also to DDI developers, who will see how we are implementing the specification in our setting. I hold a Malawi School Certificate of Education (MSCE) and a Certificate in Computing and Information Systems (CIS).
Using backward design to create research data management professional development for information professionals
Abigail Goben, Megan Sapp Nelson
This poster details the design process used to develop the Association of College and Research Libraries “Building Your Research Data Management Toolkit: Integrating RDM into Your Liaison Work” road show. It starts with the development of learning objectives and highlights the multiple assessments that are offered prior to the road show experience, during the road show itself, and in follow-ups at the one-month and six-month post-show marks. The poster then shows the links between the learning objectives, assessments, and learning activities developed to help learners meet the learning objectives.
Using Data to Make Sense of Data: The Case of Video Records of Practice in Education
Researchers and teacher educators use video records of practice documenting classroom activity to study and improve teaching across grade levels and subject areas. Their use of video records of practice is often accompanied by supplemental data, such as school/classroom demographics, seating charts, lesson plans, and interviews, to achieve research or teaching aims. Educational researchers use these data as case studies, to test research questions and frameworks, and to develop research protocols. Teacher educators use the data as teaching exemplars, to allow pre-service teachers to practice and evaluate teaching methods, and to reflect upon pedagogy. This poster will evaluate patterns in the use of supplemental data by researchers, teacher educators, and others who use video records of practice for research and teaching, depending on the purpose of the data reuse. The results of this analysis will provide a baseline for how and what supplemental data will meet the research and/or teaching needs of schools of education. The findings also have implications for repositories’ data collection strategies and how best to make video records of practice available to these designated communities.