Data Management Issues

Data capture, preservation, organization, and sharing each raise important issues that researchers and institutions need to address.

Government and granting agencies have established guidelines and regulations for Data Management Plans (DMPs).

In the near future, the library will explore services and support for both individual and organizational data management needs.

Data Management Plan

SXU Data Management Planning (Proposal)

WHY?

Scholars no longer consider information retrieval an outcome, but merely the start of personal information management. A modern researcher needs to be able to discover, critically analyze, synthesize, capture, organize, and share information. This information fluency involves all types of information: text, data, images, and multimedia.

The library should serve as a lead participant for the discovery, creation, handling, and sharing of information – both for individuals and for the entire organization.

INSTITUTIONAL DATA

The library is currently exploring enterprise-wide solutions for capturing and distributing faculty information (VIVO, Archives, Records Management) and for hosting unit-level documentation (training materials, best practices, manuals, etc.) on the SharePoint platform. We have launched an Institutional Repository (IR) that will house and distribute some of our locally created intellectual property as Open Access material. For more on information and data stewardship, see Handling Information and Data.

INDIVIDUAL DATA

For individual researchers, we are promoting the use of personal management tools such as Zotero (journal articles, web pages, images) and Outwit (links, files, text). We are also exploring social networks (Diigo) for creating online communities. For more information on these Personal Knowledge Management (PKM) tools, see "Citation and Document Management".

HANDLING DATA

Building sophisticated data handling capabilities will require both software support (SPSS, SAS, GIS, etc.) and the acquisition of data sets (demographic census data, trends and projections data tracking social concerns, raw business data for active laboratory-based learning, and campus outcome and impact data for Return on Investment analyses).

BIG DATA: Raw data sets must be manipulated in tools such as SPSS and SAS. See the Purdue University Data site and their collaborative Data Curation Profiles Toolkit for information on creating a data plan. Another good source of information about data handling is DataQ, a collaborative effort across various university research institutions.

SHARING OPEN DATA: DataBridge is a project intended to extend the life cycle of dark data by allowing researchers from around the country to submit their data after publishing their findings. The platform will serve as an archive for data sets and metadata, grouping them into clusters of information to make relevant data easier to find.

TRAINING IN EXPERIENTIAL DATA MANIPULATION

Data manipulation software and the data sets themselves can be very expensive, and both often have significant learning curves. Graduates will be expected to be familiar with these tools when they enter the workforce, so the university must provide hands-on experience with such systems. It is time for SXU to begin a program of obtaining these tools and providing high-level training within the curriculum.

Such a program will require time, thoughtful planning, and escalating expenses. The university may need outside funding to support the development of proactive learning labs and data repositories. The scale of the potential tools and data sets will require us to expand our program in phases. Our criteria for the first areas of concentration should include (1) the immediate expectation among real-world organizations that graduates will have gained experiential learning in our curriculum, (2) the availability of resources within our university to support key curriculum elements, (3) the impact-per-student of expenditures for software and data sets, and (4) the potential to discover outside funding to support such discipline-specific programs.
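As a rough illustration of criterion (3), the sketch below compares candidate purchases by cost per student served. It is only one possible way to read "impact-per-student"; the names, costs, and student counts are hypothetical placeholders, not actual SXU figures or vendor prices.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str             # hypothetical label for a tool or data set
    annual_cost: float    # hypothetical yearly cost in dollars
    students_served: int  # hypothetical number of students using it per year


def cost_per_student(candidate: Candidate) -> float:
    """Return the yearly cost divided by the number of students served."""
    if candidate.students_served <= 0:
        raise ValueError("students_served must be positive")
    return candidate.annual_cost / candidate.students_served


def rank_by_cost_per_student(candidates):
    """Return the candidates ordered from lowest to highest cost per student."""
    return sorted(candidates, key=cost_per_student)


# Placeholder figures for illustration only.
candidates = [
    Candidate("Tool A", 12000.0, 300),
    Candidate("Data set B", 5000.0, 50),
    Candidate("Tool C", 8000.0, 400),
]

for c in rank_by_cost_per_student(candidates):
    print(f"{c.name}: ${cost_per_student(c):.2f} per student")
```

A cost-per-student figure covers only one of the four criteria; employer expectations, local resources, and funding potential would still need to be weighed separately.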

NEXT STEPS

The library is prepared to dedicate a portion of a professional librarian's time to collaborating on the design of such a Data Service program, and will be able to reallocate some collection funds to support the purchase of some data sets.

Our next step is to meet with other units (both academic and operational support) in order to understand the current needs, possibilities, and resources that should be considered in such a university effort.

Some initial tools and needs have already been identified, but many more exist, and some areas may not yet be on the radar. To document the current situation, the campus should perform a data needs assessment and an environmental scan of best practices at other universities.

Two initial data sets to serve as examples are:

(1) SimplyAnalytics (formerly SimplyMap) … a tool that allows for the mining and mapping of data in the areas of U.S. census demographics, business buying behaviors, and health care statistics.

(2) COMPUSTAT/CRSP … a business tool that allows for real-time analysis of stock information in real-world scenarios.

*************************************************************************

One area of information research is Data Visualization: making raw data more understandable through presenting visceral concept spaces, pattern recognition, and projection of time-based trends.

An example of a powerful data visualization tool is AcademyScope - watch a video detailing the design and programming of the AcademyScope visual representation of subject navigation among books. Concept spaces are used to show semantic relationships and frequencies of terms.

Other examples of providing data in a visual way are:

GapMinder - demonstration of a dynamic life expectancy chart and graph
Worldmapper - cartographic representations of various data sets
Emblematica - UIUC search platform of embedded images with metadata facets