LOCATING DATA SETS
The library provides access to SimplyAnalytics (formerly SimplyMap), a tool that allows researchers to mine U.S. census data and business data sets in order to generate maps or build data sets that can be exported into other tools such as Excel, SPSS, and SAS for further manipulation. (See the SimplyAnalytics information and help video page.)
Researchers can search for data sets using Databib, a searchable catalog, registry, directory, and bibliography of research data repositories. These data sets must be imported (in some cases after format manipulation) into analysis tools such as SPSS and SAS. For additional information about data support, see the Purdue University Data site.
Another new initiative is Google's Dataset Search, which attempts to harvest and federate scientific datasets.
DOE Scientific Research Data lets users quickly see what data are available, find where data collections reside, and go directly to the data. Users can browse recently added or revised content; view hundreds of datasets, data streams, and data collections by title; display data from more than 50 subject categories; and select content by sponsoring or originating research organization.
Open Science Framework (OSF) is another platform for storing and sharing research data.
Zenodo is a platform created by CERN as an open, dependable home for the long tail of science, enabling researchers to share and preserve research outputs of any size, in any format, and from any field of science.
Zanran Numerical Data Search is a search engine for data and statistics. Zanran’s strength is in finding graphs, charts and tables on the Internet, which distinguishes it from other search engines.
Another way to uncover data sets is to add the operator "site:statista.com" to a web search. For example: alcoholic beverages site:statista.com
CAPTURING FULL TEXT MATERIAL FOR MANIPULATION
One tool used to capture published data is the Text and Data Mining (TDM) service created by Crossref. It allows a researcher to download metadata from various publishers' sites; that metadata, in turn, points to the full-text PDFs for download. The final step is to convert the PDFs into XML markup for manipulation.
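As a minimal sketch of the metadata step, assuming you already have a DOI you are permitted to mine (the DOI below is only a placeholder), the Crossref REST API can be queried for a work's record, which may include full-text links that publishers expose for text and data mining:

```python
import requests

# Placeholder DOI: substitute a real DOI you have permission to mine.
DOI = "10.5555/12345678"

# Query the Crossref REST API for this work's metadata record.
resp = requests.get(f"https://api.crossref.org/works/{DOI}", timeout=30)
resp.raise_for_status()
work = resp.json()["message"]

# Publishers that participate in Crossref TDM may list full-text links here.
for link in work.get("link", []):
    print(link.get("content-type"), link.get("URL"))
```

Whether a full-text link appears, and whether you may actually download the PDF, depends on the publisher and on your institution's access agreements.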
CAPTURING DATA FROM WEB PAGES
OutWit is a software product that allows you to automatically harvest (capture and download) embedded links, associated documents, and data files, and to scrape (identify and capture) text data from web pages and save it in spreadsheet format.
Examples of uses for this software include harvesting every document linked from a set of results pages, or capturing tabular data scattered across many pages into a single spreadsheet. Being able to harvest and re-purpose this type of information is far more powerful than simply looking at it or manually re-creating it with copy-and-paste across multiple web sites.
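If you prefer to script this kind of capture yourself rather than use a desktop tool, a minimal sketch in Python (not OutWit itself; the URL below is a placeholder) can read the HTML tables on a page and save them as spreadsheet-friendly CSV files:

```python
import pandas as pd

# Placeholder URL: substitute the page whose tables you want to capture.
URL = "https://example.com/statistics.html"

# pandas.read_html returns a list of DataFrames, one per HTML table it finds.
tables = pd.read_html(URL)

# Save each table to its own CSV file for use in Excel, SPSS, SAS, etc.
for i, table in enumerate(tables):
    table.to_csv(f"table_{i}.csv", index=False)
    print(f"Saved table_{i}.csv with {len(table)} rows")
```

Note that pandas.read_html relies on an HTML parser such as lxml being installed, and it only finds data that is marked up as an HTML table.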
LOCATING DATA AND METHODS TOOLS
Once you have captured information, it is time to organize it for easier recovery, review, and manipulation.
BIG DATA:
Raw data sets usually need to be cleaned and manipulated in statistical tools such as SPSS and SAS before analysis.
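As an illustrative sketch of that first-pass cleanup, here is what it might look like in Python's pandas (a stand-in for SPSS or SAS; the file and column names are hypothetical):

```python
import pandas as pd

# Hypothetical raw export; substitute your own file.
df = pd.read_csv("raw_survey_export.csv")

# Typical first-pass cleanup: drop empty rows, normalize column names,
# and coerce a numeric column that arrived as text.
df = df.dropna(how="all")
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
df["income"] = pd.to_numeric(df["income"], errors="coerce")

# Save the cleaned data for import into SPSS, SAS, or other tools.
df.to_csv("survey_clean.csv", index=False)
```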
There are now government and granting-agency requirements for sharing data under Data Management Plans (DMPs). See the Purdue University Data site and their collaborative Data Curation Profiles Toolkit for information on creating a data plan. The University of California also provides guidance through its DMPTool.
Investigators can choose to develop or contribute to local data repositories, or they can look for collaborative data repositories. A few places to start are the Open Access Directory, Re3data.org, Open Science Framework (OSF), and Zenodo. These sites list hundreds of repositories and data platforms in many fields, from art history to zoology.
PERSONAL DATA AND KNOWLEDGE MANAGEMENT:
There are a number of tools for handling published and unpublished academic information found on the web. These tools allow you to capture and annotate citations and full-text materials. The resulting personal knowledge database is then searchable, and files can be shared publicly or privately. These tools often allow you to integrate the citations into word-processor documents, generating references and bibliographies in a variety of styles.
For more information on these Personal Knowledge Management (PKM) tools see the Tab labeled "Citation and Document Management".
Sharing information can be a great way to communicate ideas and important resources, but there are logistical and legal concerns that must be considered when you are handling intellectual property.
Copyright determines if you have the right to reproduce material.
Trademarks determine if you may use logos that are registered at the national or state level.
Plagiarism is the term used when you claim credit for the ideas or works of another person.
Pirating is the term used for illegally copying and distributing copyrighted material.
Self-publishing options:
Start Here: How to Self-Publish Your Book by Jane Friedman
A few reviews of self-publishing companies:
One area of information research is data visualization: making raw data more understandable by presenting it as meaningful images tailored to your particular audience. Visualizations may include concept-space subject diagrams, pattern-recognition views that reveal trends, and projections of trends over time with implications for stakeholders.
Persuasive presentations require preparation time to identify, critically analyze, organize, and select appropriate materials and display methods.
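As a minimal sketch of one such display method, a simple trend projection over hypothetical yearly values (all of the data below is made up) could be produced with Python's matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical yearly values; substitute your own data.
years = np.arange(2015, 2024)
values = np.array([12, 14, 15, 18, 21, 22, 26, 29, 31])

# Fit a simple linear trend and project it five years forward.
slope, intercept = np.polyfit(years, values, 1)
future = np.arange(2015, 2029)
trend = slope * future + intercept

plt.plot(years, values, "o-", label="Observed")
plt.plot(future, trend, "--", label="Projected trend")
plt.xlabel("Year")
plt.ylabel("Value")
plt.legend()
plt.title("Simple trend projection")
plt.show()
```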
An example of a powerful data visualization tool is AcademyScope; a video details the design and programming of its visual representation of subject navigation among books. Concept spaces are used to show semantic relationships and the frequencies of terms.
See this impressive listing of information graphics tools.
Other examples of providing data in a visual way are: