LOCATING DATA SETS
The library provides access to SimplyAnalytics (formerly SimplyMap), a tool that allows researchers to mine U.S. census data and business data sets in order to generate maps or build data sets that can be exported into other tools such as Excel, SPSS, and SAS for further manipulation. (See the SimplyAnalytics information and help video page.)
Researchers can search for data sets using Databib, a searchable catalog, registry, directory, and bibliography of research data repositories. These data sets must be imported (in some cases after format manipulation) into analysis tools such as SPSS and SAS. For additional information about data support, see the Purdue University Data site.
Another new initiative is Google's Dataset Search, which attempts to harvest and federate scientific datasets.
DOE Scientific Research Data lets users quickly see what data are available, find where data collections reside, and go directly to the data. Users can browse recently added or revised content; view hundreds of datasets, data streams, and data collections by title; display data from more than 50 subject categories; and select content by sponsoring or originating research organization.
Open Science Framework (OSF) is another platform for storing and sharing research data.
Zenodo is a platform created by CERN as an open, dependable home for the long tail of science, enabling researchers to share and preserve research outputs of any size, in any format, and from any field of science.
Zanran Numerical Data Search is a search engine for data and statistics. Zanran’s strength is in finding graphs, charts and tables on the Internet, which distinguishes it from other search engines.
Another way to uncover data sets is to add the operator "site:statista.com" to a web search. For example: alcoholic beverages site:statista.com
CAPTURING FULL TEXT MATERIAL FOR MANIPULATION
One tool used to capture published data is the Text and Data Mining (TDM) service created by Crossref. It allows a researcher to download metadata from various publishers' sites; that metadata, in turn, points to the full-text PDFs for download. The final step is to convert the PDFs into XML markup for manipulation.
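As a minimal sketch of the metadata step, assuming you already have a DOI you are permitted to mine (the DOI below is only a placeholder), the Crossref REST API can be queried for a work's record, which may include full-text links that publishers expose for text and data mining:

```python
import requests

# Placeholder DOI: substitute a real DOI you have permission to mine.
DOI = "10.5555/12345678"

# Query the Crossref REST API for this work's metadata record.
resp = requests.get(f"https://api.crossref.org/works/{DOI}", timeout=30)
resp.raise_for_status()
work = resp.json()["message"]

# Publishers that participate in Crossref TDM may list full-text links here.
for link in work.get("link", []):
    print(link.get("content-type"), link.get("URL"))
```

Whether a full-text link appears, and whether you may actually download the PDF, depends on the publisher and on your institution's access agreements.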
CAPTURING DATA FROM WEB PAGES
OutWit is a software product that allows you to automatically harvest (capture and download) embedded links, associated documents, and data files, and to scrape (identify and capture) text data from web pages and save it in spreadsheet format.
Examples of uses for this software include harvesting every document linked from a set of results pages, or capturing tabular data scattered across many pages into a single spreadsheet. Being able to harvest and re-purpose this type of information is far more powerful than simply looking at it or manually re-creating it with copy-and-paste across multiple web sites.
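If you prefer to script this kind of capture yourself rather than use a desktop tool, a minimal sketch in Python (not OutWit itself; the URL below is a placeholder) can read the HTML tables on a page and save them as spreadsheet-friendly CSV files:

```python
import pandas as pd

# Placeholder URL: substitute the page whose tables you want to capture.
URL = "https://example.com/statistics.html"

# pandas.read_html returns a list of DataFrames, one per HTML table it finds.
tables = pd.read_html(URL)

# Save each table to its own CSV file for use in Excel, SPSS, SAS, etc.
for i, table in enumerate(tables):
    table.to_csv(f"table_{i}.csv", index=False)
    print(f"Saved table_{i}.csv with {len(table)} rows")
```

Note that pandas.read_html relies on an HTML parser such as lxml being installed, and it only finds data that is marked up as an HTML table.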
LOCATING DATA AND METHODS TOOLS
Once you have captured information, it is time to organize it for easier recovery, review, and manipulation.
BIG DATA:
Raw data sets usually need to be cleaned and manipulated in statistical tools such as SPSS and SAS before analysis.
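As an illustrative sketch of that first-pass cleanup, here is what it might look like in Python's pandas (a stand-in for SPSS or SAS; the file and column names are hypothetical):

```python
import pandas as pd

# Hypothetical raw export; substitute your own file.
df = pd.read_csv("raw_survey_export.csv")

# Typical first-pass cleanup: drop empty rows, normalize column names,
# and coerce a numeric column that arrived as text.
df = df.dropna(how="all")
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
df["income"] = pd.to_numeric(df["income"], errors="coerce")

# Save the cleaned data for import into SPSS, SAS, or other tools.
df.to_csv("survey_clean.csv", index=False)
```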
There are now government and granting-agency requirements for sharing data under Data Management Plans (DMPs). See the Purdue University Data site and their collaborative Data Curation Profiles Toolkit for information on creating a data plan. The University of California also provides guidance through its DMPTool.
Investigators can choose to develop or contribute to local data repositories, or they can look for collaborative data repositories. A few places to start are the Open Access Directory, Re3data.org, Open Science Framework (OSF), and Zenodo. These sites list hundreds of repositories and data platforms in many fields, from art history to zoology.
PERSONAL DATA AND KNOWLEDGE MANAGEMENT:
There are a number of tools for handling published and unpublished academic information found on the web. These tools allow you to capture and annotate citations and full-text materials. The resulting personal knowledge database is then searchable, and files can be shared publicly or privately. These tools often allow you to integrate the citations into word-processor documents, generating references and bibliographies in a variety of styles.
For more information on these Personal Knowledge Management (PKM) tools see the Tab labeled "Citation and Document Management".
Sharing information can be a great way to communicate ideas and important resources, but there are logistical and legal concerns that must be considered when you are handling intellectual property.
Copyright determines if you have the right to reproduce material.
Trademarks determine if you may use logos that are registered at the national or state level.
Plagiarism is the term used when you claim credit for the ideas or works of another person.
Pirating is the term used for illegally copying and distributing copyrighted material.
Self-publishing options:
Start Here: How to Self-Publish Your Book by Jane Friedman
A few reviews of self-publishing companies:
One area of information research is data visualization: making raw data more understandable by presenting it as meaningful images tailored to your particular audience. Visualizations may include concept-space subject diagrams, pattern-recognition views that reveal trends, and projections of trends over time with implications for stakeholders.
Persuasive presentations require preparation time to identify, critically analyze, organize, and select appropriate materials and display methods.
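As a minimal sketch of one such display method, a simple trend projection over hypothetical yearly values (all of the data below is made up) could be produced with Python's matplotlib:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical yearly values; substitute your own data.
years = np.arange(2015, 2024)
values = np.array([12, 14, 15, 18, 21, 22, 26, 29, 31])

# Fit a simple linear trend and project it five years forward.
slope, intercept = np.polyfit(years, values, 1)
future = np.arange(2015, 2029)
trend = slope * future + intercept

plt.plot(years, values, "o-", label="Observed")
plt.plot(future, trend, "--", label="Projected trend")
plt.xlabel("Year")
plt.ylabel("Value")
plt.legend()
plt.title("Simple trend projection")
plt.show()
```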
An example of a powerful data visualization tool is AcademyScope; a video details the design and programming of its visual representation of subject navigation among books. Concept spaces are used to show semantic relationships and the frequencies of terms.
See this impressive listing of information graphics tools.
Other examples of providing data in a visual way are: