Data Science Consultancy Club mentoring: SICE student projects for IU Paleo & IGWS collections data

Status: Completed
Start Date: Oct 16, 2017
End Date: May 16, 2018
Locations: All Counties
Director: Gary Motz
Other Researchers: Kyle Stirling, Charlene Tay, Gotit Singh, Ramji Chandrasekaran, Udit Patel, Vaibhav Vijaykumar, Kushal Kokje, Subramanian Shanmugavel
Funding: Microsoft
Issue: Collections data is locked up in handwritten labels that are often costly and error-prone to transcribe using traditional methods.
Objective: Evaluate artificial intelligence, computer vision, and machine learning methods to generate a paleontological lexicon for geological terms and employ maximum likelihood models for natural language parsing in the auto-recognition of handwritten text for catalog transcription.
Approach: The Data Science Consultancy Club (DSCC) will maintain a project wiki and a GitHub repository for all code generated by this project. The Project Director will coordinate with the DSCC on a biweekly basis to ensure that overall project milestones are on track. Data science graduate students will receive course credit, and cloud computing resources will be supplied by the Microsoft Azure for Research grant to Motz.
Products: This exploratory project will evaluate the potential of Microsoft Azure cloud computing resources for unsupervised machine learning and database development from handwritten specimen records. Given that this is sponsored research for a student project, no products are expected by the funding agency. Publications regarding this work may be pursued in collaboration with the Natural History Museum, London.
Benefits: Expansion of data science applications to real-world data, and potentially tremendous savings in the human effort required to digitize the handwritten paper records of the IU Paleo Collection and make them discoverable.