Job Details  

Research Associate I
Students cannot apply for this job online.
Job ID 74908
Job Funding Source Work-Study, Non-Work-Study, Remote
Employer Information, School of
Category Professional/Administrative
Job Description

How to Apply:

A brief cover letter is required for consideration for this position and should be attached as the first page of your resume or CV. The cover letter should address your specific interest in the position and outline skills and experience that directly relate to this position. Please send your cover letter and resume or CV to Professor Sarita Schoenebeck (yardi@umich.edu).

Job Summary:

We are looking for 1–3 master's students to support a research project that develops computational methods to identify when academic papers introduce new research concepts and terminology. The project focuses on building and evaluating machine learning models that classify research papers based on labeled training data from publications.

Students in this role will develop a machine learning pipeline for analyzing academic papers. The work includes extracting text from PDFs, working with text representations such as embeddings, training classifiers, and evaluating model performance against human-coded data. The task involves working with imperfect labels and ambiguous classifications, which requires careful modeling and iteration. Pay starts at $28.00 per hour. Hours per week and start and end dates are subject to discussion.

Educational Value
  • Work in a research team including faculty and graduate students
  • Build an end-to-end NLP/ML pipeline on a large, real-world corpus of academic papers
  • Gain experience working with noisy, human-labeled data, including evaluation and error analysis
  • Learn applied machine learning in a research context, including validation and reproducibility 
  • Opportunity for co-authorship on research publications, depending on contribution

 

Job Requirements

Responsibilities:

  • Develop a machine learning pipeline that processes a large corpus of academic papers
  • Extract and preprocess text from PDF research papers
  • Implement NLP workflows (e.g., embeddings, classification models) using labeled training data
  • Train and evaluate models that classify whether papers introduce new research terminology
  • Conduct error analysis (e.g., false positives/negatives) to improve model performance
  • Analyze model performance using validation and holdout datasets
  • Evaluate and validate model outputs against human-coded datasets
  • Document modeling workflows, experiments, and results to support reproducibility

Required Qualifications:

  • Current Master's student
  • 1+ years of experience programming in Python
  • 1+ years of experience with machine learning and natural language processing
  • 1+ years of experience with common data science tools (e.g., pandas, scikit-learn, Jupyter/Colab)
  • 1+ years of experience working with datasets

 

Hourly Rate $28.00/hour to $31.00/hour
Hours 10.0 to 20.0 hours per week
Time Frame Fall/Winter/Spring/Summer
Start Date ASAP
End Date At completion of project
Primary Contact Sarita Schoenebeck
Primary Contact's Email N/A
Supervisor Sarita Schoenebeck
Work Location Leinweber, 2200 Hayward St
Phone N/A
Fax N/A