Introduction to machine learning in social science

Date & Time

Oct. 7, 2025, 9 a.m. - Oct. 7, 2025, 10:30 a.m.

Cost

$0

Location

Online




Description

Session 1: Tuesday 7th October, 15:00-16:30 BST

Session 2: Wednesday 15th October, 15:00-16:30 BST

Held across two 90-minute sessions (7 & 15 October, 15:00-16:30), this workshop will introduce participants to machine learning in social science and best practices for developing training data. The first session will explore the available methods and approaches while the second will focus on implementation using examples.

Please note participants are required to attend both sessions.

 

About the workshop

Continued investment in new and existing data collection infrastructures (such as surveys and smart data) highlights the growing need for the creation of efficient, robust and scalable data resources which help researchers find and access data.

Recent advances in artificial intelligence (AI) methods for the automatic analysis of large text corpora provide a unique opportunity, at the intersection of computational techniques and research methodologies, to develop data resources that can meet the research community’s current and future needs.

With the widening application of AI and machine learning (ML) pipelines for processing large text corpora, this workshop will focus on a fundamental prerequisite for setting up any pipeline for downstream tasks: the dataset. It is a common perception that ML models are data hungry and require vast amounts of data to perform well. While understandable, this perception can sometimes overshadow the importance of data quality.

Led by Postdoctoral Research Fellows Wing Yan Li and Chandresh Pravin (University of Surrey), this workshop will cover a typical “packaging” of data to train and evaluate models.

We will explore various aspects that contribute to good practice in creating quality training datasets, including:

· exploratory data analysis,

· the selection of evaluation metrics,

· model selection, and

· model evaluation.

Conventionally, models are evaluated both quantitatively, using appropriate metrics, and qualitatively. While qualitatively analysing every sample can be tedious, relying on random sampling can be problematic. In the section covering model evaluation, workshop participants will be introduced to the problem of data biases and gaps. By bridging technological approaches with social science research needs, this workshop explores data transformation techniques that enhance research reproducibility and computational analysis capabilities.

 

Who can attend?

This workshop is open to all, and no prior knowledge of the topic is required.

 

Registering for the workshop

This workshop will be split across two sessions (7 & 15 October, 15:00-16:30). Participants are required to attend both sessions.

By registering for the workshop via Eventbrite, you are confirming your attendance at both sessions.

Numbers for the workshop will be limited, so please only complete the registration if you’re available for both dates.

Joining instructions for each session will be sent in advance to all registered attendees.

 

About the presenters

Chandresh Pravin and Wing Yan Li are currently working as Postdoctoral Research Fellows on the ESRC-funded project, Machine Learning to Improve Metadata Quality (METACURATE-ML), a collaboration between the University of Surrey, University of Essex, NatCen and CLOSER.

 

Further information

If you have any questions or require further information, please contact CLOSER Administrative and Events Assistant, Becky England.

 

CLOSER Data Managers Network

This webinar is organised as part of the CLOSER Data Managers Network, which comprises professionals working on study data management. The network is open to studies that form part of CLOSER and/or CLOSER Discovery, as well as professionals from the wider data management community.

For more information, or to enquire about joining the CLOSER Data Managers Network, please contact us.