Lorin Crawford - When more isn't better: Rethinking Scale in Single-cell Foundation Models

Recorded 27 February 2026. Lorin Crawford of Microsoft Research New England presents "When more isn't better: A Lesson from Rethinking Scale in Single-cell Foundation Models" at IPAM's Mathematics of Cancer: Open Mathematical Problems Workshop.
Abstract: The success of transformer-based foundation models on natural language and images has motivated their use in single-cell transcriptomics. In this talk, we will assess how pre-training dataset size and diversity affect the performance of single-cell foundation models. Using a corpus of 22.2 million cells, we pre-trained 400 models and evaluated them in over 6,400 experiments. Our results show that the performance of current methods tends to plateau with pre-training datasets that are only a fraction of the size of the full corpus, challenging the assumption that ever-larger datasets are required for optimal generalization. This will lead us to the second half of the talk, where we evaluate the effect of training data composition on model performance. Focusing on human hematopoiesis, we train and analyze deep generative models with a variety of training datasets, including cells from adult and developing tissues, disease states, and perturbation atlases. Here, we observe that (1) deep generative models generalize poorly to unseen cell types and (2) the addition of malignant or perturbed cells to healthy corpora does not consistently improve modeling of novel states. These findings highlight the nuanced roles of dataset size and heterogeneity, suggesting that strategic curation, rather than indiscriminate scaling, is key for optimizing single-cell foundation models.
Learn more online at: https://www.ipam.ucla.edu/programs/workshops/mathematics-of-cancer-open-mathematical-problems/
