🎉 C4AI Command-R, a 35-billion-parameter model weights release, is now available to experiment with, explore, and build on. It is optimized for multilingual generation in 10 languages and tool use, and has highly performant retrieval-augmented generation (RAG) capabilities. We’re releasing C4AI Command-R as part of our commitment to bridging the growing gap in access to frontier technology for fundamental research and safety auditing.
🔥 Check out our latest Fireside Chat event hosted by Marzieh Fadaee, Sr. Research Scientist at C4AI, featuring Mirella Lapata, Professor at the School of Informatics, University of Edinburgh, as they discuss "Exploring Planning in Models and Life."
🌿 The Aya Collection features 513 million data points across 101 languages, which can make filtering difficult if you only want a single language. To address this, we’ve released a version split by language.
📺 To celebrate the launch of Aya last month, we hosted a 5-hour live stream to spotlight our worldwide collaborators and the unique perspectives they brought to Aya. We’ve recently released a playlist featuring many of the lightning talks and panels from this event.
State-of-the-art models are becoming increasingly multilingual. But why aren’t safety guardrails? Work to date on toxicity mitigation has focused exclusively on English, yet harm exists in all languages. Our research addresses toxicity mitigation across multiple languages in “From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models.” We delve into the complexities of toxicity assessment across 9 languages spanning 5 scripts, ranging from high- to mid-resource availability. We view our work as a foundational step in multilingual harm mitigation and evaluation within large language models. Let's continue exploring this crucial domain together. Congrats to authors Luiza Pozzobon, Patrick Lewis, Sara Hooker, and Beyza Ermis.
Our latest work, “Investigating Continual Pretraining in Large Language Models: Insights and Implications,” studies continual pretraining in large language models (LLMs). Unlike previous studies, which mostly concentrate on a limited selection of tasks or domains and primarily aim to address the issue of forgetting, our research evaluates the adaptability and capabilities of LLMs in the face of changing data landscapes in practical scenarios. To this end, we introduce a new benchmark based on the M2D2 dataset to measure the adaptability of LLMs to these evolving data environments, offering a comprehensive framework for evaluation. Congrats to authors Cagatay Yildiz, Nishaanth Kanna Ravichandran, Prishruit Punia, Matthias Bethge, and Beyza Ermis.
We are proud to see so many of our independent researchers reach incredible research milestones in September. Congratulations on such a huge accomplishment! C4AI open science community members are indicated in bold. 🥳
Precision-Driven Low-Resource Speech Synthesis for Bangla Text-to-Speech System by Tabassum Sadia Shahjahan, Md. Ismail Hossain, Kazi Rafat, Md Ruhul Amin, Fuad Rahman, Nabeel Mohammed
Open Science Community-led Events
A huge thank you to our Community Leads for organizing these upcoming events: Ahmad Anis, Benedict Emoekabu, Ameed Taylor, Anier Velasco, and Martina Gonzalez Vilas.