Hugging Face announced on Wednesday that it is expanding its LeRobot platform with a large dataset aimed at automated driving. The online artificial intelligence (AI) and machine learning (ML) repository said the dataset was created in collaboration with the AI startup Yaak. Dubbed Learning to Drive (L2D), the dataset was collected from a suite of sensors installed on 60 electric vehicles (EVs) over a period of three years. The open-source dataset is intended to enable developers and the robotics community to build spatial intelligence solutions for the automobile industry.
Hugging Face Adds L2D Dataset to LeRobot
In a blog post, the company detailed the new AI dataset, calling it “the world’s largest multimodal dataset aimed at building an open-sourced spatial intelligence for the automotive domain.” The entire dataset is more than 1PB (one petabyte) in size and was collected using sensor suites installed on 60 EVs operated by driving schools in 30 German cities over three years. Identical sensors were used across all vehicles to ensure consistency in the data collected.
The LeRobot platform was launched last year as a collection of open-source AI models, datasets, and accompanying tools that can help developers build AI-powered robotics systems.
The Learning to Drive dataset (Photo Credit: Hugging Face)
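For a sense of how a Hub-hosted dataset of this scale can be accessed without downloading the full petabyte-scale archive, the sketch below uses the Hugging Face datasets library in streaming mode. The repository id "yaak-ai/L2D" is an assumption used purely for illustration and may not match the dataset's actual identifier.

```python
# Minimal sketch: stream a large dataset from the Hugging Face Hub so the full
# (petabyte-scale) download is not required up front.
# NOTE: the repo id "yaak-ai/L2D" is an assumption for illustration; check the
# LeRobot/L2D announcement for the real identifier.
from datasets import load_dataset

dataset = load_dataset("yaak-ai/L2D", split="train", streaming=True)

# Inspect the first record without materialising the whole dataset.
first_sample = next(iter(dataset))
print(first_sample.keys())
```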
The driving policies in the dataset are divided into two groups: expert policies and student policies. The former comprises data from driving instructors, while the latter comes from learner drivers. Hugging Face stated that the expert policies contain zero driving mistakes and are considered optimal, whereas the student policies contain known sub-optimalities. Both groups include natural language instructions for the driving tasks.
Each group covers all the driving scenarios that must be completed to obtain a driving licence in the European Union (EU), including tasks such as overtaking, roundabout handling, and track driving.
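As a rough illustration of how the two policy groups might be separated in practice, the sketch below filters a streamed copy of the dataset on a hypothetical policy_type field and prints the natural language instruction attached to a few expert episodes. Both field names are assumptions, not the confirmed L2D schema.

```python
# Illustrative sketch: separate expert (instructor) and student (learner)
# episodes from the streamed dataset.
# NOTE: "policy_type" and "instruction" are assumed field names for
# illustration; the real L2D schema may differ.
from datasets import load_dataset

dataset = load_dataset("yaak-ai/L2D", split="train", streaming=True)

expert_episodes = dataset.filter(lambda sample: sample["policy_type"] == "expert")
student_episodes = dataset.filter(lambda sample: sample["policy_type"] == "student")

# Peek at the natural language instructions attached to a few expert episodes.
for sample in expert_episodes.take(3):
    print(sample["instruction"])
```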
Detailing the sensor suite used to capture the L2D data, Hugging Face said that each of the 60 Kia Niro EVs was equipped with six RGB cameras to capture the vehicle’s surroundings in 360 degrees, an on-board GPS for vehicle location and mapping, and an inertial measurement unit (IMU) to capture vehicle dynamics. All data streams were captured with timestamps.
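As an illustration only (not the official L2D schema), a single timestamped, synchronised sample from such a sensor suite could be modelled along these lines:

```python
# Illustrative data structure: one way to model a single synchronised sample
# from the sensor suite described above (six RGB cameras, GPS, IMU).
# NOTE: this is an assumed layout for illustration, not the published schema.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class L2DSample:
    timestamp_ns: int                      # capture time shared by all sensors
    camera_frames: List[bytes]             # six RGB images (e.g. JPEG-encoded)
    gps: Tuple[float, float]               # latitude, longitude
    imu_accel: Tuple[float, float, float]  # accelerometer readings (m/s^2)
    imu_gyro: Tuple[float, float, float]   # gyroscope readings (rad/s)
```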
Notably, the dataset is aimed at helping developers and robotics scientists build end-to-end self-driving AI models that can eventually be used to build fully autonomous vehicle systems.
Hugging Face highlighted that the L2D dataset will be released in phases, with each successive release a superset of the previous ones to ensure ease of access. The platform is also inviting the community to submit models for closed-loop testing with a safety driver, which is slated to begin in summer 2025.