AI tasks in the car interior, such as identifying and localizing externally introduced objects, are crucial for the response quality of personal assistants. However, the computational resources of on-board systems remain highly constrained, restricting the deployment of such solutions directly within the vehicle. To address this limitation, we propose the novel Object Detection and Localization (ODAL) framework for interior scene understanding. Our approach leverages vision foundation models through a distributed architecture, splitting computational tasks between the on-board system and the cloud. This design overcomes the resource constraints of running foundation models directly in the car. To benchmark model performance, we introduce ODALbench, a new metric for comprehensive assessment of detection and localization. Our analysis demonstrates the framework’s potential to establish new standards in this domain. We compare the state-of-the-art GPT-4o vision foundation model with the lightweight LLaVA 1.5 7B model and explore how fine-tuning enhances the lightweight model’s performance. Remarkably, our fine-tuned ODAL-LLaVA model achieves an ODAL_{score} of 89%, representing a 71% improvement over its baseline performance and outperforming GPT-4o by nearly 20%. Furthermore, the fine-tuned model maintains high detection accuracy while significantly reducing hallucinations, achieving an ODAL_{SNR} three times higher than GPT-4o.
@inproceedings{Mszros2025,title={Scalable Object Detection in the Car Interior With Vision Foundation Models},author={Schmidt, Sebastian and M\'{e}sz\'{a}ros, B\'{a}lint and Firintepe, Ahmet and G\"{u}nnemann, Stephan},year={2026},booktitle={Proceedings of the IEEE Intelligent Vehicles Symposium (IV)},url={TBD},}
Amplified Patch-Level Differential Privacy for Free via Random Cropping
Kaan Durmaz, Jan Schuchardt, Sebastian Schmidt, and 1 more author
Transactions on Machine Learning Research (TMLR), 2026
@article{Durmaz2026amplified,title={Amplified Patch-Level Differential Privacy for Free via Random Cropping},author={Durmaz, Kaan and Schuchardt, Jan and Schmidt, Sebastian and G\"{u}nnemann, Stephan},year={2026},journal={Transactions on Machine Learning Research (TMLR)},url={https://openreview.net/forum?id=pSWuUF8AVP},}
2025
Prior2Former - Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic Segmentation
Sebastian Schmidt, Julius Körner, Dominik Fuchsgruber, and 3 more authors
@inproceedings{Schmidt2025b,title={Prior2Former - Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic Segmentation},author={Schmidt, Sebastian and K\"orner, Julius and Fuchsgruber, Dominik and Gasperini, Stefano and Tombari, Federico and G\"unnemann, Stephan},year={2025},booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) - Highlight},url={http://arxiv.org/abs/2405.11337},eprint={arXiv:2405.11337},}
Joint Out-of-Distribution Filtering and Data Discovery Active Learning
Sebastian Schmidt, Leonard Schenk, Leo Schwinn, and 1 more author
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
As the data demand of deep learning models increases, active learning (AL) becomes essential to strategically select samples for labeling, which maximizes data efficiency and reduces training costs. Real-world scenarios necessitate the consideration of incomplete data knowledge within AL. Prior works address handling out-of-distribution (OOD) data, while another research direction has focused on category discovery. However, a combined analysis of AL under both out-of-distribution data and category discovery remains unexplored. To address this gap, we propose Joint Out-of-distribution filtering and data Discovery Active learning (Joda), which uniquely addresses both challenges simultaneously by filtering out OOD data before selecting candidates for labeling. In contrast to previous methods, we deeply entangle the training procedure with filtering and selection to construct a common feature space that aligns known and novel categories while separating OOD samples. Unlike previous works, Joda is highly efficient and completely omits auxiliary models and training access to the unlabeled pool for filtering or selection. In extensive experiments on 18 configurations and 3 metrics, Joda consistently achieves the highest accuracy with the best balance between class discovery and OOD filtering compared to state-of-the-art competitor approaches.
@inproceedings{Schmidt2025a,author={Schmidt, Sebastian and Schenk, Leonard and Schwinn, Leo and Günnemann, Stephan},booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},title={Joint Out-of-Distribution Filtering and Data Discovery Active Learning},url={http://arxiv.org/abs/2503.02491},year={2025},}
A Machine Learning Perspective on Automated Driving Corner Cases
Sebastian Schmidt, Julius Körner, and Stephan Günnemann
For high-stakes applications like autonomous driving, safe operation is necessary to prevent harm, accidents, and failures. Traditionally, difficult scenarios have been categorized into corner cases and addressed individually. However, this example-based categorization is not scalable and lacks a data-coverage perspective, neglecting the generalization of machine learning models to their training data. In our work, we propose a novel machine learning approach that takes the underlying data distribution into account. Based on this novel perspective, we present a framework for effective corner case recognition for perception on individual samples. In our evaluation, we show that our approach (i) unifies existing scenario-based corner case taxonomies under a distributional perspective, (ii) achieves strong performance on corner case detection tasks across standard benchmarks, for which we extend established out-of-distribution detection benchmarks, and (iii) enables the analysis of combined corner cases via a newly introduced fog-augmented Lost & Found dataset. These results provide a principled basis for corner case recognition and underline our definition, which requires no manual specification.
@article{Schmidt2025d,title={A Machine Learning Perspective on Automated Driving Corner Cases},author={Schmidt, Sebastian and K\"{o}rner, Julius and G\"{u}nnemann, Stephan},year={2025},month=oct,journal={ArXiv},volume={2510.10653},url={http://arxiv.org/abs/2510.10653},}
GeoDiffusion: A Training-Free Framework for Accurate 3D Geometric Conditioning in Image Generation
Phillip Mueller, Talip Uenlue, Sebastian Schmidt, and 4 more authors
In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Oct 2025
@inproceedings{mueller2025,title={GeoDiffusion: A Training-Free Framework for Accurate 3D Geometric Conditioning in Image Generation},author={Mueller, Phillip and Uenlue, Talip and Schmidt, Sebastian and Kollovieh, Marcel and Fan, Jiajie and G\"unnemann, Stephan and Mikelsons, Lars},year={2025},month=oct,booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},pages={6374--6384},url={TBD},}
Effective Data Pruning through Score Extrapolation
Sebastian Schmidt, Prasanga Dhungel, Christoffer Löffler, and 3 more authors
Training advanced machine learning models demands massive datasets, resulting in prohibitive computational costs. To address this challenge, data pruning techniques identify and remove redundant training samples while preserving model performance. Yet, existing pruning techniques predominantly require a full initial training pass to identify removable samples, negating any efficiency benefits for single training runs. To overcome this limitation, we introduce a novel importance score extrapolation framework that requires training on only a small subset of data. Within this framework, we present two initial approaches, based on k-nearest neighbors and graph neural networks, to accurately predict sample importance for the entire dataset using patterns learned from this minimal subset. We demonstrate the effectiveness of our approach for 2 state-of-the-art pruning methods (Dynamic Uncertainty and TDDS), 4 different datasets (CIFAR-10, CIFAR-100, Places-365, and ImageNet), and 3 training paradigms (supervised, unsupervised, and adversarial). Our results indicate that score extrapolation is a promising direction for scaling expensive score computations, such as those used in data pruning and data attribution.
@article{Schmidt2025c,title={Effective Data Pruning through Score Extrapolation},author={Schmidt, Sebastian and Dhungel, Prasanga and L\"{o}ffler, Christoffer and Nieth, Bj\"{o}rn and G\"{u}nnemann, Stephan and Schwinn, Leo},year={2025},month=jun,journal={ArXiv},volume={2506.09010},url={http://arxiv.org/abs/2506.09010},}
Unexplored flaws in multiple-choice VQA evaluations
Fabio Rosenthal, Sebastian Schmidt, Thorsten Graf, and 3 more authors
Multimodal Large Language Models (MLLMs) demonstrate strong capabilities in handling image-text inputs. A common way to assess this ability is through multiple-choice Visual Question Answering (VQA). Earlier works have already revealed that these benchmarks are sensitive to answer choice order, a limitation that can be mitigated through careful design. Yet, we highlight additional, unexplored biases in prompt formatting that question the reliability of current MLLM evaluations. Specifically, we identify three key variation factors in prompt formatting and analyze their impact through a large-scale study involving seven MLLMs and five VQA datasets, spanning 48 distinct prompt format variations. Our findings reveal that multiple-choice VQA is highly sensitive to minor prompt format changes, even when these changes are semantically neutral. We further demonstrate that these biases persist independently of known order biases or the MLLM’s confidence in the correct answer. Finally, we demonstrate that existing bias mitigation strategies fail to address these newly identified biases.
@article{Rosenthal2025,title={Unexplored flaws in multiple-choice VQA evaluations},author={Rosenthal, Fabio and Schmidt, Sebastian and Graf, Thorsten and Bagodonat, Thorsten and G\"{u}nnemann, Stephan and Schwinn, Leo},year={2025},month=nov,journal={ArXiv},volume={2511.22341},url={http://arxiv.org/abs/2511.22341},}
2024
A Unified Approach Towards Active Learning and Out-of-Distribution Detection
Sebastian Schmidt, Leonard Schenk, Leo Schwinn, and 1 more author
Transactions on Machine Learning Research (TMLR), Nov 2024
@article{Schmidt2024,title={A Unified Approach Towards Active Learning and Out-of-Distribution Detection},author={Schmidt, Sebastian and Schenk, Leonard and Schwinn, Leo and G\"unnemann, Stephan},year={2024},journal={Transactions on Machine Learning Research (TMLR)},url={http://arxiv.org/abs/2405.11337},eprint={arXiv:2405.11337}}
Generalized Synchronized Active Learning for Multi-Agent-Based Data Selection on Mobile Robotic Systems
Sebastian Schmidt, Lukas Stappen, Leo Schwinn, and 1 more author
@article{Schmidt2024b,author={Schmidt, Sebastian and Stappen, Lukas and Schwinn, Leo and Günnemann, Stephan},journal={IEEE Robotics and Automation Letters},title={Generalized Synchronized Active Learning for Multi-Agent-Based Data Selection on Mobile Robotic Systems},year={2024},volume={9},number={10},pages={8659-8666},keywords={Robots;Robot kinematics;Uncertainty;Data centers;Synchronization;Task analysis;Streams;Computer vision for transportation;deep learning for visual perception;deep learning methods},url={https://ieeexplore.ieee.org/abstract/document/10637683},doi={10.1109/LRA.2024.3444670},}
Deep Sensor Fusion with Constraint Safety Bounds for High Precision Localization
Sebastian Schmidt, Ludwig Stumpp, Diego Valverde, and 1 more author
In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Nov 2024
@inproceedings{schmidt2024c,title={Deep Sensor Fusion with Constraint Safety Bounds for High Precision Localization},author={Schmidt, Sebastian and Stumpp, Ludwig and Valverde, Diego and G{\"u}nnemann, Stephan},booktitle={2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},pages={12256--12262},year={2024},url={https://ieeexplore.ieee.org/document/10802242},organization={IEEE}}
2023
Stream-based Active Learning by Exploiting Temporal Properties in Perception with Temporal Predicted Loss
Sebastian Schmidt and Stephan Günnemann
In Proceedings of the British Machine Vision Conference (BMVC), Nov 2023
@inproceedings{Schmidt2023,title={Stream-based Active Learning by Exploiting Temporal Properties in Perception with Temporal Predicted Loss},author={Schmidt, Sebastian and G\"unnemann, Stephan},year={2023},booktitle={Proceedings of the British Machine Vision Conference (BMVC)},url={http://arxiv.org/abs/2309.05517},eprint={arXiv:2309.05517},}
2020
Advanced Active Learning Strategies for Object Detection
Sebastian Schmidt, Qing Rao, Julian Tatsch, and 1 more author
In Proceedings of the IEEE Intelligent Vehicles Symposium (IV), Nov 2020
Future self-driving cars must be able to perceive and understand their surroundings. Deep learning based approaches promise to solve the perception problem but require a large amount of manually labeled training data. Active learning is a training procedure in which the model itself selects interesting samples for labeling based on their uncertainty, requiring substantially less data for training. Recent research in active learning has mostly focused on the simple image classification task. In this paper, we propose novel methods to estimate sample uncertainties for 2D and 3D object detection using ensembles. We moreover evaluate different training strategies, including continuous training, to alleviate the increasing training times introduced by the active learning cycle. Finally, we investigate the effects of active learning on imbalanced datasets and possible interactions with class weighting. Experimental results show time savings of around 55% and data savings of around 30%. For the 3D object detection task, we show that our proposed uncertainty estimation method is valid, saving 35% of labeling effort, and is thus ready for application in automotive object detection use cases.
@inproceedings{Schmidt2020,title={Advanced Active Learning Strategies for Object Detection},author={Schmidt, Sebastian and Rao, Qing and Tatsch, Julian and Knoll, Alois},year={2020},booktitle={Proceedings of the IEEE Intelligent Vehicles Symposium (IV)},pages={871--876},url={https://mediatum.ub.tum.de/doc/1585225},}