Synthetic Data Blueprint (SDB): A modular framework for the statistical, structural, and graph-based evaluation of synthetic tabular data
arXiv:2512.19718v1 Announce Type: new
Abstract: In the rapidly evolving era of Artificial Intelligence (AI), synthetic data are widely used to accelerate innovation while preserving privacy and enabling broader data accessibility. However, the evaluation of synthetic data remains fragmented across ...
PhysMaster: Building an Autonomous AI Physicist for Theoretical and Computational Physics Research
arXiv:2512.19799v1 Announce Type: new
Abstract: Advances in LLMs have produced agents with knowledge and operational capabilities comparable to human scientists, suggesting potential to assist, accelerate, and automate research. However, existing studies mainly evaluate such systems on well-defined...
A Branch-and-Price Algorithm for Fast and Equitable Last-Mile Relief Aid Distribution
arXiv:2512.19882v1 Announce Type: new
Abstract: The distribution of relief supplies to shelters is a critical aspect of post-disaster humanitarian logistics. In major disasters, prepositioned supplies often fall short of meeting all demands. We address the problem of planning vehicle routes from a ...
Interpolative Decoding: Exploring the Spectrum of Personality Traits in LLMs
arXiv:2512.19937v1 Announce Type: new
Abstract: Recent research has explored using very large language models (LLMs) as proxies for humans in tasks such as simulation, surveys, and studies. While LLMs do not possess a human psychology, they can often emulate human behaviors with sufficiently high f...
Zero-Shot Segmentation through Prototype-Guidance for Multi-Label Plant Species Identification
arXiv:2512.19957v1 Announce Type: new
Abstract: This paper presents an approach developed to address the PlantClef 2025 challenge, which consists of fine-grained multi-label species identification over high-resolution images. Our solution focused on employing class prototypes obtained from the t...
FGDCC: Fine-Grained Deep Cluster Categorization -- A Framework for Intra-Class Variability Problems in Plant Classification
arXiv:2512.19960v1 Announce Type: new
Abstract: Intra-class variability is the degree of dissimilarity between images within a class. Depending on its intensity, intra-class variability can hinder the learning process for DL models, especially wh...
Comparative Evaluation of Explainable Machine Learning Versus Linear Regression for Predicting County-Level Lung Cancer Mortality Rate in the United States
arXiv:2512.17934v1 Announce Type: new
Abstract: Lung cancer (LC) is a leading cause of cancer-related mortality in the United States. Accurate prediction of LC mortality rates is crucial for guiding targeted interventions and addressing health disparities. Although traditional regression-based mode...
What's the Price of Monotonicity? A Multi-Dataset Benchmark of Monotone-Constrained Gradient Boosting for Credit PD
arXiv:2512.17945v1 Announce Type: new
Abstract: Financial institutions face a trade-off between predictive accuracy and interpretability when deploying machine learning models for credit risk. Monotonicity constraints align model behavior with domain knowledge, but their performance cost - the pric...
Convolutional-neural-operator-based transfer learning for solving PDEs
arXiv:2512.17969v1 Announce Type: new
Abstract: The convolutional neural operator is a CNN-based architecture recently proposed to enforce structure-preserving continuous-discrete equivalence and enable the genuine, alias-free learning of solution operators of PDEs. This neural operator was demonstrate...
CodeGEMM: A Codebook-Centric Approach to Efficient GEMM in Quantized LLMs
arXiv:2512.17970v1 Announce Type: new
Abstract: Weight-only quantization is widely used to mitigate the memory-bound nature of LLM inference. Codebook-based methods extend this trend by achieving strong accuracy in the extremely low-bit regime (e.g., 2-bit). However, current kernels rely on dequant...
Parameter-Efficient Fine-Tuning for HAR: Integrating LoRA and QLoRA into Transformer Models
arXiv:2512.17983v1 Announce Type: new
Abstract: Human Activity Recognition is a foundational task in pervasive computing. While recent advances in self-supervised learning and transformer-based architectures have significantly improved HAR performance, adapting large pretrained models to new domain...
This AI finds simple rules where humans see only chaos
A new AI developed at Duke University can uncover simple, readable rules behind extremely complex systems. It studies how systems evolve over time and reduces thousands of variables into compact equations that still capture real behavior. The method works across physics, engineering, climate science...
BIONIX: A Wireless, Low-Cost Prosthetic Arm with Dual-Signal EEG and EMG Control
arXiv:2512.16929v1 Announce Type: new
Abstract: Affordable upper-limb prostheses often lack intuitive control systems, limiting functionality and accessibility for amputees in low-resource settings. This project presents a low-cost, dual-mode neuro-muscular control system integrating electroencepha...
QSMOTE-PGM/kPGM: QSMOTE Based PGM and kPGM for Imbalanced Dataset Classification
arXiv:2512.16960v1 Announce Type: new
Abstract: Quantum-inspired machine learning (QiML) leverages mathematical frameworks from quantum theory to enhance classical algorithms, with particular emphasis on inner product structures in high-dimensional feature spaces. Among the prominent approaches, th...
Compression is Routing: Reconstruction Error as an Intrinsic Signal for Modular Language Models
arXiv:2512.16963v1 Announce Type: new
Abstract: Current Large Language Models (LLMs) face three major challenges: context length limitations, high inference costs, and catastrophic forgetting during continual learning. While Mixture-of-Experts (MoE) architectures mitigate some of these conflicts, t...
Physics-Informed Lightweight Machine Learning for Aviation Visibility Nowcasting Across Multiple Climatic Regimes
arXiv:2512.16967v1 Announce Type: new
Abstract: Short-term prediction (nowcasting) of low-visibility and precipitation events is critical for aviation safety and operational efficiency. Current operational approaches rely on computationally intensive numerical weather prediction guidance and human-...
A new tool is revealing the invisible networks inside cancer
Spanish researchers have created a powerful new open-source tool that helps uncover the hidden genetic networks driving cancer. Called RNACOREX, the software can analyze thousands of molecular interactions at once, revealing how genes communicate inside tumors and how those signals relate to patient...
DiscoverDCP: A Data-Driven Approach for Construction of Disciplined Convex Programs via Symbolic Regression
arXiv:2512.15721v1 Announce Type: new
Abstract: We propose DiscoverDCP, a data-driven framework that integrates symbolic regression with the rule sets of Disciplined Convex Programming (DCP) to perform system identification. By enforcing that all discovered candidate model expressions adhere to DCP...
Hybrid Quantum-Classical Ensemble Learning for S&P 500 Directional Prediction
arXiv:2512.15738v1 Announce Type: new
Abstract: Financial market prediction is a challenging application of machine learning, where even small improvements in directional accuracy can yield substantial value. Most models struggle to exceed 55-57% accuracy due to high noise, non-stationarity, and ...
How Do Graph Signals Affect Recommendation: Unveiling the Mystery of Low and High-Frequency Graph Signals
arXiv:2512.15744v1 Announce Type: new
Abstract: Spectral graph neural networks (GNNs) are highly effective in modeling graph signals, with their success in recommendation often attributed to low-pass filtering. However, recent studies highlight the importance of high-frequency signals. The role of ...
LLaDA2.0: Scaling Up Diffusion Language Models to 100B
arXiv:2512.15745v1 Announce Type: new
Abstract: This paper presents LLaDA2.0 -- a tuple of discrete diffusion large language models (dLLM) scaling up to 100B total parameters through systematic conversion from auto-regressive (AR) models -- establishing a new paradigm for frontier-scale deployment....
BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design
We propose a general-purpose approach for improving the ability of Large Language Models (LLMs) to intelligently and adaptively gather information from a user or other external source using the framework of sequential Bayesian experimental design (BED). This enables LLMs to act as effective multi-tu...