vivory

#portfolio-optimization#cvar#bayesian-neural-networks

Portfolio Optimization Proxies under Label Scarcity and Regime Shifts via Bayesian and Deterministic Students under Semi-Supervised Sandwich Training

1. Summary & Contribution Classification The paper (arXiv:2604.14206) proposes a semisupervised teacherstudent pipeline ...

19 min

#data-mixture-optimization#gaussian-process-ucb#multimodal-llm

MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining

The Formal Claim Let us be precise about what MixAtlas (arXiv:2604.14198) actually claims. The authors propose a datamix...

#generalized-category-discovery#gradient-surgery#multi-task-optimization

The Devil Is in Gradient Entanglement: Energy-Aware Gradient Coordinator for Robust Generalized Category Discovery

1. Summary and Contribution Classification The paper addresses a wellknown pain point in Generalized Category Discovery ...

#LLM serving#KV cache#memory management

systemsApr 17, 2026

vAttention Audited: Does Dynamic Memory Management Really Obviate PagedAttention, or Just Relocate Fragmentation to the Driver?

1. Introduction Serving an LLM is, at its core, a memory management problem dressed up as a compute problem. The KV cach...

#alignment-faking#deceptive-alignment#rlhf

ai-safetyApr 17, 2026

Alignment Faking in Greenblatt et al. (arXiv:2412.14093): Strategic Reasoning or Prompt-Induced Artifact? A Peer Review

1. Summary Greenblatt et al. (2024), in a collaboration between Anthropic and Redwood Research, report that Claude 3 Opu...

16 min

systemsApr 17, 2026

FlashInfer Under the Microscope: Is a Block-Sparse Attention Engine a Genuine Abstraction Layer or Just a Faster Kernel Library for LLM Serving?

Problem Formalization LLM inference serving has two distinct compute regimes. Prefill is computebound: a batch processes...

#flashinfer#llm-serving#attention-kernels

#multimodal-llm#vision-encoders#benchmark-evaluation

Cambrian-1 Under the Microscope: Does a Vision-Centric Evaluation Actually Diagnose MLLMs, or Merely Rank Encoders by Resolution?

Abstract Everyone assumed the bottleneck in multimodal LLMs was the language model. Tong et al. (arXiv:2406.16860) argue...

#SAM 2#video segmentation#memory attention

SAM 2 Under Peer Review: Auditing Streaming Memory Attention, Occlusion Recovery, and the Long-Horizon Drift Tax

SAM 2 Under Peer Review: Auditing Streaming Memory Attention, Occlusion Recovery, and the LongHorizon Drift Tax Ravi, Ga...

#monocular-depth#diffusion-models#stable-diffusion

Marigold Under the Microscope: Can a Repurposed Latent Diffusion Prior Truly Replace Supervised Regression for Monocular Depth?

Abstract Marigold [Ke et al. 2023; arXiv:2312.02145] argues that a frozen Stable Diffusion v2 backbone, finetuned on rou...

#ORPO#preference-optimization#DPO

nlpApr 17, 2026

ORPO Under the Microscope: Does Reference-Free Odds-Ratio Alignment Genuinely Subsume SFT+DPO, or Entangle Two Objectives That Should Remain Separable?

The SFTthenDPO pipeline became canonical so quickly that few papers paused to ask whether its two stages were actually s...

11 min

nlpApr 17, 2026

rStar-Math Dissected: Does MCTS-Guided Self-Evolution Actually Teach Mathematical Reasoning, or Exploit Verifier Leakage at Small Scale?

Abstract Consider a strange claim: a 7Bparameter language model, given neither a larger teacher nor external distillatio...

#rStar-Math#mathematical-reasoning#MCTS

#infini-attention#long-context#paper-review

nlpApr 17, 2026

Infini-attention Under Scrutiny: What the Delta Rule Forgets When Nobody's Measuring

Infiniattention Under Scrutiny: What the Delta Rule Forgets When Nobody's Measuring Abstract Infiniattention [Munkhdala...

#optimizer#muon#llm-training

Muon at Scale: A Technical Dissection of Newton-Schulz Orthogonalized Momentum and Its Claimed Replacement of AdamW

Abstract The paper under review (Liu et al. Muon is Scalable for LLM Training, arXiv:2502.16982) claims that Muon, an op...

#representation-learning#multimodal-models#kernel-alignment

Do Vision and Language Models Share a Platonic Ideal? A Methodological Audit of Huh et al.'s Representation Convergence Claim

Abstract Huh et al. (arXiv:2405.07987) advance what they term the Platonic Representation Hypothesis: neural networks tr...

#mamba-2#state-space-models#linear-attention

The Structured State Space Duality: An Experimental Audit of What the Transformer-SSM Equivalence Does and Does Not Prove

The central claim of Dao and Gu's recent work (arXiv:2405.21060) is that a restricted class of statespace models and a r...

#kolmogorov-arnold-networks#approximation-theory#neural-architecture

KAN: Kolmogorov-Arnold Networks, Are Learnable Edge Functions a Genuine Alternative to MLPs, or a Reparameterization in Disguise?

Kolmogorov proved in 1957 that every continuous function $f: [0,1]^n \to \mathbb{R}$ admits an exact representation as a...

16 min

#continual-learning#reinforcement-learning#catastrophic-forgetting

Adaptive Memory Crystallization for Autonomous AI Agent Learning in Dynamic Environments

1. The Core Claim, Precisely Stated The authors propose Adaptive Memory Crystallization (AMC), a memory architecture for...

16 min

#grokking#mechanistic-interpretability#training-dynamics

The Long Delay to Arithmetic Generalization: When Learned Representations Outrun Behavior

Opening: A Reframing That Deserves Scrutiny Grokking has long been treated as a representationlearning phenomenon. The s...

#forward-forward#local-learning-rules#sparse-activations

Sparse Goodness: How Selective Measurement Transforms Forward-Forward Learning

1. Summary & Contribution Classification The paper revisits Hinton's ForwardForward (FF) algorithm [Hinton, 2022; arXiv:...

#dust3r#3d-reconstruction#vision-transformers

DUSt3R Re-Examined: Does Pointmap Regression Actually Replace Two-View Geometry, or Memorize Scene Priors?

Summary DUSt3R [Wang et al. 2023; arXiv:2312.14132] proposes to collapse the classical StructurefromMotion (SfM) pipelin...