• NVIDIA AI Brings Nemotron-3-Nano-30B to NVFP4 with Quantization Aware Distillation (QAD) for Efficient Reasoning Inference

    NVIDIA has released Nemotron-3-Nano-30B-A3B-NVFP4, a production checkpoint that runs a 30B parameter reasoning model in 4 bit NVFP4 format while keeping accuracy close to its BF16 baseline. The model combines a hybrid Mamba2 Transformer Mixture of Experts architecture with a Quantization Aware Distillation (QAD) recipe designed specifically for NVFP4 deployment. Overall, it is an ultra-efficient NVFP4 precision version of Nemotron-3-Nano that delivers up to 4x higher throughput on Blackwell B200.

    https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4

    What is Nemotron-3-Nano-30B-A3B-NVFP4?

    Nemotron-3-Nano-30B-A3B-NVFP4 is a quantized version of Nemotron-3-Nano-30B-A3B-BF16, trained from scratch by the NVIDIA team as a unified reasoning and chat model. It is built as a hybrid…
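    To see what a 4 bit block scaled format does to individual weights, here is a minimal sketch of NVFP4-style quantization. The E2M1 value grid and the 16 element block size follow the public NVFP4 description; the single per-block float scale is a simplification (the real format carries an FP8 block scale plus a tensor level scale), and none of this is NVIDIA's actual kernel.

```python
# Illustrative sketch of NVFP4-style block quantization, not NVIDIA's kernel.
# Values are grouped into blocks of 16; each block shares one scale and stores
# 4 bit E2M1 codes. The scale handling here is deliberately simplified.

E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # representable FP4 magnitudes

def quantize_block(block):
    """Quantize one block of floats to (scale, dequantized values)."""
    amax = max(abs(v) for v in block)
    scale = amax / 6.0 if amax > 0 else 1.0  # map the block max onto E2M1's max
    out = []
    for v in block:
        mag = min(E2M1, key=lambda g: abs(g - abs(v) / scale))  # nearest grid point
        out.append(mag * scale * (1.0 if v >= 0 else -1.0))
    return scale, out

def nvfp4_roundtrip(values, block=16):
    """Quantize then dequantize, returning the reconstructed values."""
    recon = []
    for i in range(0, len(values), block):
        _, deq = quantize_block(values[i:i + block])
        recon.extend(deq)
    return recon
```

    The roundtrip error this introduces is exactly what the QAD recipe distills against, so the BF16 teacher can pull the quantized student back toward its original behavior.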

  • How to Build Memory-Driven AI Agents with Short-Term, Long-Term, and Episodic Memory

    def openai_chat(system: str, user: str) -> str:
        resp = client.chat.completions.create(
            model=OPENAI_MODEL,
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": user},
            ],
            temperature=0.3,
        )
        return resp.choices[0].message.content

    def heuristic_responder(context: str, question: str) -> str:
        lessons = re.findall(r"Lessons=(.*)", context)
        avoid = re.findall(r"Avoid=(.*)", context)
        ltm_lines = [ln for ln in context.splitlines() if ln.startswith("[LTM:")]
        steps = []
        if lessons:
            for chunk in lessons[:2]:
                for s in [x.strip() for x in chunk.split(";") if x.strip()]:
                    steps.append(s)
        for ln in ltm_lines:
            # compare against a lowercase tag, since ln is lowercased here
            if "[ltm:procedure]" in ln.lower():
                proc = re.sub(r"^\[LTM:procedure\]\s*", "", ln, flags=re.I)
                proc = proc.split("(salience=")[0].strip()
                for part in [p.strip() for p in proc.split("|") if p.strip()]:
                    steps.append(part)
        steps = steps[:8]…

  • A Coding and Experimental Analysis of Decentralized Federated Learning with Gossip Protocols and Differential Privacy

    In this tutorial, we explore how federated learning behaves when the traditional centralized aggregation server is removed and replaced with a fully decentralized, peer-to-peer gossip mechanism. We implement both centralized FedAvg and decentralized Gossip Federated Learning from scratch and introduce client-side differential privacy by injecting calibrated noise into local model updates. By running controlled experiments on non-IID MNIST data, we examine how privacy strength, as measured by different epsilon values, directly affects convergence speed, stability, and final model accuracy. Also, we study the practical trade-offs between privacy guarantees and learning efficiency in real-world decentralized learning systems. Check out the Full Codes…
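    The core mechanic, one gossip round with client-side noise, can be sketched in a few lines. The random pairing scheme, clip norm, and noise scale below are illustrative choices in the spirit of the tutorial, not its exact recipe, and a real deployment would calibrate sigma from the target epsilon.

```python
# Minimal sketch: decentralized gossip averaging with client-side Gaussian
# noise in the differential privacy style. Illustrative parameters only.
import random

def privatize(update, clip=1.0, sigma=0.1, rng=random):
    """Clip an update to L2 norm `clip`, then add Gaussian noise."""
    norm = sum(u * u for u in update) ** 0.5
    factor = min(1.0, clip / norm) if norm > 0 else 1.0
    return [u * factor + rng.gauss(0.0, sigma * clip) for u in update]

def gossip_round(models, sigma=0.1, rng=random):
    """Pair clients at random; each pair exchanges and averages privatized updates."""
    order = list(range(len(models)))
    rng.shuffle(order)
    for a, b in zip(order[::2], order[1::2]):
        pa = privatize(models[a], sigma=sigma, rng=rng)
        pb = privatize(models[b], sigma=sigma, rng=rng)
        avg = [(x + y) / 2 for x, y in zip(pa, pb)]
        models[a], models[b] = avg, list(avg)
    return models
```

    Repeating such rounds drives all peers toward a common average without any central server, while larger sigma (stronger privacy, smaller epsilon) slows and destabilizes that convergence, which is exactly the trade-off the experiments measure.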

  • Robbyant Open Sources LingBot World: a Real Time World Model for Interactive Simulation and Embodied AI

    Robbyant, the embodied AI unit inside Ant Group, has open sourced LingBot-World, a large scale world model that turns video generation into an interactive simulator for embodied agents, autonomous driving and games. The system is designed to render controllable environments with high visual fidelity, strong dynamics and long temporal horizons, while staying responsive enough for real time control.

    From text to video, to text to world

    Most text to video models generate short clips that look realistic but behave like passive movies. They do not model how actions change the environment over time. LingBot-World is built instead as an action…

  • AI2 Releases SERA, Soft Verified Coding Agents Built with Supervised Training Only for Practical Repository Level Automation Workflows

    Researchers at the Allen Institute for AI (AI2) introduce SERA, Soft Verified Efficient Repository Agents, a coding agent family that aims to match much larger closed systems using only supervised training and synthetic trajectories.

    What is SERA?

    SERA is the first release in AI2’s Open Coding Agents series. The flagship model, SERA-32B, is built on the Qwen 3 32B architecture and is trained as a repository level coding agent. On SWE bench Verified at 32K context, SERA-32B reaches a 49.5 percent resolve rate. At 64K context it reaches 54.2 percent. These numbers place it in the same performance band as open weight…

  • A Coding Implementation for Training, Optimizing, Evaluating, and Interpreting Knowledge Graph Embeddings with PyKEEN

    In this tutorial, we walk through an end-to-end, advanced workflow for knowledge graph embeddings using PyKEEN, actively exploring how modern embedding models are trained, evaluated, optimized, and interpreted in practice. We start by understanding the structure of a real knowledge graph dataset, then systematically train and compare multiple embedding models, tune their hyperparameters, and analyze their performance using robust ranking metrics. Also, we focus not just on running pipelines but on building intuition for link prediction, negative sampling, and embedding geometry, ensuring we understand why each step matters and how it affects downstream reasoning over graphs. Check out the FULL CODES…
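    The ranking metrics at the heart of that evaluation are easy to state concretely. The sketch below computes mean reciprocal rank and Hits@k from 1-based ranks of true triples among scored candidates; it mirrors the standard definitions PyKEEN reports rather than calling PyKEEN itself.

```python
# Standard link prediction ranking metrics from 1-based ranks of true triples.

def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all evaluated triples (higher is better)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k):
    """Fraction of triples whose true entity ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def rank_of(true_score, candidate_scores):
    """1-based rank of the true candidate among competing scores
    (counting only strictly better candidates, an optimistic convention)."""
    return 1 + sum(1 for s in candidate_scores if s > true_score)
```

    Keeping these definitions in mind makes model comparisons concrete: an MRR gain means true entities are, on average, scored closer to the top of the candidate list.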

  • Microsoft Unveils Maia 200, An FP4 and FP8 Optimized AI Inference Accelerator for Azure Datacenters

    Maia 200 is Microsoft’s new in house AI accelerator designed for inference in Azure datacenters. It targets the cost of token generation for large language models and other reasoning workloads by combining narrow precision compute, a dense on chip memory hierarchy and an Ethernet based scale up fabric.

    Why did Microsoft build a dedicated inference chip?

    Training and inference stress hardware in different ways. Training needs very large all to all communication and long running jobs. Inference cares about tokens per second, latency and tokens per dollar. Microsoft positions Maia 200 as its most efficient inference system, with about 30 percent…
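    Narrow precision compute works because inference tolerates a coarser value grid. As a rough illustration, the sketch below rounds a value to an FP8 E4M3-style grid with 3 mantissa bits; it ignores denormals, saturation, and the format's exact exponent range, and says nothing about Maia 200's actual numerics.

```python
# Illustrative rounding to a narrow precision floating point grid.
# E4M3-style: 3 mantissa bits, so 8 representable steps per binade.
# Real FP8 handling (denormals, saturation, NaN) is omitted.
import math

def quantize_e4m3(x, mant_bits=3):
    """Round x to the nearest value with `mant_bits` mantissa bits."""
    if x == 0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    e = math.floor(math.log2(abs(x)))      # exponent of the containing binade
    step = 2.0 ** (e - mant_bits)          # spacing between representable values
    return sign * round(abs(x) / step) * step
```

    The relative rounding error stays bounded regardless of magnitude, which is why weights and activations survive FP8, and, with block scaling, FP4, while compute and memory traffic drop sharply.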

  • DeepSeek AI Releases DeepSeek-OCR 2 with Causal Visual Flow Encoder for Layout Aware Document Understanding

    DeepSeek AI released DeepSeek-OCR 2, an open source document OCR and understanding system that restructures its vision encoder to read pages in a causal order that is closer to how humans scan complex documents. The key component is DeepEncoder V2, a language model style transformer that converts a 2D page into a 1D sequence of visual tokens that already follow a learned reading flow before text decoding starts.

    https://github.com/deepseek-ai/DeepSeek-OCR-2

    From raster order to causal visual flow

    Most multimodal models still flatten images into a fixed raster sequence, top left to bottom right, and apply a transformer with static positional encodings…
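    Mechanically, the difference between the two orderings is just which permutation flattens the 2D patch grid into a 1D token sequence. The sketch below contrasts fixed raster flattening with an explicit reading-flow permutation; the permutation here is a hand-written stand-in, whereas DeepEncoder V2 learns its ordering.

```python
# Contrast: fixed raster flattening vs. an explicit "reading flow" ordering
# of 2D patch tokens. The flow order here is a toy stand-in for a learned one.

def raster_flatten(grid):
    """Flatten a 2D grid of patch tokens top left to bottom right."""
    return [tok for row in grid for tok in row]

def flow_flatten(grid, order):
    """Flatten according to an explicit list of (row, col) positions."""
    return [grid[r][c] for r, c in order]
```

    For a page with a sidebar, for example, a reading-flow order can emit the main column's patches contiguously before the sidebar's, so the decoder sees text in the order a human would read it.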

  • A Coding Deep Dive into Differentiable Computer Vision with Kornia Using Geometry Optimization, LoFTR Matching, and GPU Augmentations

    We implement an advanced, end-to-end Kornia tutorial and demonstrate how modern, differentiable computer vision can be built entirely in PyTorch. We start by constructing GPU-accelerated, synchronized augmentation pipelines for images, masks, and keypoints, then move into differentiable geometry by optimizing a homography directly through gradient descent. We also show how learned feature matching with LoFTR integrates with Kornia’s RANSAC to estimate robust homographies and produce a simple stitched output, even under constrained or offline-safe conditions. Finally, we ground these ideas in practice by training a lightweight CNN on CIFAR-10 using Kornia’s GPU augmentations, highlighting how research-grade vision pipelines translate naturally…
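    The "optimize geometry by gradient descent" idea can be shown in miniature without Kornia or autograd. The toy below fits only the two translation entries of a homography by finite-difference-free analytic gradients on point reprojection error; Kornia's version optimizes the full 3x3 matrix with PyTorch autograd, so treat this purely as a sketch of the principle.

```python
# Toy gradient descent on homography parameters: fit the translation entries
# so that projected source points match destination points. A stand-in for
# Kornia's autograd-based optimization of full 3x3 homographies.

def apply_h(h, pt):
    """Apply a 3x3 homography (row-major list of 9 entries) to a 2D point."""
    x, y = pt
    w = h[6] * x + h[7] * y + h[8]
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)

def fit_translation(src, dst, steps=200, lr=0.1):
    """Gradient-descend tx, ty to minimize mean squared reprojection error."""
    tx, ty = 0.0, 0.0
    for _ in range(steps):
        gx = gy = 0.0
        for (sx, sy), (dx, dy) in zip(src, dst):
            px, py = apply_h([1, 0, tx, 0, 1, ty, 0, 0, 1], (sx, sy))
            gx += 2 * (px - dx)   # d/dtx of squared error
            gy += 2 * (py - dy)   # d/dty of squared error
        tx -= lr * gx / len(src)
        ty -= lr * gy / len(src)
    return tx, ty
```

    The same loop shape, project, measure error, step the parameters, is what the tutorial runs over all eight homography degrees of freedom with autograd supplying the gradients.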

  • Ant Group Releases LingBot-VLA, A Vision Language Action Foundation Model For Real World Robot Manipulation

    How do you build a single vision language action model that can control many different dual arm robots in the real world? LingBot-VLA is Ant Group Robbyant’s new Vision Language Action foundation model that targets practical robot manipulation. It is trained on about 20,000 hours of teleoperated bimanual data collected from 9 dual arm robot embodiments and is evaluated on the large scale GM-100 benchmark across 3 platforms. The model is designed for cross morphology generalization, data efficient post training, and high training throughput on commodity GPU clusters.

    https://arxiv.org/pdf/2601.18692

    Large scale dual arm dataset across 9 robot…