10: Thomas Sohmers

Date: 2024-02-28

Duration: 01:22:31

Thomas Sohmers joins to discuss dropping out of high school at age 17 to start a chip company, lessons from the successes and failures of past processor architectures, the history of VLIW, and the new AI hardware appliances he and his team are building at Positron AI.

Thomas on X: https://twitter.com/trsohmers

Thomas’ Site: https://www.trsohmers.com/

Show Notes

Welcome Thomas Sohmers (00:01:22)
Growing Up Around Computers (00:03:13)
Digging Beneath the Software (00:05:56)
Learning Python, C, and Arduino C (00:07:05)
- https://www.arduino.cc/reference/en/
Learning About the Thiel Fellowship (00:07:44)
- https://thielfellowship.org/
Starting Research at MIT at age 14 (00:09:24)
Dropping out of High School and Starting Thiel Fellowship at age 17 (00:10:36)
MIT ISN Lab (00:11:09)
- https://isn.mit.edu/
Evaluating ARM Processors for High Performance Computing (00:11:28)
- https://en.wikipedia.org/wiki/ARM_architecture_family
ARM Calxeda Processor (00:11:38)
- https://en.wikipedia.org/wiki/Calxeda
- https://www.zdnet.com/article/what-the-death-of-calxeda-means-for-the-future-of-microservers/
Scaling Out Low Power Processors for Data Center Compute (00:12:27)
Incorporating REX Computing (00:13:42)
- http://rexcomputing.com/
- https://fortune.com/2015/07/21/rex-computing/
Facebook and the Open Compute Project (00:14:18)
- https://www.opencompute.org/
Deciding Against Arm (00:14:49)
ARMv8 (00:15:12)
- https://en.wikichip.org/wiki/arm/armv8
Deciding to Design a New Architecture (00:16:26)
Multiflow (00:18:23)
- https://en.wikipedia.org/wiki/Multiflow
Good Architecture Ideas from the Past (00:18:35)
Thomas’ Talk at Stanford (00:18:59)
- https://youtu.be/ki6jVXZM2XU
RISC vs. CISC Debate (00:19:37)
- https://cs.stanford.edu/people/eroberts/courses/soco/projects/risc/risccisc/
SPARC Instruction Set (00:20:04)
- https://en.wikipedia.org/wiki/SPARC
The Importance of History (00:20:58)
RISC Came Before CISC (00:23:08)
CDC 6600 (00:23:20)
- https://en.wikipedia.org/wiki/CDC_6600
Load-Store Architecture (00:23:53)
- https://en.wikipedia.org/wiki/Load%E2%80%93store_architecture
IBM System/360 (00:24:02)
- https://en.wikipedia.org/wiki/IBM_System/360
PowerPC (00:24:29)
- https://en.wikipedia.org/wiki/PowerPC
VLIW (00:25:02)
- https://en.wikipedia.org/wiki/Very_long_instruction_word
ELI-512 and Josh Fisher (00:25:05)
- https://dl.acm.org/doi/pdf/10.1145/800046.801649
- https://en.wikipedia.org/wiki/Josh_Fisher
Floating Point Systems, Inc. (FPS) (00:26:45)
- https://en.wikipedia.org/wiki/Floating_Point_Systems
Multiflow Compiler (00:26:52)
- https://www.cs.yale.edu/publications/techreports/tr364.pdf
Instruction Level Parallelism (00:27:33)
- https://en.wikipedia.org/wiki/Instruction-level_parallelism
Intel Itanium (00:28:20)
- https://en.wikipedia.org/wiki/Itanium
Itanium is not a VLIW Architecture (00:29:04)
Explicitly Parallel Instruction Computing (EPIC) (00:29:22)
- https://en.wikipedia.org/wiki/Explicitly_parallel_instruction_computing
x86 and Pentium (00:30:18)
- https://en.wikipedia.org/wiki/X86
- https://en.wikipedia.org/wiki/Pentium
Impact of Branch Prediction and Caching on Determinism (00:31:34)
- https://en.wikipedia.org/wiki/Branch_predictor
- https://en.wikipedia.org/wiki/CPU_cache
Why Itanium Failed (00:32:27)
REX’s NEO Architecture (00:35:29)
- http://rexcomputing.com/#neoarch
Hard Real-Time Determinism (00:35:41)
Scratchpad Memory (00:35:54)
- https://en.wikipedia.org/wiki/Scratchpad_memory
Removing Memory Management (TLB, MMU, etc.) (00:36:18)
- https://en.wikipedia.org/wiki/Translation_lookaside_buffer
- https://en.wikipedia.org/wiki/Memory_management_unit
ALU, FPU, and Register Files (00:37:14)
- https://en.wikipedia.org/wiki/Arithmetic_logic_unit
- https://en.wikipedia.org/wiki/Floating-point_unit
- https://en.wikipedia.org/wiki/Register_file
Benefits of Removing Implicit Caching Layers (00:38:30)
VLIW in Signal Processing (00:39:51)
- https://en.wikipedia.org/wiki/Digital_signal_processor
VLIW Won in a Silent Way (00:40:49)
Original Reason for Hardware-Managed Caching (00:41:26)
Impact of VLIW and Software-Managed Memory on Compile Times (00:42:41)
- http://www.ai.mit.edu/projects/aries/Documents/vliw.pdf
LLVM and Sufficiently Advanced Open Source Compilers (00:42:49)
- https://llvm.org/
Apple Transition from PowerPC to x86 to Arm (00:43:31)
- https://en.wikipedia.org/wiki/Mac_transition_to_Intel_processors
- https://en.wikipedia.org/wiki/Mac_transition_to_Apple_silicon
Static Single-Assignment Form (00:44:11)
- https://en.wikipedia.org/wiki/Static_single-assignment_form
Impact of More Powerful Personal Machines on VLIW (00:45:07)
Software is the Hard Part of New Hardware (00:45:35)
LLVM Frontends, IR, and Backends (00:46:20)
- https://llvm.org/pubs/2004-01-30-CGO-LLVM.html
Qualcomm Hexagon DSP (00:47:22)
- https://en.wikipedia.org/wiki/Qualcomm_Hexagon
Paul Sebexen (00:48:08)
- https://www.linkedin.com/in/paul-sebexen-59204625/
Basic Linear Algebra Subprograms (BLAS) (00:49:21)
- https://www.netlib.org/blas/
Fast Fourier Transform (FFT) (00:49:33)
- https://en.wikipedia.org/wiki/Fast_Fourier_transform
Working on Software in Parallel with Hardware (00:50:00)
Verilator (00:51:09)
- https://www.veripool.org/verilator/
Cadence Incisive (00:51:18)
- https://en.wikipedia.org/wiki/NCSim
Synthesizing RTL to Netlist Every Day (00:52:07)
- https://en.wikipedia.org/wiki/Logic_synthesis
FPGA vs. ASIC Design Flow (00:53:57)
Open Source Synthesis Tools (00:54:49)
- https://f4pga.org/
OpenMPW (00:57:01)
- https://efabless.com/open_shuttle_program
FPGA Simulation Post Tape-out (00:57:25)
Xilinx UltraScale (00:57:48)
- https://www.xilinx.com/products/technology/ultrascale.html
What Happened to the NEO Chips (00:59:02)
Floating Point Performance Per Watt vs. Nvidia A100 (01:00:11)
- https://www.nvidia.com/en-us/data-center/a100/
Winding Down REX (01:00:51)
Positron AI (01:05:11)
- https://www.positron.ai/
NeurIPS Exhibit (01:05:48)
- https://x.com/trsohmers/status/1734975623016088000
5x Performance Per Dollar over Nvidia H100 (01:06:07)
- https://www.nvidia.com/en-us/data-center/h100/
Balancing Memory Bandwidth and Compute (01:06:48)
Software Interface for Inference Accelerator (01:08:15)
Trained Model File Formats (01:09:13)
- http://onnx.ai/
- https://huggingface.co/docs/safetensors/en/index
Benefits of Direct Ingestion of Model Files (01:09:33)
The Importance of Understanding Customer Pain Points (01:12:33)
Current Pain Points for Inference (01:14:12)
- https://www.run.ai/guides/machine-learning-inference/understanding-machine-learning-inference
Performance Per Dollar (01:15:16)
Load Balancing Requests in a Hardware Agnostic Manner (01:16:20)
OpenAI API (Informal) Specification (01:16:47)
- https://platform.openai.com/docs/api-reference/audio/createTranscription
Future Proofing Hardware (01:17:31)
State Space Models (Hyena, Mamba) (01:18:38)
- https://huggingface.co/blog/lbourdois/get-on-the-ssm-train
- https://arxiv.org/abs/2302.10866
- https://github.com/state-spaces/mamba
- https://arxiv.org/abs/2312.00752
Mixture of Experts (MoE) (GPT-4, Gemini, Mixtral) (01:19:33)
- https://en.wikipedia.org/wiki/Mixture_of_experts
- https://openai.com/research/gpt-4
- https://blog.google/technology/ai/google-gemini-ai/
- https://mistral.ai/news/mixtral-of-experts/
High Bandwidth Memory (HBM) (01:20:32)
- https://en.wikipedia.org/wiki/High_Bandwidth_Memory
Chip on Wafer on Substrate (CoWoS) (01:20:49)
- https://www.tsmc.com/english/dedicatedFoundry/technology/cowos
Keeping up with Thomas and Positron (01:21:34)
- https://twitter.com/trsohmers
- https://www.linkedin.com/in/trsohmers/
- https://twitter.com/Positron_AI
- https://www.linkedin.com/company/positron-ai/

Transcript

Coming soon.