
Survey of LLMs for VLSI

Each entry below records the same fields: category, base model, training data, open-source status, training cost, target problem, evaluation metric, experimental conclusions, and other remarks.

DATE23-BPAN, "Benchmarking Large Language Models for Automated Verilog RTL Code Generation"
- Category: Fine-tuning
- Base model: CodeGen, 345M - 16B
- Data: GitHub: 50K files / ~300 MB of Verilog; Verilog textbooks: ~100 MB
- Open-source: Yes
- Training cost: CodeGen-2B: 1 epoch, 2x RTX 8000 (48 GB), 2 days; CodeGen-6B: 1 epoch, 4x RTX 8000 (48 GB), 4 days; CodeGen-16B: 1 epoch, 3x A100, 6 days
- Problem: code generation for problems from the HDLBits website
- Metric: 1. compiling completions; 2. functional pass (see the sketch below)
- Experimental conclusions: 1. fine-tuning significantly increases the rate of compiling completions (measured over 10 completions per problem); 2. even after fine-tuning, functional correctness on intermediate and advanced problems remains poor
- Other remarks: LLMs are only good at small-scale / lightweight tasks
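
The two metrics above can be reproduced with a small harness that feeds each generated completion to an open-source simulator. Below is a minimal sketch assuming Icarus Verilog (iverilog/vvp) is installed and that the per-problem testbench prints "ALL TESTS PASSED" on success; the file names and the pass string are illustrative assumptions, not the paper's actual harness.

```python
import subprocess
import tempfile
from pathlib import Path

def check_completion(design_code: str, testbench_code: str) -> tuple[bool, bool]:
    """Return (compiles, passes_functional_test) for one generated Verilog completion."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        (tmp / "design.v").write_text(design_code)
        (tmp / "tb.v").write_text(testbench_code)

        # Metric 1: does the completion compile?
        compile_proc = subprocess.run(
            ["iverilog", "-o", str(tmp / "sim.out"), str(tmp / "design.v"), str(tmp / "tb.v")],
            capture_output=True, text=True,
        )
        if compile_proc.returncode != 0:
            return False, False

        # Metric 2: does it pass the functional testbench?
        try:
            sim_proc = subprocess.run(
                ["vvp", str(tmp / "sim.out")],
                capture_output=True, text=True, timeout=60,
            )
        except subprocess.TimeoutExpired:
            return True, False  # compiled, but the simulation hung
        return True, "ALL TESTS PASSED" in sim_proc.stdout
```

Running this over the 10 sampled completions per problem and averaging would give approximately the two rates the entry above refers to.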

ChipNeMo
- Category: Fine-tuning
- Base model: LLaMA2, 7B / 13B / 70B
- Data: internal data (bug summaries, design source, documentation, verification, other): ~22B tokens; Wikipedia: 1.5B tokens (natural language); GitHub: 0.7B tokens of C++, Python, and Verilog (code)
- Open-source: No
- Training cost: 7B: 2,710 A100-hours; 13B: 5,100 A100-hours
- Problem: script generation; chatbot (88 practical questions on architecture/design/verification); bug summarization and analysis
- Metric: mostly human rating
- Experimental conclusions: a larger learning rate (3e-4 vs. 5e-6) degrades performance significantly (see the sketch below)
- Other remarks: in most cases, the 70B model without fine-tuning is better than the 13B model with fine-tuning
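
The learning-rate finding above is essentially a trainer-configuration choice. Below is a minimal domain-adaptive pretraining sketch using Hugging Face transformers; the model name, toy corpus, and every hyperparameter except the learning-rate comparison are placeholders, not ChipNeMo's actual (non-public) recipe.

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

MODEL_NAME = "meta-llama/Llama-2-7b-hf"   # placeholder; any causal LM works for the sketch

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Toy stand-in for the in-domain corpus (bug reports, design source, documentation, ...).
texts = ["module counter(input clk, input rst, output reg [3:0] q); /* ... */ endmodule"]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="dapt-checkpoints",
    learning_rate=5e-6,               # the smaller rate; 3e-4 reportedly degrades results
    num_train_epochs=1,
    per_device_train_batch_size=1,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```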

ChatEDA
- Category: Fine-tuning
- Base model: LLaMA2, 70B
- Data: ~1,500 instructions generated via in-context learning (examples given in the prompt), followed by manual proofreading (see the sketch below)
- Open-source: No
- Training cost: 15 epochs, 8x A100 (80 GB)
- Problem: task planning; script generation
- Metric: 1. whether the task planning is accurate; 2. whether the generated script is accurate
- Other remarks: autoregressive training objective?
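
The instruction data above is built by prompting a stronger model with a few in-context examples and then proofreading its output. Below is a minimal sketch of that pattern with the OpenAI Python client; the seed instructions, prompt wording, and model name are illustrative assumptions, not ChatEDA's actual prompt or generator.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hand-written seed instructions for EDA tool automation (illustrative only).
seed_examples = [
    "Run synthesis on design 'top' with a 500 MHz clock constraint and report timing.",
    "Perform placement and routing, then dump the post-route DEF file.",
]

prompt = (
    "You write instructions for an EDA tool-automation assistant.\n"
    "Here are example instructions:\n"
    + "\n".join(f"- {ex}" for ex in seed_examples)
    + "\nGenerate 5 new, diverse instructions in the same style, one per line."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",              # placeholder model name
    messages=[{"role": "user", "content": prompt}],
    temperature=1.0,
)

candidates = [line.strip("- ").strip()
              for line in response.choices[0].message.content.splitlines()
              if line.strip()]

# Each candidate would then be proofread / filtered before being used for fine-tuning.
for c in candidates:
    print(c)
```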

RTL-Coder
- Category: Fine-tuning
- Base model: Mistral-7B-v0.1
- Data: 27K+ samples (problem / RTL code pairs)
- Open-source: Yes
- Training cost: 4x RTX 4090
- Problem: RTL code generation
- Metric: VerilogEval + RTLLM (see the sketch below)
- Experimental conclusions: the new training scheme appears to be very effective; using the generation method from RTLLM, functional correctness even reaches 60% for GPT-4
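
VerilogEval reports functional correctness as pass@k over multiple sampled completions (RTLLM uses a related multi-sample success criterion). Below is a minimal sketch of the standard unbiased pass@k estimator from the HumanEval methodology, which this style of reporting follows; the sample counts in the usage example are illustrative.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n completions sampled, c of them functionally correct.

    pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Example: 20 completions sampled per problem, 7 pass the benchmark testbench.
print(f"pass@1  = {pass_at_k(20, 7, 1):.3f}")
print(f"pass@5  = {pass_at_k(20, 7, 5):.3f}")
print(f"pass@10 = {pass_at_k(20, 7, 10):.3f}")
```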