Large Language Model for Hardware Security
Hardware Phi-1.5B: The World's First Hardware Domain-Specific Pretrained LLM
We have pretrained a model based on the Phi-1.5 architecture, aligning it more closely with the needs of the hardware domain and enhancing its performance and stability on hardware design and verification tasks. It is the first pretrained hardware domain-specific LLM. To support this, we created three differently sized datasets, rigorously screening and optimizing them to guarantee content relevance and quality, thus laying a strong foundation for model training. The pretrained model will be offered openly to the community, supporting ongoing research, development, and innovation in both academic and industrial spheres. The release date will be around the presentation of this paper at ASP-DAC 2024.
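As a rough illustration only: once the checkpoint is public, it could be loaded with the Hugging Face transformers library as sketched below. The repository id used here is a hypothetical placeholder, not the official release name.

```python
# Minimal sketch of loading the released checkpoint, assuming a standard
# Hugging Face causal-LM layout. The repo id below is hypothetical;
# substitute the actual id announced at release.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "KSU-HW-SEC/hardware-phi-1.5b"  # hypothetical placeholder

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Complete a Verilog fragment to probe the model's hardware-domain knowledge.
prompt = "module counter (input clk, input rst, output reg [7:0] count);"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```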
For testing the fine-tuned model, please reach out to Dr. Xiaolong Guo: guoxiaolong@ksu.edu
For more details, please refer to our recently accepted paper:
Weimin Fu, Shijie Li, Yifang Zhao, Haocheng Ma, Raj Dutta, Xuan Zhang, Kaichen Yang, Yier Jin, and Xiaolong Guo. Hardware Phi-1.5B: A Large Language Model Encodes Hardware Domain Specific Knowledge. 29th IEEE/ACM Asia and South Pacific Design Automation Conference (ASP-DAC); January 2024; Songdo Convensia, Incheon, South Korea. [download]
LLM4SecHW
LLM4SecHW is an LLM-based hardware debugging framework that identifies bugs and provides debugging suggestions during the hardware design iteration process. Specifically, we developed an innovative data collection and preprocessing method that harnesses version control information from open-source hardware projects. From this information, we construct a hardware-debugging-oriented dataset by filtering and processing the version control data. Leveraging this dataset, we fine-tune a suite of hardware domain-specific language models capable of reading hardware designs and autonomously locating and rectifying bugs.
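As a sketch of what this version-control mining could look like in practice, the snippet below uses GitPython to pull (buggy, fixed) HDL file pairs from bug-fix commits of a locally cloned open-source hardware project. The keyword and file-extension filters are illustrative assumptions, not the exact heuristics of the LLM4SecHW pipeline.

```python
# Illustrative version-control mining, assuming GitPython and a local clone.
# Keywords and suffixes below are our assumptions, not the paper's filters.
from git import Repo

BUG_KEYWORDS = ("fix", "bug", "error", "issue")
HDL_SUFFIXES = (".v", ".sv", ".vhd")

def collect_bugfix_pairs(repo_path):
    """Yield (buggy, fixed) HDL file versions from bug-fix commits."""
    repo = Repo(repo_path)
    for commit in repo.iter_commits():
        msg = commit.message.lower()
        if not any(k in msg for k in BUG_KEYWORDS) or not commit.parents:
            continue
        parent = commit.parents[0]
        # Diff parent -> commit: the "a" side is pre-fix, "b" is post-fix.
        for diff in parent.diff(commit):
            path = diff.b_path or diff.a_path
            if path and path.endswith(HDL_SUFFIXES) and diff.a_blob and diff.b_blob:
                yield {
                    "message": commit.message,
                    "buggy": diff.a_blob.data_stream.read().decode("utf-8", "ignore"),
                    "fixed": diff.b_blob.data_stream.read().decode("utf-8", "ignore"),
                }

for pair in collect_bugfix_pairs("path/to/hardware/repo"):
    ...  # filter, deduplicate, and format into fine-tuning examples
```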
Our dataset, LLM4SecHW-OSHD, is now officially available on Huggingface: https://huggingface.co/datasets/KSU-HW-SEC/LLM4SecHW-OSHD
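A minimal usage sketch with the Hugging Face datasets library, using the dataset id from the link above; the split and field names depend on the dataset card and should be verified there.

```python
# Load LLM4SecHW-OSHD from the Hugging Face Hub. The dataset id comes from
# the link above; the "train" split assumed here is an assumption to check.
from datasets import load_dataset

ds = load_dataset("KSU-HW-SEC/LLM4SecHW-OSHD")
print(ds)             # inspect the available splits and columns
print(ds["train"][0]) # assumes a "train" split exists
```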
For testing the fine-tuned model, please reach out to Dr. Xiaolong Guo: guoxiaolong@ksu.edu
For more details, please refer to our recently accepted paper:
Weimin Fu, Kaichen Yang, Raj Gautam Dutta, Xiaolong Guo, and Gang Qu. LLM4SecHW: Leveraging Domain-Specific Large Language Model for Hardware Debugging. Asian Hardware Oriented Security and Trust Symposium (AsianHOST), 2023. [download]
LM4AsrtHW
The analysis and verification of hardware security require robust security properties and assertions, whose development is a complex and time-consuming process: crafting hardware security assertions to meet specific requirements is tedious and demands expert knowledge. This work introduces LM4AsrtHW, a novel framework that leverages Language Models (LMs) to generate hardware security assertions. LM4AsrtHW builds a hardware-security-centric dataset and uses it to fine-tune LMs; the fine-tuned models generate hardware security assertions corresponding to specific CWE vulnerabilities and hardware designs, and the generated assertions are validated using commercial EDA tools. Our experimental results show that our fine-tuned, hardware-security-oriented LMs consistently outperform existing automated assertion generators and commercial LLMs in generating hardware security assertions.
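As an illustration of the generation step only, the sketch below prompts a fine-tuned causal LM with a CWE description (CWE-1234, internal or debug modes allowing override of locks) and a short design description, then decodes a candidate SystemVerilog assertion. The checkpoint path, prompt format, and example output shape are hypothetical; the real flow would additionally validate the assertion with EDA tools as described above.

```python
# Hypothetical sketch of assertion generation with a fine-tuned causal LM.
# Checkpoint path and prompt format are placeholders, not the paper's setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "path/to/fine-tuned-assertion-lm"  # hypothetical checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

prompt = (
    "CWE-1234: internal or debug modes allow override of locks.\n"
    "Design: debug_unlocked can clear the lock on a protected register.\n"
    "Write a SystemVerilog assertion that flags a write to the protected "
    "register while the lock is set:\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Expected shape of a result (hand-written example, not model output):
# assert property (@(posedge clk) (lock && wr_en) |-> !reg_write);
```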
Some Preliminary Results of LM4AsrtHW: