Posts by Collection

notes-en

notes-zh

DeepSeek Pre-training Data Notes

Published:

This note summarizes DeepSeek's core data-side practices: deduplication -> filtering -> remixing, along with tokenizer and code data engineering.

portfolio

publications

Paper Title Number 4

Published in GitHub Journal of Bugs, 2024

This paper is about fixing template issue #693.

Recommended citation: Your Name, You. (2024). "Paper Title Number 4." GitHub Journal of Bugs. 1(3).
Download Paper

talks

teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.