OpenDiscord

My Age: 16

OpenDiscord is a public Hugging Face dataset I released containing content extracted from many Discord servers — roughly 322MB of conversation data. It is formatted in ChatML with usernames serving as speaker roles, which lets language models learn natural dialogue structure, turn-taking, and the tone of real multi-party chat. It spans many topics, including a substantial amount of code discussion, giving models basic exposure to common programming-language syntax patterns. OpenDiscord serves as one of the primary training sources for my DAT-Byte model family.