Chinese Dictionary Implementation
This implementation provides a `ChineseDict` struct that loads Chinese characters from `dict.txt` and generates random Chinese characters and strings from them.
Features
- Load Chinese characters: Reads `dict.txt` and extracts all Chinese characters (Unicode range 0x4E00-0x9FFF)
- Random character generation: Get single random Chinese characters
- Random string generation: Generate strings of random Chinese characters with specified length
- Character counting: Get the total number of unique Chinese characters loaded
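A minimal sketch of the loading step follows. It assumes the dictionary keeps each character once, as the "unique" wording above suggests, in first-seen order. The helpers `isCJK`, `extractChars`, and `loadChars` are hypothetical and not taken from `dict.go`:

```go
package main

import (
	"fmt"
	"os"
)

// isCJK reports whether r lies in the range the README says is extracted
// (U+4E00 to U+9FFF, the CJK Unified Ideographs block).
func isCJK(r rune) bool {
	return r >= 0x4E00 && r <= 0x9FFF
}

// extractChars returns each CJK character in data once, in order of first
// appearance. Non-CJK runes (ASCII, punctuation, newlines) are skipped.
func extractChars(data []byte) []rune {
	seen := make(map[rune]bool)
	var chars []rune
	for _, r := range string(data) { // ranging over a string decodes UTF-8 runes
		if isCJK(r) && !seen[r] {
			seen[r] = true
			chars = append(chars, r)
		}
	}
	return chars
}

// loadChars is a hypothetical loader: read the whole file, then extract.
func loadChars(path string) ([]rune, error) {
	data, err := os.ReadFile(path) // os.ReadFile requires Go 1.16+
	if err != nil {
		return nil, err
	}
	return extractChars(data), nil
}

func main() {
	chars := extractChars([]byte("你好, 世界! 你好"))
	fmt.Println(len(chars), string(chars)) // prints: 4 你好世界
}
```

Keeping the parsing separate from file I/O makes the range filter testable without a `dict.txt` on disk. The character count is then simply `len(chars)`.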
Usage
Basic Usage
```go
// Assumes the "fmt" and "log" packages are imported.

// Create a new dictionary instance
dict, err := NewChineseDict("dict.txt")
if err != nil {
	log.Fatalf("Error loading dictionary: %v", err)
}

// Get a single random Chinese character
randomChar := dict.GetRandomCharacter()
fmt.Printf("Random character: %c\n", randomChar)

// Get a random string of 5 Chinese characters
randomString := dict.GetRandomString(5)
fmt.Printf("Random string: %s\n", randomString)

// Get the total number of characters in the dictionary
count := dict.GetCharacterCount()
fmt.Printf("Total characters: %d\n", count)
```
Demo
Run the demo to see the functionality in action:
```shell
go run . -dict
```
This will display:
- Total number of Chinese characters loaded
- 10 random single characters
- Random strings of different lengths (3, 5, 8, 10 characters)
Integration with ASR
The dictionary is automatically integrated with the ASR (Automatic Speech Recognition) functionality. When processing speech recognition results, the system will:
- Try to load the dictionary from `dict.txt`
- Use dictionary characters for more realistic Chinese character replacement
- Fall back to random generation if dictionary loading fails
File Structure
- `dict.go`: Main dictionary implementation
- `dict.txt`: Source file containing Chinese characters
- `asr.go`: ASR functionality with dictionary integration
- `main.go`: Main application with demo functionality
Requirements
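- Go 1.16 or later (uses `os.ReadFile`)
- `dict.txt` in the directory the program is run from (a relative path such as `"dict.txt"` is resolved against the current working directory, which matches the executable's directory only if you launch it from there)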
Character Statistics
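The current `dict.txt` contains 479,939 Chinese characters in total, providing a rich source for realistic random character generation. Because the range 0x4E00-0x9FFF spans only 20,992 code points, this figure counts occurrences in the file, not unique characters; the unique set holds at most 20,992.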