Module tokenization
This module allows you to tokenize dictionaries for better results.
Functions
def build_2D_substitute_matrix(dictionary, alphabet, substitute_dict)
build_2D_substitute_matrix() initializes and fills a two-dimensional matrix (a dict of dicts) by scanning the dictionary.
- dictionary (list): the input dictionary (after processing)
- alphabet (list): the used alphabet (from input file or from dictionary)
- substitute_dict (dict): the substituted characters indexed by single substitution character
- return (dict): the matrix representing the probability of one letter following another
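As a rough illustration of the dict-of-dicts structure described above, the sketch below counts adjacent letter pairs and turns the counts into relative frequencies. It is not the module's implementation: the name `sketch_bigram_matrix` is hypothetical, the `substitute_dict` argument is left out, and dividing by the total number of pairs is an assumption about how the probabilities are computed.

```python
def sketch_bigram_matrix(dictionary, alphabet):
    """Hypothetical sketch: relative frequency of each adjacent letter pair."""
    # One row per first letter, one column per second letter, all starting at 0.
    matrix = {a: {b: 0.0 for b in alphabet} for a in alphabet}
    total = 0
    for word in dictionary:
        # zip(word, word[1:]) yields every pair of adjacent characters.
        for first, second in zip(word, word[1:]):
            if first in matrix and second in matrix[first]:
                matrix[first][second] += 1
                total += 1
    # Assumption: normalize by the total number of counted pairs.
    if total:
        for first in alphabet:
            for second in alphabet:
                matrix[first][second] /= total
    return matrix


example = sketch_bigram_matrix(["abab", "ab"], ["a", "b"])
```

Here `example["a"]["b"]` is 0.75, because three of the four adjacent pairs in the two words are "ab".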
def check_tokenizable(dictionary)
check_tokenizable() checks whether the dictionary contains any word with a digit or an uppercase character.
- dictionary (list): the input dictionary (after processing)
- return (bool): False if any word contains a digit or an uppercase character, True otherwise
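The check described above can be sketched in a few lines. The name `sketch_check_tokenizable` is hypothetical, and the use of `str.isdigit()` and `str.isupper()` is an assumption about how digits and uppercase characters are detected.

```python
def sketch_check_tokenizable(dictionary):
    """Hypothetical sketch: True only if no word has a digit or an uppercase character."""
    for word in dictionary:
        for char in word:
            # Assumption: Python's own notion of "digit" and "uppercase" is used.
            if char.isdigit() or char.isupper():
                return False
    return True
```

For example, `sketch_check_tokenizable(["hello", "world"])` returns True, while a list containing "World" or "abc1" returns False.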
def find_max(matrix, alphabet)
find_max() finds the most frequent character sequence.
- matrix (dict): the matrix representing the probability of one letter following another
- alphabet (list): the used alphabet (from input file or from dictionary)
- return (tuple): the most frequent consecutive character sequence
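A minimal sketch of such a search over a dict-of-dicts matrix follows. The name `sketch_find_max` is hypothetical, and two details are assumptions: the result is a `(first, second)` pair of characters, and ties are resolved in favor of the pair met first in alphabet order.

```python
def sketch_find_max(matrix, alphabet):
    """Hypothetical sketch: return the (first, second) pair with the highest value."""
    best_pair = None
    best_value = float("-inf")
    for first in alphabet:
        for second in alphabet:
            value = matrix[first][second]
            # Strict comparison keeps the earliest pair when values are tied.
            if value > best_value:
                best_value = value
                best_pair = (first, second)
    return best_pair


sample_matrix = {
    "a": {"a": 0.0, "b": 0.75},
    "b": {"a": 0.25, "b": 0.0},
}
best = sketch_find_max(sample_matrix, ["a", "b"])
```

With this sample matrix, `best` is `("a", "b")`.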
def plot_2D_matrix(matrix, alphabet, filename)
plot_2D_matrix() plots the matrix as a diagram using matplotlib.
- matrix (dict): the matrix representing the probability of one letter following another
- alphabet (list): the used alphabet (from input file or from dictionary)
- filename (str): the name of the file to save the plot to
- return (None)
def print_2D_matrix(matrix, alphabet)
print_2D_matrix() prints the matrix row by row.
- matrix (dict): the matrix representing the probability of one letter following another
- alphabet (list): the used alphabet (from input file or from dictionary)
- return (None)
def reverse_substitution(word, substitute_dict)
reverse_substitution() decodes a word from its substituted form back to human-readable form.
- word (str): the word to decode back
- substitute_dict (dict): the substituted characters indexed by single substitution character
- return (str): the decoded word
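The decoding step can be sketched as follows. The name `sketch_reverse_substitution` is hypothetical. The sketch assumes that `substitute_dict` maps each single substitution character to the characters it replaced, that a replacement may itself contain substitution characters, and that the mapping has no cycles.

```python
def sketch_reverse_substitution(word, substitute_dict):
    """Hypothetical sketch: expand every substitution character in a word."""
    decoded = []
    for char in word:
        if char in substitute_dict:
            # Assumption: replacements can be nested, so expand them recursively.
            decoded.append(
                sketch_reverse_substitution(substitute_dict[char], substitute_dict)
            )
        else:
            decoded.append(char)
    return "".join(decoded)


# Hypothetical mapping: "A" stands for "th", and "B" stands for "A" followed by "e".
sample_substitutes = {"A": "th", "B": "Ae"}
decoded_word = sketch_reverse_substitution("Bn", sample_substitutes)
```

With this sample mapping, `decoded_word` is "then"; a word without substitution characters comes back unchanged.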
def write_substitute_dictionary(dictionary, substitute_dict, filename)
write_substitute_dictionary() writes the dictionary to a file with substitutions applied.
- dictionary (list): the input dictionary (after processing)
- substitute_dict (dict): the substituted characters indexed by single substitution character
- filename (str): the name of the file to open (write mode)
- return (None)