Word Appearance Frequency


Problem

In an ordinary English document, words are separated by blanks, commas, full stops, and carriage returns. A hyphen ("-") at the end of a line joins the characters before and after the carriage return into a single word.
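The hyphen rule can be illustrated with a minimal Python sketch (not esProc). The function name `join_hyphenated` is hypothetical, and the sketch assumes the hyphen sits immediately before the line break; a genuine compound such as "well-known" that happens to be split at a line end would be merged as well.

```python
import re

def join_hyphenated(text: str) -> str:
    """Rejoin words that were split by a hyphen at the end of a line."""
    # Drop a hyphen that is immediately followed by a line break
    # (\r\n, \n, or \r) so the two halves become one word.
    return re.sub(r"-(?:\r\n|\n|\r)", "", text)

print(join_hyphenated("the docu-\nment is short"))
```

Hyphens inside a line are left alone, so "well-known" in the middle of a sentence is not changed.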

Given such a document, find the total number of distinct words, count how often each word appears, and select the word with the highest appearance frequency.

Tip

Load the document and break its content into a sequence of single characters. Convert every upper-case letter in the sequence to lower case and replace every non-letter character with a blank. Collapse each run of consecutive blanks into a single blank, concatenate the members of the sequence into a string, and split the string on blanks to get a sequence in which each member is one word. Group identical words together. The largest group contains the word with the highest appearance frequency.

1. Read the document content.

2. Break the document content into a sequence of single characters, convert every upper-case letter to lower case, and replace every non-letter character with a blank.

3. Collapse consecutive blanks into one blank, concatenate the members of the sequence into a string, and split the string on blanks to get a sequence in which each member is one word.

4. Group identical words together. The largest group contains the word with the highest appearance frequency.
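The steps above can be sketched in Python for readers who want to check the logic outside esProc. This is a rough rendition under stated assumptions, not the esProc solution itself: the function name `most_frequent_word` is hypothetical, the text is passed in as a string rather than read from a file, only ASCII letters count as letters, and ties go to the word that appears first. Like the steps above, it does not rejoin words that are hyphenated across a line break.

```python
def most_frequent_word(text):
    """Return the most frequent word in text, or None if there are no words."""
    # Step 2: lower-case every ASCII letter, turn everything else into a blank.
    chars = [c.lower() if c.isascii() and c.isalpha() else " " for c in text]

    # Step 3a: collapse runs of blanks by dropping a blank
    # whenever the previous character is also a blank.
    collapsed = [
        c for i, c in enumerate(chars)
        if c != " " or i == 0 or chars[i - 1] != " "
    ]

    # Step 3b: join the characters and split on blanks; leading or trailing
    # blanks produce empty strings, which are discarded here.
    words = [w for w in "".join(collapsed).split(" ") if w]

    # Step 4: group identical words and take the largest group.
    groups = {}
    for w in words:
        groups.setdefault(w, []).append(w)
    if not groups:
        return None
    return max(groups.values(), key=len)[0]

print(most_frequent_word("The cat saw the dog. The end!"))
```

Keeping the groups as lists mirrors the grouping step; the length of each list is the appearance frequency of that word.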

Code

	A
1	E:\esProc exercise\word.txt
2	=file(A1).read()
3	=A2.split().(if(isalpha(~), lower(~)," " ))	Break the document content into sequence consisting of single characters, and then convert all upper-case letters in the sequence into lower-case letters and change all non-letter characters in the sequence into blanks.
4	=A3.select(~!=" " \|\| ~[-1]!=" " )	Delete consecutive blanks into one blank.
5	=A4.concat().split(" ")	Put sequences together to form a string, and then break the string again with blank into sequences, so they form sequences in which one word is a member.
6	=A5.group().maxp(~.len())(1)	Group sequences, query the member with the largest length after grouping, and it is the word with the highest appearance frequency.
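Cell A6 returns only the most frequent word, while the problem also asks for the number of distinct words and the frequency of each. As a cross-check outside esProc, the following Python sketch computes all three. It swaps in a different technique (a regular expression plus `collections.Counter`) instead of the character-by-character approach in the grid. The name `word_stats` is hypothetical, and only ASCII letters are treated as word characters.

```python
import re
from collections import Counter

def word_stats(text):
    """Return (distinct word count, per-word frequencies, top (word, count))."""
    # Every maximal run of ASCII letters is one word; compare in lower case.
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", text)]
    counts = Counter(words)
    # most_common(1) is empty when the text contains no words.
    top = counts.most_common(1)[0] if counts else None
    return len(counts), counts, top

distinct, counts, top = word_stats("a b a. B c a")
print(distinct, dict(counts), top)
```

On the same input, the top word here should agree with the result of A6, apart from how ties between equally frequent words are broken.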

Result