|
Post by Hrafn Loftsson on Oct 12, 2010 4:48:10 GMT -5
Here you can discuss Assignment II.
|
|
|
Post by Carmine on Oct 13, 2010 8:26:59 GMT -5
How can we compute wfreq? For example, this is the input file:
70 a tagA
20 a tagAA
10 b tagB
The wfreq for a should be one of:
- 2 (the number of tags)
- 90 (the sum of the frequencies of its tags)
- 90/100 (the percentage)
Which is the correct answer?
|
|
|
Post by Hrafn Loftsson on Oct 14, 2010 4:37:20 GMT -5
The correct answer is 90, the sum of the frequencies of the tags for the word.
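For anyone who wants to check this on the example, here is a minimal sketch in Python. It assumes each input line has the form "freq word tag" separated by whitespace, as in the sample above; the function name word_frequencies is only illustrative and is not part of the assignment.

```python
from collections import defaultdict


def word_frequencies(lines):
    """Sum the tag frequencies for each word.

    Assumes every line looks like 'freq word tag' (whitespace separated).
    Lines that do not have exactly three fields are skipped.
    """
    wfreq = defaultdict(int)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore blank or malformed lines
        freq, word, _tag = parts
        wfreq[word] += int(freq)
    return dict(wfreq)


sample = ["70 a tagA", "20 a tagAA", "10 b tagB"]
print(word_frequencies(sample))  # {'a': 90, 'b': 10}
```

With the sample input, the word a gets 70 + 20 = 90 and b gets 10.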
|
|
|
Post by danilotrix on Oct 20, 2010 10:00:55 GMT -5
In problem 3.2, what if a word is an unknown number?
|
|
|
Post by Hrafn Loftsson on Oct 20, 2010 10:59:59 GMT -5
Good question! The logical thing is then to assign the unknown token the tag that is used for numbers, i.e. "CD".
|
|
|
Post by danilotrix on Oct 20, 2010 11:25:27 GMT -5
The same problem arises if the word begins with . or - or & or $,
or if the word starts with a number followed by letters, e.g. 092fgassda.
|
|
|
Post by Hrafn Loftsson on Oct 20, 2010 13:17:21 GMT -5
If it starts with a digit, use "CD". If it starts with anything else that is not an alphabetic letter, just use "NN".
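As a rough sketch, the rule for these tokens could look like this in Python. The function name tag_unknown_token is hypothetical, and returning None for tokens that start with a letter is an assumption made here: this thread does not say how those are handled, so they are left to whatever problem 3.2 specifies.

```python
def tag_unknown_token(token):
    """Guess a tag for an unknown token from its first character.

    Returns "CD" if the token starts with a digit, "NN" if it starts
    with any other non-alphabetic character, and None otherwise
    (letter-initial tokens are not covered by this rule).
    """
    if not token:
        return None  # nothing to inspect
    first = token[0]
    if first.isdigit():
        return "CD"
    if not first.isalpha():
        return "NN"
    return None  # assumption: defer to the assignment's other rules


for tok in ["1984", "092fgassda", "$cash", ".net", "&co", "word"]:
    print(tok, tag_unknown_token(tok))
```

The digit check comes first, so a token such as 092fgassda gets "CD" even though letters follow the digits.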
|
|