How it works

Generating a taxonomy from raw text takes the following steps:

  1. Reduce inflected or derived words to their root form. The output of this step is called the normalized text. Example:

    Nike Women’s shirt red. -> Nike Women shirt red.

  2. Mark every word in the raw text with its part of speech:

    Nike - generic name, Women’s - adjective, shirt - substantive, red - adjective

  3. Match the marked words of the normalized text against the values in the user's predefined taxonomy.
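
The three steps above can be sketched in Python as follows. This is only an illustration, not the tool's actual implementation: the function names, the TOY_LEXICON lookup table, the dictionary layout of the taxonomy, and the case-insensitive exact matching are all assumptions made for this example. A real tagger would infer the parts of speech instead of looking them up.

```python
import re

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
TOY_LEXICON = {
    "nike": "generic name",
    "women": "adjective",
    "shirt": "substantive",
    "red": "adjective",
}


def normalize(raw_text):
    """Step 1 (toy version): strip punctuation and possessive endings."""
    words = []
    for token in raw_text.split():
        token = token.strip(".,;:!?")           # drop surrounding punctuation
        token = re.sub(r"[’']s$", "", token)    # Women’s -> Women
        if token:
            words.append(token)
    return words


def tag(words):
    """Step 2: pair every word with a part of speech from the toy lexicon."""
    return [(word, TOY_LEXICON.get(word.lower(), "unknown")) for word in words]


def match(tagged_words, taxonomy):
    """Step 3: per column, keep words whose part of speech equals the
    column's tag and which appear among the column's values."""
    result = {}
    for column, (part_of_speech, values) in taxonomy.items():
        allowed = {value.lower() for value in values}
        result[column] = [
            word
            for word, pos in tagged_words
            if pos == part_of_speech and word.lower() in allowed
        ]
    return result


# Assumed in-memory form of a predefined taxonomy:
# column name -> (part of speech, values found in the column's rows).
taxonomy = {
    "brand": ("generic name", ["Nike", "Adidas"]),
    "color": ("adjective", ["red", "blue"]),
    "product": ("substantive", ["shirt", "shoes"]),
}

tagged = tag(normalize("Nike Women’s shirt red ."))
print(match(tagged, taxonomy))
```

Here "Women" is tagged as an adjective but is not among the values of the "color" column, so it is left unmatched.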

Predefined taxonomy

The predefined taxonomy is simply a collection of Excel columns. Each column must have a header that describes the values in its rows. The header syntax is as follows:

*<column_name>:<part_of_speech>

* - an asterisk at the beginning of the column definition tells the algorithm to propose a value from the input when none of the rows in that column can be matched.

<column_name> - the name of the column.

<part_of_speech> - a part-of-speech tag that filters the words of the normalized text by their part of speech. Only the words that pass this filter are used in the matching process.
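
As an illustration of the header syntax, the sketch below splits a header into its three parts. The function name parse_header and the keys of the returned dictionary are hypothetical, and the code assumes the leading asterisk is optional. The tag strings in the usage lines are placeholders only, not confirmed values of <part_of_speech>.

```python
def parse_header(header):
    """Split a '*<column_name>:<part_of_speech>' header into its parts."""
    header = header.strip()
    # Assumption: the asterisk is an optional flag at the very start.
    propose = header.startswith("*")
    if propose:
        header = header[1:]
    # Split on the first colon only; everything after it is the tag.
    name, separator, part_of_speech = header.partition(":")
    name = name.strip()
    part_of_speech = part_of_speech.strip()
    if not separator or not name or not part_of_speech:
        raise ValueError("expected '<column_name>:<part_of_speech>'")
    return {
        "name": name,
        "part_of_speech": part_of_speech,
        "propose_if_unmatched": propose,
    }


# Placeholder tag strings, used here only to show the shape of the result.
print(parse_header("*color:adjective"))
print(parse_header("brand:substantive"))
```

A header without a colon, or with an empty name or tag, raises a ValueError in this sketch, so malformed columns are reported rather than silently skipped.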


Possible values for <part_of_speech> are: