Thank you for choosing AskDataAnything!™. Below you will find basic information about this product.
Ask Data Anything is Cognitum's approach to exploring data using a subset of natural language that articulates the concepts and instances modeled in ontologies, providing a meaningful querying experience. Ask Data Anything seizes on regularities of language to give a natural interpretation of the queries being asked; its semantics are provided via F# and Ontorion.
ADA requires several components to be installed.
In particular, ADA requires Ontorion Server to be installed. To find out how to install Ontorion, see the Ontorion installation instructions.
To start the installation process, double-click the installation file. A window should appear.
Next, a window containing the License Agreement appears. Please read the agreement carefully, check the box "I accept the terms in the License Agreement", and click Next.
In the next window you can choose which features to install. It is recommended to keep the default settings unless you are certain a change is needed. Then click Next.
Click Install to begin the installation. Note that this may require administrator privileges.
After a successful installation, the final window appears. Click Finish to close it.
Running ADA consists of four steps:
When ADA runs for the first time, it may be blocked by Windows Firewall. In that case a prompt window appears; click Allow so that ADA can work correctly.
After the last step a web browser should start and a login page should appear.
Sign in with your ADA account name and password and start using the web application.
ADA allows you to configure many parameters. The most important one is the ADA Semantic Dataset that is used when the application starts. Currently, an ADA Semantic Dataset can consist of a database with an ontology and multiple tabular sources in CSV and Oracle SQL formats, with the relations between them defined in the ontology.
The configuration file AskDataAnything.exe.config is located in the bin folder of the directory where all the ADA files are installed.
ADA configurable parameters:
It is also possible to predefine several ADA Semantic Datasets. Every ADA Semantic Dataset consists of one or more ADA Datasets of CSV or Oracle SQL type and an ontology loaded into Ontorion.
Every AdaSemanticDataset has fields:
Every AdaDataSet has fields:
ADA has a simple and intuitive interface. The most important part is the input bar where the user types queries. Each query is a command for ADA to execute, and its result is displayed below the query bar. On the right of the query bar there is a microphone icon that enables speech recognition (an experimental feature, available only in Google Chrome). Above the bar there is a "What's inside?" link. Clicking it reveals information about the data in the dataset, the actions that can be performed, and the outputs available to visualize the result.
The colors used to visualize datasets need a short explanation. Purple marks every view declared in the ontology. All other colors mark typical dimensions and their subcategories: dark red is a location, red is a date/time, green is a numerical value, and blue is a string. Elements on the same line are synonyms. Subcategories of a dimension can be described in the ontology; they are listed under their parent dimension and highlighted in a lighter color.
Next to that link are "Show Examples" and "Stored Queries". Click "Show Examples" to try well-formed queries. Stored queries are only available after you have saved some queries; to save a query, click the disk icon on the right side of the query bar. At the top of the web page there is a gray bar that consists of five elements:
Together with the query results, a visualization bar menu is shown at the top. It lets you quickly change the data visualization. Some visualizations have additional options; click the gear icon to open them. See the output examples in Outputs and the short instruction with a sample for the visualization bar. The final elements of the user interface are colored rectangles that contain additional information about the understood query, the columns available for column constraints, and errors.
To deploy ADA, you have to perform several steps.
ADA can be deployed in IIS in different ways, and each variant has its own typical problems. Follow these notes to avoid them.
If you cannot see the 'Deploy' menu, check that all the required components (especially Web Deployment Tool and Web Deploy) are installed.
If you use a 64-bit operating system, enable 32-bit applications in the Application Pool. To change this parameter, follow the steps below:
Configuration Error
Description: An error occurred during the processing of a configuration file required to service this request. Please review the specific error details below and modify your configuration file appropriately.
Parser Error Message: It is an error to use a section registered as allowDefinition='MachineToApplication' beyond application level. This error can be caused by a virtual directory not being configured as an application in IIS.
Source Error
Line 26: <customErrors mode="Off" />
Line 27: <compilation targetFramework="4.0" />
Line 28: <authentication mode="Forms">
Line 29: <forms loginUrl="~/Account/Login" protection="All" timeout="25" />
Line 30: </authentication>
If any issues are caused by SSL not being enabled, follow the instructions in this guide until you reach ‘Configure WCF Service for HTTP Transport Security’ (do not perform that part).
The ADA Semantic Dataset is the main abstract structure defining the data, the metadata and all relations. Every ADA Semantic Dataset consists of a knowledge base with the ontology (the metadata) and an ADA Dataset, which is a set of tabular data in CSV or SQL format.
In ADA every dataset consists of two things: the taxonomy uploaded to the database, which contains all the metadata, and the bare data in CSV or Oracle SQL. The ontology allows you to create new dimensions and to semantically connect elements of the data. This makes it possible to ask more natural and more complex queries that return the best results in the shortest possible time.
To write the taxonomy (ontology) easily, use Fluent Editor, an intuitive and powerful tool created by Cognitum that uses the English version of Controlled Natural Language (CNL). For more information about Fluent Editor and other semantic technologies, please visit Cognitum's website www.cognitum.eu.
Annotations can be used in place of CNL names when querying. Use label[rdfs] to assign an easily readable name.
With a label, it is possible to query using only the name from the annotation. ADA allows you to define names in different languages by specifying the language of the annotations; the name in the currently selected language is used during querying.
To illustrate taxonomy creation, let's take the sample data in the three tables shown below.
Transaction Table
Transaction-Id | Product-Id | Transaction Price | Shop-Id | Size | Quantity | Warehouse Id | Value | Date | Fulfillment Date | Margin |
1ABB | 1 | 297.29 | 1 | 35 | 35 | 1 | 6282.36 | 09/12/2014 | 09/12/2014 | 0.01 |
3ABB | 2 | 307.83 | 2 | 35 | 15 | 2 | 6302.4 | 5/25/2015 | 5/25/2015 | 0.2 |
5ABB | 2 | 170.4 | 3 | 31 | 15 | 12 | 6309.4 | 09/10/2015 | 9/15/2015 | 0.11 |
5ACC | 3 | 119.79 | 4 | 38 | 28 | 4 | 7863.18 | 03/09/2014 | 03/09/2014 | 0.36 |
8AAA | 3 | 118.3 | 5 | 38 | 47 | 9 | 6079.48 | 3/14/2015 | 3/14/2015 | 0.26 |
Locations Table
Shop Id | Warehouse Id | City | Street | Name |
1 | 1 | Boston | 70 Rowes Wharf | Warehouse in Boston |
2 | 2 | Alicante | Calle México, 18 | Warehouse in Alicante |
3 | 3 | Milan | Via Privata Fratelli Gabba, 7b | Warehouse in Milan |
4 | 4 | Berlin | Silici, 1 | Warehouse in Berlin |
5 | 5 | Montpelier | 515 Rue de l'Industrie | Warehouse in Montpelier |
Products Table
Product Id | Product | Unit-Price | Product-Size | Trademark | Trademarkshort |
1 | Sport shoes | 22 | 35 | Nike | Nike |
2 | Leather shoes | 12 | 35 | Joe Fresh | Joe |
3 | High heel shoes | 44 | 31 | Zara | Zara |
To create the taxonomy, you need to reference the Base.encnl ontology.
It defines all data types in ADA and the basic relations between them. The primary type is ada-dimension; every more specific dimension is a subconcept (child) of ada-dimension. Some relations and concepts are integrated with the RDF Data Cube vocabulary and R2RML.
An ontology can be divided into many .encnl documents connected into one by references. This way locations, concepts, the data structure etc. can each be kept in their own files. This is a valuable and recommended practice that keeps the ontology clean and easy to manage.
The first step in writing the taxonomy is describing columns, tables and ADA Datasets.
Sales-Transaction-Dataset has-dataset-structure[ADABase] Sales-Dataset-Structure[SalesDesc].
Transaction-Table is a ada-logical-table[ADABase] .
Products-Table is a ada-logical-table[ADABase] .
Locations-Table is a ada-logical-table[ADABase] .
Sales-Dataset-Structure[SalesDesc] has-data-source[ADABase] Transaction-Table .
Transaction-Table has-name[ADABase] equal-to 'SalesTransactions_Transactions'.
Sales-Dataset-Structure[SalesDesc] has-data-source[ADABase] Products-Table .
Products-Table has-name[ADABase] equal-to 'SalesTransactions_Products'.
Sales-Dataset-Structure[SalesDesc] has-data-source[ADABase] Locations-Table .
Locations-Table has-name[ADABase] equal-to 'SalesTransactions_Locations'.
This is a good example of using more than one file for the taxonomy. SalesDesc is the ontology containing additional data: concepts, filters and names for columns. It is the easiest way to extend the structure metadata. If you only want to create and attach an ADA logical table, you can omit the first sentence above.
Once the logical table exists, you can add columns to it.
Transaction-Table has-dimension[ADABase] Size-Col.
Size-Col has-name[ADABase] equal-to 'Size'.
Size-Col has-equivalent-name[ADABase] equal-to 'Product-caliber'.
Size-Col is a size-filter[SalesDesc] .
Transaction-Table has-dimension[ADABase] Quantity.
Quantity has-name[ADABase] equal-to 'Quantity'.
Quantity has-equivalent-name[ADABase] equal-to 'Quant'.
Quantity has-equivalent-name[ADABase] equal-to 'Volume'.
Quantity is a large-quantity[SalesDesc].
Quantity is a quantity-filter[SalesDesc] .
First, add an instance of the column to the instance of the logical table. Next, set the name of the column as it is used in the data, with the has-name relation shown above. Another option is to use an annotation defined in the ADA configuration to declare the name of the column in the dataset. At this point the columns are added and you could stop here, but consider extending the taxonomy. When columns have complicated names or can be named in different ways, add synonyms with has-equivalent-name. You can also extend the ontology using concepts and filters.
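The effect of has-name and has-equivalent-name can be pictured with a small Python sketch. This is not ADA's implementation. The dictionary is a hypothetical in-memory copy of the CNL facts above, and the function maps any name or synonym a user types back to the column name used in the data. Case-insensitive matching is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mirror of the CNL sentences above:
# column instance -> (has-name, [has-equivalent-name, ...])
COLUMNS: Dict[str, Tuple[str, List[str]]] = {
    "Size-Col": ("Size", ["Product-caliber"]),
    "Quantity": ("Quantity", ["Quant", "Volume"]),
}


def resolve_column(word: str) -> Optional[str]:
    """Return the data column name that `word` refers to, or None if unknown.

    Case-insensitive matching is an assumption of this sketch.
    """
    needle = word.casefold()
    for name, synonyms in COLUMNS.values():
        if needle == name.casefold() or needle in (s.casefold() for s in synonyms):
            return name
    return None


print(resolve_column("Volume"))           # Quantity
print(resolve_column("product-caliber"))  # Size
```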
The last step is adding a view: a dimension that is defined in the ontology and represents data from multiple columns. To do this, declare an instance of view[ADABase] and connect it to the columns using has-view-column. Note that it is possible, and can be useful, to add more than one view to the taxonomy.
Transaction is a view[ADABase].
Transaction has-view-column[ADABase] Product.
Transaction has-view-column[ADABase] Product-Trademark.
Transaction has-view-column[ADABase] Transaction-Id-Col.
Transaction has-view-column[ADABase] Trademarkshort.
Transaction has-view-column[ADABase] City.
Transaction has-view-column[ADABase] Fulfillment-Center.
Transaction has-view-column[ADABase] Product-Price.
Transaction has-view-column[ADABase] Size-Col.
Transaction has-view-column[ADABase] Quantity.
Transaction has-view-column[ADABase] Transaction-Date.
Transaction has-view-column[ADABase] Transaction-Value.
Transaction has-view-column[ADABase] Fulfillment-Date.
Transaction has-view-column[ADABase] Tax.
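Conceptually, asking for a view returns every row restricted to the view's columns. The Python sketch below shows only that projection. It uses a shortened, hypothetical column list instead of the full Transaction view above and two rows of the sample Transaction Table. It also ignores how ADA combines columns that come from different tables.

```python
from typing import Dict, List

# Hypothetical, shortened view definition: view name -> columns (cf. has-view-column).
VIEWS: Dict[str, List[str]] = {
    "Transaction": ["Transaction-Id", "Size", "Quantity", "Date"],
}

# First two rows of the sample Transaction Table, restricted to a few columns.
ROWS: List[dict] = [
    {"Transaction-Id": "1ABB", "Size": 35, "Quantity": 35, "Date": "09/12/2014", "Margin": 0.01},
    {"Transaction-Id": "3ABB", "Size": 35, "Quantity": 15, "Date": "5/25/2015", "Margin": 0.2},
]


def ask_view(view: str, rows: List[dict]) -> List[dict]:
    """Return, for every row, only the columns attached to the view."""
    columns = VIEWS[view]
    return [{c: row[c] for c in columns if c in row} for row in rows]


for record in ask_view("Transaction", ROWS):
    print(record)  # Margin is not part of the view, so it is dropped
```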
In ADA you can add a hierarchy of abstract concepts to describe the data in detail. To do this, use hierarchical[ADABase], which expresses relations between concepts and the instances that belong to them. Hierarchical concepts are connected with each other by the is relation. The hierarchy can be described from the abstract ada-dimension at the top down to the values that appear literally in the data.
Every brand is a hierarchical[ADABase].
Every american-brand is a brand.
Every spanish-brand is a brand.
Zara is a spanish-brand.
Every good is a hierarchical[ADABase].
Every clothing is a good.
Every shoe is a clothing.
Every leather-shoe is a shoe.
The-"Leather shoes" is a leather-shoe.
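One way to picture what a hierarchy adds to querying is the transitive walk down the is relation. The Python sketch below is an illustration under stated assumptions, not ADA's reasoner. It copies the is sentences above into (child, parent) pairs and treats concepts and instances alike. It then collects everything below a concept, so asking about "shoe" matches any product value classified beneath it.

```python
from collections import defaultdict
from typing import Dict, List, Set

# (child, parent) pairs mirroring the "is" sentences above. Treating concepts
# and instances the same way is a simplification made by this sketch.
IS_A = [
    ("american-brand", "brand"),
    ("spanish-brand", "brand"),
    ("Zara", "spanish-brand"),
    ("clothing", "good"),
    ("shoe", "clothing"),
    ("leather-shoe", "shoe"),
    ("Leather shoes", "leather-shoe"),
]

CHILDREN: Dict[str, List[str]] = defaultdict(list)
for child, parent in IS_A:
    CHILDREN[parent].append(child)


def descendants(concept: str) -> Set[str]:
    """Everything reachable below `concept` through the is relation."""
    found: Set[str] = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        for child in CHILDREN.get(node, []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


# Asking about "shoe" keeps only product values classified somewhere below it.
products = ["Sport shoes", "Leather shoes", "High heel shoes"]
print([p for p in products if p in descendants("shoe")])  # ['Leather shoes']
```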
Using annotations, you can give concepts names in the different languages defined in the ADA configuration. These annotations can be used when asking questions. For example, spanish-brand may have the following annotations:
A location expresses relations between concepts and instances that describe places. It is possible to build a "hierarchy" of locations, but note the big difference compared to hierarchical: locations are connected not by the is relation but by is-located-in and contains. It is useful to describe this part of the ontology in a separate file and to attach it to the main ontology by a reference. To add locations, declare the proper concepts and instances in the ontology and connect them carefully. Analyze this short example to see how.
Every continent is a location[ADABase].
Every country is a location[ADABase].
Every city is a location[ADABase].
Europe is a continent.
North-America is a continent.
Africa is a continent.
Poland is a country.
Germany is a country.
Krakow is a city.
Warsaw is a city.
Hamburg is a city.
Berlin is a city.
Warsaw is-located-in[ADABase] Poland.
Krakow is-located-in[ADABase] Poland.
Hamburg is-located-in[ADABase] Germany.
Berlin is-located-in[ADABase] Germany.
Germany is-located-in[ADABase] Europe.
Poland is-located-in[ADABase] Europe.
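The is-located-in chain above can be followed in both directions: upwards to find every region that contains a place, and downwards (contains) to find everything inside a region. The Python sketch below models only that traversal over the example facts. It assumes each place has a single direct container and that there are no cycles. It is not ADA's implementation.

```python
from typing import Dict, List

# is-located-in facts from the example above: place -> directly containing place.
LOCATED_IN: Dict[str, str] = {
    "Warsaw": "Poland",
    "Krakow": "Poland",
    "Hamburg": "Germany",
    "Berlin": "Germany",
    "Poland": "Europe",
    "Germany": "Europe",
}


def containers(place: str) -> List[str]:
    """Follow is-located-in upwards, nearest container first (assumes no cycles)."""
    chain: List[str] = []
    while place in LOCATED_IN:
        place = LOCATED_IN[place]
        chain.append(place)
    return chain


def contains(region: str) -> List[str]:
    """Inverse direction: every place that is (transitively) located in `region`."""
    return sorted(p for p in LOCATED_IN if region in containers(p))


print(containers("Krakow"))  # ['Poland', 'Europe']
print(contains("Germany"))   # ['Berlin', 'Hamburg']
```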
Instances can also have annotations in different languages, which can be used when asking questions. For example, Warsaw has the following annotations:
ADA allows adding filters to the ontology; they can be seen as a simple kind of semantic tagging. Three types of filters are distinguished: hierarchical, numerical and regex.
Transaction-Table has-dimension[ADABase] Quantity.
Quantity has-name[ADABase] equal-to 'Quantity'.
Quantity has-equivalent-name[ADABase] equal-to 'Quant'.
Quantity has-equivalent-name[ADABase] equal-to 'Volume'.
Quantity is a large-quantity[SalesDesc].
Quantity is a quantity-filter[SalesDesc] .
Every quantity-filter is a hierarchical-filter[ADABase].
Every almost-thirty-five is a quantity-filter.
Every thirty-four is a almost-thirty-five.
Every thirty-five is a almost-thirty-five.
Every thirty-six is a almost-thirty-five.
Here you can see two concepts that act as filters: large-quantity, and quantity-filter with its subconcepts such as thirty-five.
To add filtering functionality to the concepts, you have to add annotations. In Fluent Editor, click the concept and choose Annotations from the panel on the right.
Let's add two filters declared for one concept: a numerical one and a regex one. A numerical filter works on numbers: it keeps the values of the column (Quantity) that satisfy a numerical comparison (> 46). A regex (regular expression) filter works on strings: it passes the values that match the expression.
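The behaviour of the two filter types can be sketched in Python. This illustrates only their semantics and does not model Fluent Editor's annotation syntax. The helper names are hypothetical. The regex ^3[4-6]$ is an invented example pattern, and using re.search (unanchored unless the pattern anchors itself) is an assumption about how matching works. The data is the Quantity column of the sample Transaction Table.

```python
import operator
import re
from typing import Callable, List

COMPARISONS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le, "=": operator.eq}


def numerical_filter(op: str, threshold: float) -> Callable[[float], bool]:
    """Keep values that satisfy `value <op> threshold`, e.g. Quantity > 46."""
    compare = COMPARISONS[op]
    return lambda value: compare(value, threshold)


def regex_filter(pattern: str) -> Callable[[object], bool]:
    """Keep values whose text matches `pattern` (re.search; anchoring is up to the pattern)."""
    compiled = re.compile(pattern)
    return lambda value: compiled.search(str(value)) is not None


quantities: List[int] = [35, 15, 15, 28, 47]  # Quantity column of the sample Transaction Table
print([q for q in quantities if numerical_filter(">", 46)(q)])  # [47]
print([q for q in quantities if regex_filter(r"^3[4-6]$")(q)])  # [35]
```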
Now define a hierarchical filter. The advantage of such a filter is its hierarchy, which can also be used in queries. The filters go from the least specific at the top to the most specific at the bottom. In this example the top is quantity-filter. Its subconcept is almost-thirty-five, which in turn has the subconcepts thirty-four, thirty-five and thirty-six. Asking about thirty-five gives results only for that filter, but asking about almost-thirty-five returns its own results and the results of all its subconcepts. Similarly, quantity-filter could also have almost-forty and other filters, and asking about quantity-filter would return the values filtered by all filters in the hierarchy. The hierarchy can be as deep and complicated as you want. Hierarchical filters can only be regex filters.
Adding a label allows ADA to show the filter in the understood-query box.
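The way a hierarchical regex filter combines its subconcepts can be sketched in Python. In this sketch a value passes a filter if it matches the filter's own pattern or the pattern of any filter below it. The concept structure follows the example above. The regex patterns attached to the leaf filters are hypothetical, because the actual annotations are whatever you define in Fluent Editor.

```python
import re
from typing import Dict, List

# Hypothetical regex annotations for the leaf filters.
FILTER_REGEX: Dict[str, str] = {
    "thirty-four": r"^34$",
    "thirty-five": r"^35$",
    "thirty-six": r"^36$",
}

# Subconcept structure from the example: quantity-filter > almost-thirty-five > leaves.
SUBCONCEPTS: Dict[str, List[str]] = {
    "quantity-filter": ["almost-thirty-five"],
    "almost-thirty-five": ["thirty-four", "thirty-five", "thirty-six"],
}


def patterns_for(filter_name: str) -> List[str]:
    """The filter's own pattern (if any) plus the patterns of every subconcept, recursively."""
    patterns = [FILTER_REGEX[filter_name]] if filter_name in FILTER_REGEX else []
    for sub in SUBCONCEPTS.get(filter_name, []):
        patterns.extend(patterns_for(sub))
    return patterns


def apply_filter(filter_name: str, values: List[object]) -> List[object]:
    """A value passes if it matches the pattern of the filter or of any subconcept."""
    compiled = [re.compile(p) for p in patterns_for(filter_name)]
    return [v for v in values if any(p.search(str(v)) for p in compiled)]


values = [33, 34, 35, 36, 37]
print(apply_filter("thirty-five", values))         # [35]
print(apply_filter("almost-thirty-five", values))  # [34, 35, 36]
print(apply_filter("quantity-filter", values))     # [34, 35, 36]
```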
To use synonyms in the ADA application, you must define the data source and the alternative names by which the columns can be called. The SalesTransactions_Multitable.encnl file contains an example of how to do this:
Sales-Transactions-Table has-dimension[ADABase] Quantity.
Quantity has-name[ADABase] equal-to 'Quantity'.
Quantity has-equivalent-name[ADABase] equal-to 'Quant'.
Quantity has-equivalent-name[ADABase] equal-to 'Volume'.
A view is a dimension that is defined in the ontology and represents data from multiple dimensions in a single view. To use a view in the ADA application, you must define the "view" and its related dimensions. You can add the following to the SalesTransactions_Multitable.encnl file:
Transaction is a view[ADABase].
Transaction has-view-column[ADABase] Product.
Transaction has-view-column[ADABase] Product-Trademark.
Transaction has-view-column[ADABase] Transaction-Id-Col.
Transaction has-view-column[ADABase] Trademarkshort.
Transaction has-view-column[ADABase] City.
Transaction has-view-column[ADABase] Fulfillment-Center.
Transaction has-view-column[ADABase] Product-Price.
Transaction has-view-column[ADABase] Size-Col.
Transaction has-view-column[ADABase] Quantity.
Transaction has-view-column[ADABase] Transaction-Date.
Transaction has-view-column[ADABase] Transaction-Value.
Transaction has-view-column[ADABase] Fulfillment-Date.
Transaction has-view-column[ADABase] Tax.
Remember to update the ontology in the "SalesTransactions_Multitable" database. To do this, use the import command as described in the section Loading ontology, or export the ontology to the database from Fluent Editor.
EXAMPLE: Transaction will return all results from the columns Transaction-Id, Trademarkshort, City, Fulfillment-Center, Price, Size, Quantity, Date, Value and Fulfillment Date.
Create a hierarchical regex filter, which allows you to ask about values filtered at different levels of the hierarchy.
Quantity is a quantity-filter.
Every quantity-filter is a hierarchical-filter[ADABase].
Every almost-thirty-five is a quantity-filter.
Every thirty-four is a almost-thirty-five.
Every thirty-five is a almost-thirty-five.
Every thirty-six is a almost-thirty-five.
Size is a size-filter.
Every size-filter is a hierarchical-filter[ADABase].
Every thirty-five is a size-filter.
Follow these three examples to learn how hierarchical filters work.
EXAMPLE: Product in "thirty five" will return all products whose Size and Quantity match the regex "thirty five".
Tabular data from different sources may be related, and the ontology makes it possible to describe these relations. In this example, column Shop-Id from Transaction-Table is the same as column Shop Id in Locations-Table. This is defined by the relates-to relation. To understand it better, take a look at the sample tables with data.
Transaction-Table is a ada-logical-table[ADABase] .
Locations-Table is a ada-logical-table[ADABase] .
Transaction-Table has-dimension[ADABase] Transaction-Shop.
Transaction-Shop has-name[ADABase] equal-to 'Shop-Id'.
Locations-Table has-dimension[ADABase] Shop.
Shop has-name[ADABase] equal-to 'Shop Id'.
Shop relates-to[ADABase] Transaction-Shop.
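The relates-to example above effectively lets rows of the two tables be matched on equal shop identifiers. The Python sketch below shows that idea as a plain inner join over the sample Transaction and Locations rows. That relates-to behaves like an inner join with unique keys on the Locations side is an assumption of this illustration, not documented ADA behaviour.

```python
from typing import Dict, List

# Sample rows, reduced to the related columns plus what we want to display.
TRANSACTIONS: List[dict] = [
    {"Transaction-Id": "1ABB", "Shop-Id": 1},
    {"Transaction-Id": "3ABB", "Shop-Id": 2},
    {"Transaction-Id": "5ABB", "Shop-Id": 3},
    {"Transaction-Id": "5ACC", "Shop-Id": 4},
    {"Transaction-Id": "8AAA", "Shop-Id": 5},
]
LOCATIONS: List[dict] = [
    {"Shop Id": 1, "City": "Boston"},
    {"Shop Id": 2, "City": "Alicante"},
    {"Shop Id": 3, "City": "Milan"},
    {"Shop Id": 4, "City": "Berlin"},
    {"Shop Id": 5, "City": "Montpelier"},
]


def relate(left: List[dict], right: List[dict], left_key: str, right_key: str) -> List[dict]:
    """Inner join: merge each left row with the right row whose key value is equal."""
    index: Dict[object, dict] = {row[right_key]: row for row in right}
    return [{**row, **index[row[left_key]]} for row in left if row[left_key] in index]


for row in relate(TRANSACTIONS, LOCATIONS, "Shop-Id", "Shop Id"):
    print(row["Transaction-Id"], row["City"])
```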
The main element of the ADA web application is the input where the user enters queries to get the desired results. ADA queries consist of different elements, all of which are explained in this chapter. Note that ADA does not require queries to be strictly grammatical; the understood query is shown in the box just below the query input.
The best way to get the expected results, and to learn how to ask ADA easily, is to keep to the proper syntax. The main schema is shown below.
The core of every query is a dimension: the entity in the data that the query is about. It is possible to write more than one dimension, e.g. Product and Transaction and Value, but be careful. The first word of a query may be an operation, which applies only to the first dimension declared in the query. The next part can be a subsetting of the data represented by the dimension, which gathers the needed information by filtering out unwanted results. The fourth part is aggregation, which groups the data into subsets. Finally, you can tell ADA what type of visualization is expected.
An operation is an action performed on data to get the desired information. Currently, there are three operations available in ADA:
Every operation requires a Dimension to act on. Only one operation is allowed in a single query.
Dimension is a general name for every subset of data that a user asks for or performs actions on. There are two types of dimensions:
Users can get the data that dimensions represent simply by providing their names.
The first option is to get results in which some dimension is in a relation with a value. Both the relation and the value are determined by the type of the dimension: for numerical data, the relations are numerical and string comparison operations; for dates, date comparisons; for all other types, only string comparison is available. The value's type must be compatible with the dimension and the relation.
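A typed value constraint amounts to comparing each row's value with a value of the same type. The Python sketch below illustrates this with a numerical comparison on Transaction Price and a date comparison on Date, using the sample Transaction Table. The helper names are hypothetical. The dates are parsed as month/day/year, which matches the sample data (e.g. 5/25/2015).

```python
import operator
from datetime import date, datetime
from typing import List

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le, "=": operator.eq}


def parse_us_date(text: str) -> date:
    """Sample dates are month/day/year, with or without leading zeros."""
    return datetime.strptime(text, "%m/%d/%Y").date()


ROWS: List[dict] = [
    {"Transaction-Id": "1ABB", "Transaction Price": 297.29, "Date": parse_us_date("09/12/2014")},
    {"Transaction-Id": "3ABB", "Transaction Price": 307.83, "Date": parse_us_date("5/25/2015")},
    {"Transaction-Id": "5ABB", "Transaction Price": 170.4, "Date": parse_us_date("09/10/2015")},
    {"Transaction-Id": "5ACC", "Transaction Price": 119.79, "Date": parse_us_date("03/09/2014")},
    {"Transaction-Id": "8AAA", "Transaction Price": 118.3, "Date": parse_us_date("3/14/2015")},
]


def value_constraint(rows: List[dict], column: str, op: str, value) -> List[dict]:
    """Keep rows whose `column` satisfies `<op> value`; `value` must have the column's type."""
    compare = OPS[op]
    return [r for r in rows if compare(r[column], value)]


def ids(rows: List[dict]) -> List[str]:
    return [r["Transaction-Id"] for r in rows]


print(ids(value_constraint(ROWS, "Transaction Price", ">", 149.99)))  # ['1ABB', '3ABB', '5ABB']
print(ids(value_constraint(ROWS, "Date", ">=", date(2015, 1, 1))))     # ['3ABB', '5ABB', '8AAA']
```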
It is also possible to use "in" constraints. The element may be anything declared in the ontology (e.g. the location "continent" or a class of abstraction such as "american-brand") or a value from some column.
If the element is a value that appears in more than one column, it can be useful to point to the column directly by writing its name in round brackets.
ADA also allows you to subset data by date. The last four options in the diagram should be self-explanatory. The most powerful are the date declaration options. You can write:
An aggregator groups results by some criterion. The obligatory keyword is "by". After it you can pass a dimension, a location (e.g. continent, country) or a time period (year, month, day, date).
For these three kinds of aggregation you can add further subsetting using "in", e.g. "Transaction in Warsaw", "country in Europe", "day in June".
Aggregation based on the ontology is not limited to locations and time periods; in fact, it works for any element declared in the ontology.
Aggregation by two dimensions is very useful: with the count operation and a piechart it is easy to show summaries. For example: Product by Price and by Product.
When the element used for aggregation is connected with two or more columns, you can name the column explicitly, e.g. Product by brand (Producer).
See also aggregation examples in Aggregator.
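Aggregation splits the rows into groups and applies the operation to each group. The Python sketch below mirrors the shape of queries such as "count Product by ..." and "average price ... by ..." on the sample transactions joined with the Products Table. Each city occurs only once in the sample, so the sketch groups by Product instead. The helper names are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Dict, List

# Sample transactions with Product-Id resolved through the Products Table.
ROWS: List[dict] = [
    {"Product": "Sport shoes", "Transaction Price": 297.29},
    {"Product": "Leather shoes", "Transaction Price": 307.83},
    {"Product": "Leather shoes", "Transaction Price": 170.4},
    {"Product": "High heel shoes", "Transaction Price": 119.79},
    {"Product": "High heel shoes", "Transaction Price": 118.3},
]


def count_by(rows: List[dict], key: str) -> Dict[str, int]:
    """Like "count ... by key": number of rows in each group."""
    return dict(Counter(row[key] for row in rows))


def average_by(rows: List[dict], key: str, value: str) -> Dict[str, float]:
    """Like "average value ... by key": mean of `value` within each group."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {group: mean(values) for group, values in groups.items()}


print(count_by(ROWS, "Product"))  # {'Sport shoes': 1, 'Leather shoes': 2, 'High heel shoes': 2}
print(average_by(ROWS, "Product", "Transaction Price"))
```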
ADA now supports more intuitive queries with the where and in keywords.
A where clause with a numerical value always results in a logical AND.
The in keyword gives a logical OR for values from the same column and a logical AND for values from different columns.
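The combination rules above can be made concrete in Python. In the sketch, the values requested for one column form a set, so any of them may match (OR), while every listed column must match (AND). Numerical where conditions must all hold (AND). The rows are the sample transactions joined with the Products and Locations tables. The helper names are hypothetical, and the sketch shows only the logic, not ADA's query engine.

```python
from typing import Callable, Dict, List, Set

ROWS: List[dict] = [
    {"Transaction-Id": "1ABB", "Product": "Sport shoes", "City": "Boston", "Quantity": 35},
    {"Transaction-Id": "3ABB", "Product": "Leather shoes", "City": "Alicante", "Quantity": 15},
    {"Transaction-Id": "5ABB", "Product": "Leather shoes", "City": "Milan", "Quantity": 15},
    {"Transaction-Id": "5ACC", "Product": "High heel shoes", "City": "Berlin", "Quantity": 28},
    {"Transaction-Id": "8AAA", "Product": "High heel shoes", "City": "Montpelier", "Quantity": 47},
]


def ids(rows: List[dict]) -> List[str]:
    return [r["Transaction-Id"] for r in rows]


def apply_in(rows: List[dict], wanted: Dict[str, Set[str]]) -> List[dict]:
    """in: OR among values of the same column (set membership), AND across columns (all)."""
    return [r for r in rows if all(r[col] in values for col, values in wanted.items())]


def apply_where(rows: List[dict], conditions: List[Callable[[dict], bool]]) -> List[dict]:
    """where with numerical values: every condition must hold (logical AND)."""
    return [r for r in rows if all(cond(r) for cond in conditions)]


print(ids(apply_in(ROWS, {"City": {"Boston", "Milan"}})))                                  # ['1ABB', '5ABB']
print(ids(apply_in(ROWS, {"City": {"Boston", "Milan"}, "Product": {"Leather shoes"}})))   # ['5ABB']
print(ids(apply_where(ROWS, [lambda r: r["Quantity"] > 14, lambda r: r["Quantity"] < 30])))  # ['3ABB', '5ABB', '5ACC']
```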
Currently, ADA provides seven types of output. To choose one, write "on" at the end of the query, followed by:
This chapter shows only a simplified query schema. You can build more complex queries by changing the order of subsettings and aggregations, or by writing more than one subsetting and aggregation constraint. Just remember to write the operation first, the dimension second and the output at the end of the query. It is also good practice not to write overly complex queries.
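The ordering rule above (operation first, then the dimension, then subsetting and aggregation, with the output last) can be illustrated with a toy splitter in Python. This is not ADA's parser. It assumes single-word dimensions, whitespace-separated tokens and no quoted values. Its keyword sets contain only the operations and outputs that appear in this guide's examples, so they are hypothetical and incomplete.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical, incomplete keyword sets taken from this guide's examples.
OPERATIONS = {"count", "average"}
OUTPUTS = {"histogram", "stacked-bar", "piechart", "line", "timeline"}


@dataclass
class QueryParts:
    operation: Optional[str]
    dimension: str
    subsetting: List[str] = field(default_factory=list)
    aggregators: List[str] = field(default_factory=list)
    output: Optional[str] = None


def split_query(query: str) -> QueryParts:
    """Split a simple query into operation, dimension, subsetting, aggregators and output."""
    tokens = query.split()
    operation = tokens.pop(0).lower() if tokens and tokens[0].lower() in OPERATIONS else None
    output = None
    if len(tokens) >= 2 and tokens[-2].lower() == "on" and tokens[-1].lower() in OUTPUTS:
        output = tokens[-1].lower()  # "on <output>" must close the query
        tokens = tokens[:-2]
    if not tokens:
        raise ValueError("a query needs a dimension")
    dimension = tokens.pop(0)  # the dimension comes right after the optional operation
    subsetting: List[str] = []
    aggregators: List[str] = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "by" and i + 1 < len(tokens):
            aggregators.append(tokens[i + 1])  # "by X" adds aggregator X
            i += 2
        elif tokens[i] == "and" and i + 1 < len(tokens) and tokens[i + 1] == "by":
            i += 1  # "and by" chains another aggregator
        else:
            subsetting.append(tokens[i])
            i += 1
    return QueryParts(operation, dimension, subsetting, aggregators, output)


print(split_query("count Product by city on line"))
print(split_query("Transaction in Warsaw by month"))
```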
Constraints are the methods for using only a subset of the data represented by a dimension. Thanks to them, it is enough to specify a few dimensions to reach whatever subset of data the user wants. There are three main types of constraints:
Value constraints - define a subset of the dimension's data by providing a mathematical constraint on the corresponding values of a numerical column (which does not have to be the same as the dimension used)
EXAMPLE: Product with Price > 149.99 will return the names of products with a price higher than 149.99.
Date/Time constraints - define a subset of the dimension's data by providing a constraint on the corresponding values of a temporal column
EXAMPLE: Transaction from May to September will return the transactions in the given time period.
in constraints - define a subset of the dimension's data by taking the elements that are related to a given name from the data or the ontology
EXAMPLE: Average price in "Sport shoes" in Warsaw will return the average price of the product "Sport shoes" sold in Warsaw.
Column constraints are a special type of constraint used together with an in constraint. They are useful when a word used as a constraint occurs in more than one column. For instance, imagine a dataset that contains the columns City and Fulfillment-Center-City. "Warsaw" may be present in both of them, so ADA will choose one column and inform the user about it. To use the name from a different column (or from both), type the column's name in round brackets just after the constraint.
EXAMPLE: Product in Paris (Fulfillment-Center) will return all products whose Fulfillment-Center is located in Paris.
Aggregation is a process in which the data is divided into subsets based on some condition, and all actions are performed on those groups of data. The final result contains partial results for all subsets.
EXAMPLE: count Product by city on line will return the number of Products calculated for every city.
To use an aggregation, the user has to provide the keyword by followed by an aggregator, i.e. a concept from the ontology.
EXAMPLE: average price in "Sport shoes" (Product) by city will return the average price of the product "Sport shoes" calculated for every city.
ADA can visualize the result of a query in different ways. Currently, there are seven possible output types. Using the visualization bar menu, you can dynamically change the type of visualization and select the data shown.
The aim of the visualization bar is to quickly change the visualization type and the data selected for display. To choose another type of output, just click its icon; the result is shown immediately without changing the query. For histogram, stacked-bar, piechart, line and timeline you can change the data that is shown. When an output is selected, a cog appears to the right of its icon on the visualization bar; click the icon once again to show the options. The first three outputs listed have X and Y axes to choose. Timeline has a start date, an end date and a description. Change the columns in the selection boxes to get the results. To close the visualization bar options menu, click the selected icon again.
Visualization bar EXAMPLE: "Fulfillment Date" and Price and Date and Trademarkshort and Fulfillment-Center and Margin in quantity. The visualization bar menu makes it possible to quickly change the visualization type.
Although queries have a strict syntax, ADA still tries to interpret what the user sends. Enabling fuzzy matching in the ADA configuration allows ADA to guess what the user wanted to ask when a spelling mistake occurs. Using heuristic algorithms, ADA tries to fit the best-matching word from the dataset and executes the query with the corrections. Every time automatic changes are made, the understood query is shown below the query bar. This is very useful for typical users, because ADA gives the expected result even if the query was grammatically incorrect or a word was typed too quickly.
EXAMPLE: cuont trademrak will execute query "count Trademark".
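The idea behind fuzzy matching can be sketched with Python's standard difflib module. This is a swapped-in technique, not ADA's heuristic. Each token is replaced by the most similar word from a vocabulary if the similarity ratio reaches a cutoff, and it is kept unchanged otherwise. The vocabulary and the 0.6 cutoff are hypothetical choices for this illustration.

```python
import difflib
from typing import Dict, List

# Hypothetical vocabulary: operation keywords plus names from the sample dataset.
VOCABULARY: List[str] = ["count", "average", "by", "in", "on",
                         "Product", "Trademark", "Transaction", "Quantity", "City"]


def correct_query(query: str, vocabulary: List[str] = VOCABULARY, cutoff: float = 0.6) -> str:
    """Replace each token by its closest vocabulary word, if one is similar enough."""
    by_lower: Dict[str, str] = {word.lower(): word for word in vocabulary}
    corrected = []
    for token in query.split():
        match = difflib.get_close_matches(token.lower(), list(by_lower), n=1, cutoff=cutoff)
        corrected.append(by_lower[match[0]] if match else token)
    return " ".join(corrected)


print(correct_query("cuont trademrak"))  # count Trademark
```

A real implementation would also need multi-word names such as "Fulfillment Date" and vocabulary drawn from the loaded dataset. The sketch leaves both out.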