Text Similarity API reference

This is a semantic sentence similarity API optimized for short sentences. It compares two sentences and returns a score for their semantic similarity, even when the two sentences have no words in common.

Endpoint

https://api.dandelion.eu/datatxt/sim/v1

We support both GET and POST methods to query the API.
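
As a minimal sketch of the two methods, the Python standard library can build either kind of request without sending it. The helper name build_request is hypothetical, and sending POST parameters as a form-encoded body is an assumption rather than something this page states.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://api.dandelion.eu/datatxt/sim/v1"


def build_request(params, method="GET"):
    """Build (but do not send) a GET or POST request for the endpoint."""
    query = urlencode(params)
    if method == "GET":
        # GET: parameters travel in the query string.
        return Request(f"{ENDPOINT}?{query}", method="GET")
    if method == "POST":
        # POST: parameters travel in the body (form encoding is assumed here).
        headers = {"Content-Type": "application/x-www-form-urlencoded"}
        return Request(ENDPOINT, data=query.encode("utf-8"),
                       headers=headers, method="POST")
    raise ValueError(f"unsupported method: {method}")
```

The returned object can be passed to urllib.request.urlopen to perform the call. POST is the more natural choice when the texts are long enough to make the URL unwieldy.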

Parameters

Remember to authenticate by specifying the token parameter (or the legacy $app_id and $app_key pair). See the API documentation on authentication for details.

text1|html1|html_fragment1 required

These parameters define how you send the first text to compare to the Text Similarity API. Use only one of them in each request, following these guidelines:

use "text" when you have plain text that doesn't need any pre-processing;
use "html" when you have an HTML document and you want the Text Similarity API to work on its main content. It will use an AI algorithm to extract the relevant part of the document to work on; in this case, the main content will also be returned by the API to allow you to properly use the annotation offsets;
use "html_fragment" when you have an HTML snippet and you want the Text Similarity API to work on its content. It will remove all HTML tags before analyzing it.

Type

string

text2|html2|html_fragment2 required

These parameters define how you send the second text to compare to the Text Similarity API. They work in the same way as the text1|html1|html_fragment1 parameters.

Type

string
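
The "only one per request" rule for each side of the comparison can be sketched as a small helper that turns two (kind, value) pairs into request parameters. The names INPUT_KINDS and input_params are hypothetical and exist only for this illustration.

```python
INPUT_KINDS = ("text", "html", "html_fragment")


def input_params(first, second):
    """Map two (kind, value) pairs to request parameters.

    Each side contributes exactly one parameter, so text, html and
    html_fragment can never be mixed for the same input.
    """
    params = {}
    for index, (kind, value) in enumerate((first, second), start=1):
        if kind not in INPUT_KINDS:
            raise ValueError(f"unknown input kind: {kind!r}")
        # For example ("html", "...") on the second side becomes "html2".
        params[f"{kind}{index}"] = value
    return params
```

The two sides are independent, so a plain-text sentence can be compared with an HTML snippet by passing ("text", ...) and ("html_fragment", ...).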

lang optional

The language of the texts to be compared; currently English, French, German, Italian, Portuguese, Russian and Spanish are supported. Leave this parameter out to let the Text Similarity API automatically detect the language for you.

Type	string
Default value	auto
Accepted values	de | en | es | fr | it | pt | ru | auto

bow optional

The Text Similarity API normally uses a semantic algorithm to compute the similarity of texts. This parameter lets you fall back to a more classical syntactic algorithm in cases where the semantic one fails.

"never" always uses the semantic algorithm;
"both_empty" uses the syntactic algorithm if both texts have no semantic information;
"one_empty" uses the syntactic algorithm if at least one of the two texts has no semantic information;
"always" always uses the syntactic algorithm.

Type	string
Default value	never
Accepted values	always | one_empty | both_empty | never
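
The four values can be read as a small decision rule. The sketch below only restates the rules listed above and is not the service's implementation; the function name uses_syntactic and its boolean arguments (whether each text carries semantic information) are hypothetical.

```python
def uses_syntactic(bow, first_has_semantics, second_has_semantics):
    """Return True when the documented bow rules select the syntactic algorithm."""
    empty = [not first_has_semantics, not second_has_semantics]
    if bow == "never":
        return False
    if bow == "both_empty":
        # Fall back only when neither text has semantic information.
        return all(empty)
    if bow == "one_empty":
        # Fall back as soon as one text has no semantic information.
        return any(empty)
    if bow == "always":
        return True
    raise ValueError(f"unknown bow value: {bow!r}")
```

Read this way, "one_empty" is the more permissive fallback: every case that triggers "both_empty" also triggers "one_empty".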

Did you know?
You can use the Entity Extraction API's parameters as well, prefixing them with "nex." (e.g. nex.min_confidence).

Hint: Keep-alive
If you need to send many requests to the API server, we suggest using keep-alive to avoid the connection overhead. To learn how to enable it, please refer to your HTTP client's documentation (e.g. python-requests, ruby, php).
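
As one possible illustration of keep-alive, the sketch below reuses a single connection from Python's standard http.client module for several comparisons. The helper names request_path and compare_many are hypothetical, the 10-second timeout is an arbitrary choice, and error responses are not handled.

```python
import http.client
import json
from urllib.parse import urlencode

HOST = "api.dandelion.eu"
PATH = "/datatxt/sim/v1"


def request_path(text1, text2, token):
    """Return the path and query string for one comparison."""
    query = urlencode({"text1": text1, "text2": text2, "token": token})
    return f"{PATH}?{query}"


def compare_many(pairs, token):
    """Compare several (text1, text2) pairs over one reused connection."""
    conn = http.client.HTTPSConnection(HOST, timeout=10)
    try:
        scores = []
        for text1, text2 in pairs:
            conn.request("GET", request_path(text1, text2, token))
            response = conn.getresponse()
            # Read the whole body before the next request,
            # otherwise the connection cannot be reused.
            body = response.read()
            scores.append(json.loads(body)["similarity"])
        return scores
    finally:
        conn.close()
```

With python-requests, the equivalent idea is to issue all calls through one Session object instead of calling the module-level functions.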

Response

The response is structured in JSON as follows:

{
  "timestamp": "Date and time of the response generation process",
  "time": "Time elapsed for generating the response (milliseconds)",
  "lang": "The language used to compare the given texts",
  "langConfidence": "Accuracy of the language detection, from 0.0 to 1.0. Present only if auto-detection is on",
  "similarity": "Similarity of the two given texts, from 0.0 to 1.0. Higher is better"
}

For more information about status codes and error handling, please refer to the generic Dandelion API documentation. The cost of each request is reported in the response headers.
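
A client might decode the body into a typed value along these lines. This is a sketch under assumptions: the names SimilarityResult and parse_result are hypothetical, the timestamp is assumed to be in the ISO 8601 form used by this page's example, and langConfidence is treated as optional because it is present only with auto-detection.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SimilarityResult:
    timestamp: datetime
    time_ms: int
    lang: str
    similarity: float
    lang_confidence: Optional[float] = None  # absent unless lang is auto-detected


def parse_result(body):
    """Decode a JSON response body into a SimilarityResult."""
    data = json.loads(body)
    confidence = data.get("langConfidence")
    return SimilarityResult(
        timestamp=datetime.fromisoformat(data["timestamp"]),
        time_ms=int(data["time"]),
        lang=data["lang"],
        similarity=float(data["similarity"]),
        lang_confidence=None if confidence is None else float(confidence),
    )
```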

Example

Request

https://api.dandelion.eu/datatxt/sim/v1/?text1=Cameron%20wins%20the%20Oscar&text2=All%20nominees%20for%20the%20Academy%20Awards&token=<YOUR_TOKEN>

Response

Connection: keep-alive
Content-Length: 2748
Content-Type: application/json;charset=UTF-8
Date: Wed, 21 Oct 2015 16:29:37 GMT
Server: Apache-Coyote/1.1
X-DL-units: 3
X-DL-units-left: 997
X-DL-units-reset: 2015-10-22 00:00:00 +0000

{
  "timestamp": "2015-10-21T16:29:37",
  "time": 3,
  "lang": "en",
  "langConfidence": 1,
  "text1": "The annotated text. Present only if the 'html1' parameter has been used",
  "text2": "The annotated text. Present only if the 'html2' parameter has been used",
  "similarity": 0.6355
}
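
The X-DL-* headers in this example carry the request cost and the remaining quota. A minimal sketch for reading them follows; the function name parse_units is hypothetical, the reset format is taken from the example above, and parsing the unit counts as floats is an assumption made to tolerate fractional values.

```python
from datetime import datetime


def parse_units(headers):
    """Extract request cost and remaining quota from the X-DL-* headers."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "units": float(lowered["x-dl-units"]),
        "units_left": float(lowered["x-dl-units-left"]),
        # Example value: "2015-10-22 00:00:00 +0000"
        "units_reset": datetime.strptime(
            lowered["x-dl-units-reset"], "%Y-%m-%d %H:%M:%S %z"
        ),
    }
```

Checking units_left after each call lets a batch job stop before the quota runs out and resume after units_reset.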

Did you know?
The Text Similarity API works better on short texts, such as those of 5 to 20 words.
