dataTXT-LI API reference

dataTXT-LI is a simple language identification API. It is useful whenever you work with text, so we have opened it to all our users. It currently supports more than 50 languages.

Endpoint

https://api.dandelion.eu/datatxt/li/v1

We support both GET and POST methods to query the API.
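
As a rough illustration of the two methods, the sketch below builds a GET URL and a POST request with Python's standard library, without sending anything. The helper names (`encode_params`, `build_get_url`, `build_post_request`) are hypothetical, and sending POST parameters as a form-encoded body is an assumption rather than something stated on this page.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

ENDPOINT = "https://api.dandelion.eu/datatxt/li/v1"


def encode_params(text: str, app_id: str, app_key: str) -> str:
    """Percent-encode the parameters, keeping '$' literal in the key names."""
    params = {"text": text, "$app_id": app_id, "$app_key": app_key}
    return urlencode(params, safe="$", quote_via=quote)


def build_get_url(text: str, app_id: str, app_key: str) -> str:
    """Return a GET URL with all parameters in the query string."""
    return f"{ENDPOINT}/?{encode_params(text, app_id, app_key)}"


def build_post_request(text: str, app_id: str, app_key: str) -> Request:
    """Return an unsent POST request carrying the parameters in its body."""
    # Assumption: the API accepts a form-encoded body for POST requests.
    body = encode_params(text, app_id, app_key).encode("ascii")
    return Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

POST is likely the more practical choice for long texts or whole HTML documents, since those can exceed the URL length that clients and servers accept. Either request could then be sent with `urllib.request.urlopen`.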

Parameters

Remember to authenticate by specifying the $app_id and $app_key parameters. See the API documentation on authentication if you have any questions.

text|url|html|html_fragment required
These parameters define how you send the LI API the text whose language you want identified. Use exactly one of them per request, according to these guidelines:
  • use "text" when you have plain text that doesn't need any pre-processing;
  • use "url" when you have a URL and you want dataTXT to work on its main content; the dataTXT API will fetch the URL for you and use an AI algorithm to extract the relevant part of the document to work on; in this case, the main content will also be returned by the API to allow you to properly use the annotation offsets;
  • use "html" when you have an HTML document and you want dataTXT to work on its main content, similarly to what the "url" parameter does;
  • use "html_fragment" when you have an HTML snippet and you want dataTXT to work on its content; dataTXT will remove all HTML tags before analyzing it.
Type string
clean optional
Set this parameter to true if you want URLs, email addresses, hashtags, and similar elements stripped from the text before it is processed.
Type boolean
Default value false
Accepted values true | false
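
The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API itself: `build_li_params` is a hypothetical helper that enforces the "exactly one source parameter" rule and serializes `clean` as the literal string `true`.

```python
def build_li_params(*, text=None, url=None, html=None,
                    html_fragment=None, clean=False) -> dict:
    """Build the request parameters, allowing exactly one text source."""
    candidates = {
        "text": text,
        "url": url,
        "html": html,
        "html_fragment": html_fragment,
    }
    provided = {name: value for name, value in candidates.items()
                if value is not None}
    if len(provided) != 1:
        raise ValueError(
            "specify exactly one of text, url, html, html_fragment; "
            f"got {sorted(provided) or 'none'}"
        )
    params = dict(provided)
    if clean:
        # false is the documented default, so the flag is sent only when set.
        params["clean"] = "true"
    return params
```

For example, `build_li_params(url="https://example.com", clean=True)` yields a dictionary ready to be merged with the `$app_id` and `$app_key` credentials.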

Response

The response is structured in JSON as follows:

{
  "timestamp": "Date and time of the response generation process",
  "time": "Time elapsed for generating the response (milliseconds)",
  "detectedLangs": [
    {
      "lang": "ISO 639-1 code of the detected language",
      "confidence": "Accuracy of the language detection"
    }
  ]
}
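
A response with this structure can be reduced to a ranked list of languages. The sketch below assumes only the field names shown above; the function names `detected_languages` and `best_language` are hypothetical.

```python
import json
from typing import Optional


def detected_languages(body: str) -> list:
    """Return (lang, confidence) pairs, highest confidence first."""
    payload = json.loads(body)
    pairs = [
        (item["lang"], float(item["confidence"]))
        for item in payload.get("detectedLangs", [])
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def best_language(body: str) -> Optional[str]:
    """Return the most confident language code, or None if nothing was detected."""
    ranked = detected_languages(body)
    return ranked[0][0] if ranked else None
```

Sorting on the client avoids relying on the order in which the API lists the candidates, which this page does not specify.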

For more information about status codes and error handling, please refer to the generic dandelion API documentation. The cost of each request can be found in the response headers as described here.

Example

Request

https://api.dandelion.eu/datatxt/li/v1/?text=I%20am%20a%20mighty%20pirate%20von%20Deutschland&$app_id=YOUR_APP_ID&$app_key=YOUR_APP_KEY

Response

Connection: keep-alive
Content-Length: 2748
Content-Type: application/json;charset=UTF-8
Date: Wed, 21 Oct 2015 16:29:37 GMT
Server: Apache-Coyote/1.1
X-DL-units: 0.1
X-DL-units-left: 999.9
X-DL-units-reset: 2015-10-22 00:00:00 +0000
{
  "timestamp": "2015-10-21T16:29:37",
  "time": 1,
  "text": "The annotated text. Present only if the 'url' or 'html' parameters have been used",
  "url": "The actual URL from which the text has been extracted. Present only if the 'url' parameter has been used",
  "detectedLangs": [
    {
      "lang": "de",
      "confidence": 0.5714284059202707
    },
    {
      "lang": "en",
      "confidence": 0.42857127625587477
    }
  ]
}
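
The X-DL-units headers in the example response can be read to track the cost of a request and the remaining quota. This is a minimal sketch: `parse_units` is a hypothetical helper, and the date format is inferred from the single example value above rather than from a specification.

```python
from datetime import datetime


def parse_units(headers) -> dict:
    """Extract the unit accounting from a mapping of response headers."""
    return {
        "cost": float(headers["X-DL-units"]),
        "left": float(headers["X-DL-units-left"]),
        # Assumption: the reset time always follows this layout,
        # as in "2015-10-22 00:00:00 +0000".
        "reset": datetime.strptime(
            headers["X-DL-units-reset"], "%Y-%m-%d %H:%M:%S %z"
        ),
    }


example_headers = {
    "X-DL-units": "0.1",
    "X-DL-units-left": "999.9",
    "X-DL-units-reset": "2015-10-22 00:00:00 +0000",
}
units = parse_units(example_headers)
```

A plain dictionary is case-sensitive, while HTTP header names are not; the headers object returned by `urllib.request.urlopen` handles lookups case-insensitively and can be passed in directly.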