The Entity Extraction API reference

This is a named entity extraction and linking API that performs well even on short texts, where many similar services struggle. It currently supports texts in English, French, German, Italian, Portuguese, Russian, and Spanish. With this API you can automatically tag your texts, extracting Wikipedia entities and enriching your data.

Endpoint

https://api.dandelion.eu/datatxt/nex/v1

We support both GET and POST methods to query the API.
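
As a minimal sketch, the two request styles can be prepared with Python's standard library as follows. The helper names and the YOUR_TOKEN placeholder are illustrative, and the form-encoded POST body is an assumption that this reference does not spell out.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://api.dandelion.eu/datatxt/nex/v1"


def build_get_request(params: dict) -> Request:
    """Carry the parameters in the query string (GET)."""
    return Request(f"{ENDPOINT}?{urlencode(params)}", method="GET")


def build_post_request(params: dict) -> Request:
    """Carry the parameters in a form-encoded body (POST). Assumed encoding."""
    body = urlencode(params).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(ENDPOINT, data=body, headers=headers, method="POST")


params = {"text": "An apple a day", "token": "YOUR_TOKEN"}  # placeholder token
get_req = build_get_request(params)
post_req = build_post_request(params)
# Either request can be sent with urllib.request.urlopen(req) once a real token is set.
```

POST is probably the more comfortable choice for long inputs, since the text then travels in the body rather than in the URL.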

Parameters

Remember to authenticate by specifying the token parameter (or the legacy $app_id and $app_key pair). See the API documentation on authentication for details.

text|html|html_fragment required

These parameters define how you send text to the Entity Extraction API. Only one of them can be used in each request, following these guidelines:

use "text" when you have plain text that doesn't need any pre-processing;
use "html" when you have an HTML document and you want the Entity Extraction API to work on its main content. The API uses an AI algorithm to extract the relevant part of the document; the extracted main content is also returned in the response, so that you can interpret the annotation offsets correctly;
use "html_fragment" when you have an HTML snippet and you want the Entity Extraction API to work on its content. It will remove all HTML tags before analyzing it.

Type	string
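
The "only one of them" rule can be enforced on the client before a request is sent. The sketch below is one possible way to do that; select_input is a hypothetical helper, not part of the API.

```python
INPUT_PARAMS = ("text", "html", "html_fragment")


def select_input(text=None, html=None, html_fragment=None) -> dict:
    """Return the single input parameter to send; raise if none or several are set."""
    candidates = {"text": text, "html": html, "html_fragment": html_fragment}
    given = {name: value for name, value in candidates.items() if value is not None}
    if len(given) != 1:
        raise ValueError(
            f"expected exactly one of {INPUT_PARAMS}, got {sorted(given) or 'none'}"
        )
    return given


plain = select_input(text="An apple a day")
snippet = select_input(html_fragment="<p>An <b>apple</b> a day</p>")
```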

lang optional

The language of the text to be annotated: leave this parameter out to let the Entity Extraction API automatically detect the language for you. Currently English, French, German, Italian, Portuguese, Russian, and Spanish are supported. Languages are identified by their ISO 639-1 codes.

Type	string
Default value	auto
Accepted values	de | en | es | fr | it | pt | ru | auto

top_entities optional

The number of most important entities to include in the response.
If this value is greater than zero, a ranking algorithm will be applied to sort all entities by their importance with respect to the input text, and the most important ones will be included in the response in addition to the traditional annotations list. This can be useful if you are annotating long texts where the annotations list could contain dozens of entities but you would like to select only a few of them to represent the main topics of the text.
For each entity, a score representing its importance will be included. Note that this score is not absolute: it can only be used to compare the importance of entities within the same text.

Type	integer
Default value	0
Accepted values	0 .. +inf

min_confidence optional

The threshold for the confidence value; entities with a confidence value below this threshold will be discarded. Confidence is a numeric estimation of the quality of the annotation, which ranges between 0 and 1. A higher threshold means you will get fewer but more precise annotations. A lower value means you will get more annotations but also more erroneous ones.

Type	float
Default value	0.6
Accepted values	0.0 .. 1.0
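
To get a feel for the threshold, the same cut can be reproduced on the client. The sketch below applies it to the three confidence values from the example response in this reference; filter_by_confidence is a hypothetical helper, and the server-side filtering may differ in details.

```python
def filter_by_confidence(annotations: list, min_confidence: float = 0.6) -> list:
    """Keep annotations whose confidence is not below the threshold."""
    return [a for a in annotations if a["confidence"] >= min_confidence]


# Labels and confidence values taken from the example response.
annotations = [
    {"label": "Physician", "confidence": 0.438},
    {"label": "Apple", "confidence": 0.7869},
    {"label": "Orange", "confidence": 0.7515},
]

kept_default = [a["label"] for a in filter_by_confidence(annotations)]
kept_strict = [a["label"] for a in filter_by_confidence(annotations, 0.76)]
```

Requesting a low min_confidence once and filtering locally like this can be a cheap way to compare thresholds without repeating the request.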

min_length optional

Removes entities whose spot (the annotated string in the input text) is shorter than this minimum length.

Type	integer
Default value	2
Accepted values	2 .. +inf

parse_hashtag renamed optional

This parameter has been renamed; use social.hashtag instead.

include optional

Returns more information on annotated entities:

"types" adds type information from DBpedia (see the complete DBpedia taxonomy) or Dandelion. DBpedia types are extracted based on the lang parameter (e.g. if lang=en, types are extracted from the English DBpedia). Please note that different DBpedia instances may contain different types for the same resource;
"categories" adds category information from DBpedia/Wikipedia;
"abstract" adds the text of the Wikipedia abstract;
"image" adds a link to an image depicting the tagged entity, as well as a link to the image thumbnail, served by Wikipedia. Please check the licensing terms of each image on Wikipedia before using it in your app;
"lod" adds links to equivalent (sameAs) entities in Linked Open Data repositories or other websites. It currently only supports DBpedia and Wikipedia;
"alternate_labels" adds some other names used when referring to the entity.

Type	comma-separated list
Default value	<empty string>
Accepted values	types, categories, abstract, image, lod, alternate_labels
Example	include=types,lod

extra_types optional

Enables matching of additional entity types:

"phone" enables matching of phone numbers;
"vat" enables matching of VAT IDs (Italian only).

Note that these values require the country parameter to be set, and that VAT ID matching works only for Italy.

Type	comma-separated list
Default value	<empty string>
Accepted values	phone, vat
Example	extra_types=phone,vat

country optional

Specifies the country that VAT IDs and telephone numbers are assumed to come from. This is important for correct results, as different countries use different formats.

Type	string
Default value	<empty string>
Accepted values	AD, AE, AM, AO, AQ, AR, AU, BB, BR, BS, BY, CA, CH, CL, CN, CX, DE, FR, GB, HU, IT, JP, KR, MX, NZ, PG, PL, RE, SE, SG, US, YT, ZW
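
The dependency between extra_types and country can be checked before sending a request. This is a sketch under the rules stated above (country is required, VAT IDs only for Italy); extra_type_params is a hypothetical helper and the API itself may report such errors differently.

```python
ALLOWED_EXTRA_TYPES = {"phone", "vat"}


def extra_type_params(extra_types: list, country: str) -> dict:
    """Build the extra_types/country pair, rejecting combinations the docs rule out."""
    unknown = set(extra_types) - ALLOWED_EXTRA_TYPES
    if unknown:
        raise ValueError(f"unsupported extra types: {sorted(unknown)}")
    if not country:
        raise ValueError("extra_types requires the country parameter")
    if "vat" in extra_types and country != "IT":
        raise ValueError("VAT ID matching works only with country=IT")
    return {"extra_types": ",".join(extra_types), "country": country}


italian = extra_type_params(["phone", "vat"], "IT")
american = extra_type_params(["phone"], "US")
```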

Advanced parameters

Hint: Keep-alive
If you need to send many requests to the API server, we suggest using keep-alive to avoid the connection overhead. To learn how to enable it, please refer to the documentation of your HTTP client (e.g. python-requests, ruby, php).
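
One way to reuse a connection with Python's standard library is sketched below. http.client speaks HTTP/1.1 and keeps the connection open between requests as long as the server allows it; annotate_many is a hypothetical helper and YOUR_TOKEN is a placeholder.

```python
import http.client
import json
from urllib.parse import urlencode


def annotate_many(conn, texts: list, token: str) -> list:
    """Send one request per text over the same connection and return parsed bodies."""
    results = []
    for text in texts:
        query = urlencode({"text": text, "token": token})
        conn.request("GET", f"/datatxt/nex/v1?{query}")
        response = conn.getresponse()
        # The body must be read completely before the connection can be reused.
        results.append(json.loads(response.read()))
    return results


# Usage (needs network access and a valid token):
# conn = http.client.HTTPSConnection("api.dandelion.eu")
# results = annotate_many(conn, ["first text", "second text"], "YOUR_TOKEN")
# conn.close()
```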

Response

The response is structured in JSON as follows:

{
  "timestamp": "Date and time of the response generation process",
  "time": "Time elapsed for generating the response (milliseconds)",
  "lang": "The language used to tag the input text",
  "langConfidence": "Accuracy of the language detection, from 0.0 to 1.0. Present only if auto-detection is on",
  "text": "The annotated text. Present only if the 'html' parameter has been used",
  "annotations": [
    {
      "id": "ID of the linked Wikipedia resource",
      "title": "Title of the linked Wikipedia resource",
      "uri": "URL of the entity on Wikipedia",
      "label": "Most common name used to represent the resource",
      "confidence": "Value of confidence for this annotation",
      "spot": "Annotated string, as it is in the input text",
      "start": "Character position in the input text where the annotation begins",
      "end": "Character position in the input text where the annotation ends",
      "types": ["List of types of the linked DBpedia resource","Only if 'include' parameter contains 'types'"],
      "categories": [
        "List of the categories of the linked DBpedia resource",
        "Only if 'include' parameter contains 'categories'"
      ],
      "abstract": "Abstract of the linked Wikipedia resource. Only if 'include' parameter contains 'abstract'",
      "lod": {
        "wikipedia": "URL of the Wikipedia article that represents the resource",
        "dbpedia": "URI of the resource on DBpedia"
      },
      "alternateLabels": [
        "List of other names used when referring to the entity",
        "Only if 'include' parameter contains 'alternate_labels'"
      ],
      "image": {
        "full": "URL of a depiction of the resource on Wikipedia. Only if 'include' parameter contains 'image'",
        "thumbnail": "URL of the thumbnail of the depiction. Only if 'include' parameter contains 'image'"
      }
    }
  ],
  "topEntities": [ # Only if 'top_entities' parameter is greater than 0
    {
      "id": "ID of the linked Wikipedia resource",
      "uri": "URL of the entity on Wikipedia",
      "score": "The result of the ranking algorithm"
    }
  ]
}
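
The structure above could be mapped onto a typed record roughly as follows. The Annotation class and parse_annotations function are illustrative names, not part of any official client; fields that depend on the include parameter are treated as optional.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Annotation:
    id: int
    title: str
    uri: str
    label: str
    confidence: float
    spot: str
    start: int
    end: int
    # Present only when requested through the 'include' parameter.
    types: list = field(default_factory=list)
    categories: list = field(default_factory=list)
    abstract: Optional[str] = None
    lod: dict = field(default_factory=dict)
    alternate_labels: list = field(default_factory=list)
    image: dict = field(default_factory=dict)


def parse_annotations(response: dict) -> list:
    """Convert the 'annotations' array of a decoded response into Annotation objects."""
    parsed = []
    for raw in response.get("annotations", []):
        parsed.append(Annotation(
            id=raw["id"], title=raw["title"], uri=raw["uri"], label=raw["label"],
            confidence=raw["confidence"], spot=raw["spot"],
            start=raw["start"], end=raw["end"],
            types=raw.get("types", []),
            categories=raw.get("categories", []),
            abstract=raw.get("abstract"),
            lod=raw.get("lod", {}),
            alternate_labels=raw.get("alternateLabels", []),  # camelCase in the JSON
            image=raw.get("image", {}),
        ))
    return parsed
```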

For more information about status codes and error handling, please refer to the generic Dandelion API documentation. The cost of each request is reported in the response headers.
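
The cost headers can be read with a few lines of Python. The header names and the date format below are taken from the example response in this reference and are assumed to be stable; parse_unit_headers is a hypothetical helper.

```python
from datetime import datetime


def parse_unit_headers(headers: dict) -> dict:
    """Extract the units used, the units left and the reset time from response headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        # Parsed as floats in case a request costs a fraction of a unit.
        "used": float(lowered["x-dl-units"]),
        "left": float(lowered["x-dl-units-left"]),
        "reset": datetime.strptime(lowered["x-dl-units-reset"], "%Y-%m-%d %H:%M:%S %z"),
    }


# Header values copied from the example response.
usage = parse_unit_headers({
    "X-DL-units": "1",
    "X-DL-units-left": "999",
    "X-DL-units-reset": "2015-10-22 00:00:00 +0000",
})
```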

Example

Request

https://api.dandelion.eu/datatxt/nex/v1/?lang=en&text=The%20doctor%20says%20an%20apple%20is%20better%20than%20an%20orange&include=types%2Cabstract%2Ccategories%2Clod&token=<YOUR_TOKEN>
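
Rather than percent-encoding the query by hand, the same URL can be generated programmatically. A sketch in Python, with YOUR_TOKEN as a placeholder; quote_via=quote is used so that spaces become %20 as in the request above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.dandelion.eu/datatxt/nex/v1/"

params = {
    "lang": "en",
    "text": "The doctor says an apple is better than an orange",
    "include": "types,abstract,categories,lod",
    "token": "YOUR_TOKEN",  # placeholder, replace with a real token
}

# quote() encodes spaces as %20 and commas as %2C.
url = BASE_URL + "?" + urlencode(params, quote_via=quote)
```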

Response

Connection: keep-alive
Content-Length: 2748
Content-Type: application/json;charset=UTF-8
Date: Wed, 21 Oct 2015 16:29:37 GMT
Server: Apache-Coyote/1.1
X-DL-units: 1
X-DL-units-left: 999
X-DL-units-reset: 2015-10-22 00:00:00 +0000

{
  "timestamp": "2015-10-21T16:29:37",
  "time": 2,
  "lang": "en",
  "annotations": [
    {
      "abstract": "A physician is a professional who practices medicine, which is concerned with promoting, maintaining or restoring human health through the study, diagnosis, and treatment of disease, injury, and other physical and mental impairments. They may focus their practice on certain disease categories, types of patients, or methods of treatment \u2013 known as specialist medical practitioners \u2013 or assume responsibility for the provision of continuing and comprehensive medical care to individuals, families, and communities \u2013 known as general practitioners. Medical practice properly requires both a detailed knowledge of the academic disciplines (such as anatomy and physiology) underlying diseases and their treatment \u2013 the science of medicine \u2013 and also a decent competence in its applied practice \u2013 the art or craft of medicine.",
      "id": 23315,
      "title": "Physician",
      "start": 4,
      "categories": [
        "Physicians",
        "Healthcare occupations",
        "Occupations"
      ],
      "lod": {
        "wikipedia": "http://en.wikipedia.org/wiki/Physician",
        "dbpedia": "http://dbpedia.org/resource/Physician"
      },
      "label": "Physician",
      "types": [],
      "confidence": 0.438,
      "uri": "http://en.wikipedia.org/wiki/Physician",
      "end": 10,
      "spot": "doctor"
    },
    {
      "abstract": "The apple is the pomaceous fruit of the apple tree, species Malus domestica in the rose family (Rosaceae). It is one of the most widely cultivated tree fruits, and the most widely known of the many members of genus Malus that are used by humans. Apples grow on small, deciduous trees. The tree originated in Central Asia, where its wild ancestor, Malus sieversii, is still found today. Apples have been grown for thousands of years in Asia and Europe, and were brought to North America by European colonists. Apples have been present in the mythology and religions of many cultures, including Norse, Greek and Christian traditions. In 2010, the fruit's genome was decoded, leading to new understandings of disease control and selective breeding in apple production.",
      "id": 18978754,
      "title": "Apple",
      "start": 19,
      "categories": [
        "Apples",
        "Malus",
        "Plants described in 1803",
        "Sequenced genomes"
      ],
      "lod": {
        "wikipedia": "http://en.wikipedia.org/wiki/Apple",
        "dbpedia": "http://dbpedia.org/resource/Apple"
      },
      "label": "Apple",
      "types": [
        "http://dbpedia.org/ontology/Eukaryote",
        "http://dbpedia.org/ontology/Plant",
        "http://dbpedia.org/ontology/Species"
      ],
      "confidence": 0.7869,
      "uri": "http://en.wikipedia.org/wiki/Apple",
      "end": 24,
      "spot": "apple"
    },
    {
      "abstract": "The orange (specifically, the sweet orange) is the fruit of the citrus species Citrus × sinensis in the family Rutaceae. The fruit of the Citrus sinensis is called sweet orange to distinguish it from that of the Citrus aurantium, the bitter orange. The orange is a hybrid, possibly between pomelo (Citrus maxima) and mandarin (Citrus reticulata), cultivated since ancient times.",
      "id": 4984440,
      "title": "Orange (fruit)",
      "start": 43,
      "categories": [
        "Oranges",
        "Citrus hybrids",
        "Tropical agriculture",
        "Symbols of Florida",
        "Symbols of California",
        "United States state plants",
        "World Digital Library related"
      ],
      "lod": {
        "wikipedia": "http://en.wikipedia.org/wiki/Orange_(fruit)",
        "dbpedia": "http://dbpedia.org/resource/Orange_(fruit)"
      },
      "label": "Orange",
      "types": [
        "http://dbpedia.org/ontology/Eukaryote",
        "http://dbpedia.org/ontology/FloweringPlant",
        "http://dbpedia.org/ontology/Plant",
        "http://dbpedia.org/ontology/Species"
      ],
      "confidence": 0.7515,
      "uri": "http://en.wikipedia.org/wiki/Orange_(fruit)",
      "end": 49,
      "spot": "orange"
    }
  ]
}
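
The start and end offsets in this response can be applied directly to the input text. The sketch below marks each annotated span, assuming that end is exclusive (as the values above suggest) and that annotations do not overlap; mark_entities is a hypothetical helper.

```python
def mark_entities(text: str, annotations: list) -> str:
    """Wrap each annotated span as [spot|label], using the start/end offsets."""
    marked = text
    # Work right to left so that earlier offsets stay valid after each insertion.
    for a in sorted(annotations, key=lambda a: a["start"], reverse=True):
        spot = marked[a["start"]:a["end"]]
        marked = marked[:a["start"]] + f"[{spot}|{a['label']}]" + marked[a["end"]:]
    return marked


text = "The doctor says an apple is better than an orange"
# Offsets and labels copied from the example response.
annotations = [
    {"start": 4, "end": 10, "label": "Physician"},
    {"start": 19, "end": 24, "label": "Apple"},
    {"start": 43, "end": 49, "label": "Orange"},
]
marked = mark_entities(text, annotations)
# marked == "The [doctor|Physician] says an [apple|Apple] is better than an [orange|Orange]"
```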
