
The Entity Extraction API reference

This is a named entity extraction and linking API that performs well even on short texts, where many similar services struggle. It currently works on texts in English, French, German, Italian, Portuguese, Russian and Spanish. With this API you can automatically tag your texts, extracting Wikipedia entities and enriching your data.

Endpoint

https://api.dandelion.eu/datatxt/nex/v1

We support both GET and POST methods to query the API.
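As a minimal sketch of the POST variant, the Python snippet below builds (but does not send) a request that carries the parameters in the body instead of the URL. It assumes the API accepts a standard application/x-www-form-urlencoded body; build_post is a hypothetical helper name, and <YOUR_TOKEN> is a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint from this page; the form-encoded body is an assumption.
ENDPOINT = "https://api.dandelion.eu/datatxt/nex/v1"


def build_post(text: str, token: str) -> Request:
    """Build (but do not send) a POST request carrying the parameters in the body."""
    body = urlencode({"text": text, "token": token}).encode("ascii")
    return Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_post("The doctor says an apple is better than an orange", "<YOUR_TOKEN>")
print(req.get_method(), req.full_url)
print(req.data)
# urllib.request.urlopen(req) would actually send it.
```

With POST the text never appears in the URL, so it is not subject to URL length limits or to being recorded in URL logs.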

Parameters

Remember to authenticate by specifying the token parameter (or the legacy $app_id and $app_key pair). See the API documentation on authentication if you have any questions.

text|html|html_fragment required
These parameters define how you send text to the Entity Extraction API. Only one of them can be used in each request, following these guidelines:
  • use "text" when you have plain text that doesn't need any pre-processing;
  • use "html" when you have an HTML document and you want the Entity Extraction API to work on its main content. It will use an AI algorithm to extract the relevant part of the document to work on; in this case, the main content will also be returned by the API to allow you to properly use the annotation offsets;
  • use "html_fragment" when you have an HTML snippet and you want the Entity Extraction API to work on its content. It will remove all HTML tags before analyzing it.
Type string
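Because exactly one of the three input parameters may be sent per request, a client can enforce the rule before calling the API. The sketch below uses a hypothetical helper, input_param, which is not part of any official client library.

```python
from typing import Optional


def input_param(
    text: Optional[str] = None,
    html: Optional[str] = None,
    html_fragment: Optional[str] = None,
) -> dict:
    """Return the single input parameter to send, enforcing the one-of rule."""
    candidates = {"text": text, "html": html, "html_fragment": html_fragment}
    given = {name: value for name, value in candidates.items() if value is not None}
    if len(given) != 1:
        raise ValueError("use exactly one of text, html, html_fragment")
    return given


print(input_param(html_fragment="<p>The doctor says an apple is better</p>"))
```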
lang optional
The language of the text to be annotated. Leave this parameter out to let the Entity Extraction API detect the language automatically. Currently English, French, German, Italian, Portuguese, Russian and Spanish are supported, identified by their ISO 639-1 codes.
Type string
Default value auto
Accepted values de | en | es | fr | it | pt | ru | auto
top_entities optional
The number of most important entities to include in the response.
If this value is greater than zero, a ranking algorithm sorts all entities by their importance with respect to the input text, and the most important ones are included in the response in addition to the usual annotations list. This is useful when annotating long texts, where the annotations list may contain dozens of entities but you only want a few of them to better represent the main topics of the text.
For each entity, a score representing its importance is included. Note that this score is not absolute: it can only be used to compare the importance of entities within the same text.
Type integer
Default value 0
Accepted values 0 .. +inf
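When top_entities is greater than zero, the response carries a topEntities list (see the Response section). The sketch below ranks it by score on the client side, which is safe whether or not the server already sorts it. The response fragment is hypothetical: the ids and URIs come from the example at the end of this page, but the scores are invented for illustration.

```python
from typing import Any, Dict, List


def top_entity_uris(response: Dict[str, Any]) -> List[str]:
    """Return topEntities URIs, most important first.

    topEntities is only present when top_entities > 0, so a missing
    key is treated as an empty list.
    """
    entities = response.get("topEntities", [])
    ranked = sorted(entities, key=lambda e: e["score"], reverse=True)
    return [e["uri"] for e in ranked]


# Hypothetical response fragment; the scores are invented for illustration.
sample = {
    "topEntities": [
        {"id": 4984440, "uri": "http://en.wikipedia.org/wiki/Orange_(fruit)", "score": 0.4},
        {"id": 18978754, "uri": "http://en.wikipedia.org/wiki/Apple", "score": 0.5},
    ]
}
print(top_entity_uris(sample))
```

Since scores are only comparable within one text, do not merge or compare them across different responses.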
min_confidence optional
The confidence threshold: entities with a confidence value below it are discarded. Confidence is a numeric estimate of the quality of an annotation, ranging from 0 to 1. A higher threshold gives you fewer but more precise annotations; a lower one gives you more annotations, but also more erroneous ones.
Type float
Default value 0.6
Accepted values 0.0 .. 1.0
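If you want to try several thresholds without repeating the request, one option is to query with a low min_confidence and apply stricter thresholds on the client. This is a minimal sketch using the spot and confidence values from the example response at the end of this page; filter_by_confidence is a hypothetical helper, and it assumes the server treats a value exactly equal to the threshold as kept, which this page does not state.

```python
def filter_by_confidence(annotations: list, threshold: float) -> list:
    """Keep annotations whose confidence is not below the threshold."""
    return [a for a in annotations if a["confidence"] >= threshold]


# Values from the example response on this page.
annotations = [
    {"spot": "doctor", "confidence": 0.438},
    {"spot": "apple", "confidence": 0.7869},
    {"spot": "orange", "confidence": 0.7515},
]

print([a["spot"] for a in filter_by_confidence(annotations, 0.6)])  # ['apple', 'orange']
print([a["spot"] for a in filter_by_confidence(annotations, 0.76)])  # ['apple']
```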
min_length optional
Use this parameter to discard entities whose spot is shorter than the given minimum length.
Type integer
Default value 2
Accepted values 2 .. +inf
parse_hashtag renamed optional
Use social.hashtag instead.
social.hashtag optional
Enables special hashtag parsing, to correctly analyze tweets and Facebook posts.
Type boolean
Default value false
Accepted values true | false
social.mention optional
Enables special mention parsing, to correctly analyze tweets and Facebook posts.
Type boolean
Default value false
Accepted values true | false
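The accepted boolean values are lowercase true and false, whereas Python's str(True) yields "True". A minimal sketch that converts booleans explicitly before encoding; bool_param is a hypothetical helper, and the tweet text is made up.

```python
from urllib.parse import urlencode


def bool_param(value: bool) -> str:
    """Map a Python bool to the documented lowercase true/false values."""
    return "true" if value else "false"


params = {
    "text": "Loving #NewYork with @someone",  # made-up tweet
    "social.hashtag": bool_param(True),
    "social.mention": bool_param(True),
}
print(urlencode(params))
```

The dots in the parameter names need no escaping; urlencode leaves them as they are.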
include optional
Returns more information on annotated entities:
  • "types" adds type information from DBpedia (see the complete DBpedia taxonomy) or Dandelion. DBpedia types are taken from the DBpedia instance matching the lang parameter (e.g. if lang=en, types come from the English DBpedia). Note that different DBpedia instances may contain different types for the same resource;
  • "categories" adds category information from DBpedia/Wikipedia;
  • "abstract" adds the text of the Wikipedia abstract;
  • "image" adds a link to an image depicting the tagged entity, as well as a link to the image thumbnail, served by Wikipedia. Please check the licensing terms of each image on Wikipedia before using it in your app;
  • "lod" adds links to equivalent (sameAs) entities in Linked Open Data repositories or other websites. It currently only supports DBpedia and Wikipedia;
  • "alternate_labels" adds some other names used when referring to the entity.
Type comma-separated list
Default value <empty string>
Accepted values types, categories, abstract, image, lod, alternate_labels
Example include=types,lod
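The include value is a single comma-separated string. The sketch below joins a list of fields and rejects anything outside the documented accepted values; include_param is a hypothetical helper. Once URL-encoded, the comma becomes %2C, as in the example request further down this page.

```python
from urllib.parse import urlencode

# Accepted values for include, as listed above.
INCLUDE_VALUES = {"types", "categories", "abstract", "image", "lod", "alternate_labels"}


def include_param(*fields: str) -> str:
    """Join include fields into the comma-separated form the API expects."""
    unknown = set(fields) - INCLUDE_VALUES
    if unknown:
        raise ValueError(f"unsupported include values: {sorted(unknown)}")
    return ",".join(fields)


print(urlencode({"include": include_param("types", "lod")}))  # include=types%2Clod
```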
extra_types optional
Enables matching of additional entity types:
  • "phone" enables matching of phone numbers;
  • "vat" enables matching of VAT IDs (Italian only).
Note that both values require the country parameter to be set, and VAT IDs are matched only for Italy.
Type comma-separated list
Default value <empty string>
Accepted values phone, vat
Example extra_types=phone,vat
country optional
The country that VAT IDs and phone numbers are assumed to come from. Setting it is important for correct results, as different countries may use different formats.
Type string
Default value <empty string>
Accepted values AD, AE, AM, AO, AQ, AR, AU, BB, BR, BS, BY, CA, CH, CL, CN, CX, DE, FR, GB, HU, IT, JP, KR, MX, NZ, PG, PL, RE, SE, SG, US, YT, ZW
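The constraints between extra_types and country can be checked before sending a request. This is a hypothetical client-side check, check_extra_types, that mirrors the rules stated above; it treats "VAT IDs will work only for Italy" as meaning country=IT, which is an interpretation.

```python
from typing import List, Optional

# Accepted values for country, as listed above.
SUPPORTED_COUNTRIES = {
    "AD", "AE", "AM", "AO", "AQ", "AR", "AU", "BB", "BR", "BS", "BY",
    "CA", "CH", "CL", "CN", "CX", "DE", "FR", "GB", "HU", "IT", "JP",
    "KR", "MX", "NZ", "PG", "PL", "RE", "SE", "SG", "US", "YT", "ZW",
}


def check_extra_types(extra_types: List[str], country: Optional[str]) -> None:
    """Raise ValueError if extra_types/country violate the documented rules."""
    for value in extra_types:
        if value not in ("phone", "vat"):
            raise ValueError(f"unsupported extra type: {value}")
    if extra_types and not country:
        raise ValueError("extra_types requires the country parameter")
    if country and country not in SUPPORTED_COUNTRIES:
        raise ValueError(f"unsupported country: {country}")
    if "vat" in extra_types and country != "IT":
        raise ValueError("VAT ID matching works only for Italy (country=IT)")


check_extra_types(["phone", "vat"], "IT")  # passes
```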

Advanced parameters


epsilon optional
advanced
This parameter defines whether the Entity Extraction API should rely more on the context or favor more common topics when discovering entities. A higher value favors more common topics; this may give better results on tweets or other fragmented inputs, where the context is not always reliable.
Type float
Default value 0.3
Accepted values 0.0 .. 0.5
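The numeric parameters on this page all have documented ranges, so out-of-range values can be caught before a request is sent. A minimal sketch; check_ranges is a hypothetical helper, and the bounds are copied from the "Accepted values" entries above.

```python
# Documented "Accepted values" ranges; None means no upper bound.
RANGES = {
    "top_entities": (0, None),
    "min_confidence": (0.0, 1.0),
    "min_length": (2, None),
    "epsilon": (0.0, 0.5),
}


def check_ranges(params: dict) -> None:
    """Raise ValueError for any numeric parameter outside its documented range."""
    for name, (low, high) in RANGES.items():
        if name not in params:
            continue  # omitted parameters fall back to the server-side default
        value = params[name]
        if value < low or (high is not None and value > high):
            raise ValueError(f"{name}={value} is outside the documented range")


# A tweet-oriented setup: a higher epsilon, still within 0.0 .. 0.5.
check_ranges({"epsilon": 0.4, "min_confidence": 0.6, "min_length": 2})
```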
Hint: Keep-alive
If you need to send many requests to the API server, we suggest enabling keep-alive to avoid the overhead of opening a new connection for every request. To find out how to enable it, refer to your HTTP client's documentation (e.g. python-requests, Ruby, PHP).
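With Python's standard http.client, a single HTTPSConnection is reused across requests as long as each response is fully read before the next request is sent. The sketch below reuses one connection for several texts. It assumes a form-encoded POST body, and annotate_many is a hypothetical helper. The connection object is passed in, so the function can also be exercised with a stand-in object.

```python
import http.client  # used in the commented usage below
import json
from urllib.parse import urlencode

PATH = "/datatxt/nex/v1"


def annotate_many(conn, texts, token):
    """Send one POST per text, reusing the same connection for all of them."""
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    results = []
    for text in texts:
        body = urlencode({"text": text, "token": token}).encode("ascii")
        conn.request("POST", PATH, body=body, headers=headers)
        response = conn.getresponse()
        payload = response.read()  # read fully before the connection can be reused
        if response.status != 200:
            raise RuntimeError(f"HTTP {response.status}: {payload[:200]!r}")
        results.append(json.loads(payload))
    return results


# Usage (performs real network calls; <YOUR_TOKEN> is a placeholder):
# conn = http.client.HTTPSConnection("api.dandelion.eu")
# try:
#     for result in annotate_many(conn, ["The doctor says an apple is better than an orange"], "<YOUR_TOKEN>"):
#         print(result.get("annotations"))
# finally:
#     conn.close()
```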

Response

The response is a JSON object structured as follows:

{
  "timestamp": "Date and time of the response generation process",
  "time": "Time elapsed for generating the response (milliseconds)",
  "lang": "The language used to tag the input text",
  "langConfidence": "Accuracy of the language detection, from 0.0 to 1.0. Present only if auto-detection is on",
  "text": "The annotated text. Present only if the 'html' parameter has been used",
  "annotations": [
    {
      "id": "ID of the linked Wikipedia resource",
      "title": "Title of the linked Wikipedia resource",
      "uri": "URL of the entity on Wikipedia",
      "label": "Most common name used to represent the resource",
      "confidence": "Value of confidence for this annotation",
      "spot": "Annotated string, as it is in the input text",
      "start": "Character position in the input text where the annotation begins",
      "end": "Character position in the input text where the annotation ends",
      "types": [
        "List of types of the linked DBpedia resource",
        "Only if 'include' parameter contains 'types'"
      ],
      "categories": [
        "List of categories of the linked DBpedia resource",
        "Only if 'include' parameter contains 'categories'"
      ],
      "abstract": "Abstract of the linked Wikipedia resource. Only if 'include' parameter contains 'abstract'",
      "lod": {
        "wikipedia": "URL of the Wikipedia article that represents the resource",
        "dbpedia": "URI of the resource on DBpedia"
      },
      "alternateLabels": [
        "List of other names used when referring to the entity",
        "Only if 'include' parameter contains 'alternate_labels'"
      ],
      "image": {
        "full": "URL of a depiction of the resource on Wikipedia. Only if 'include' parameter contains 'image'",
        "thumbnail": "URL of the thumbnail of the depiction. Only if 'include' parameter contains 'image'"
      }
    }
  ],
  "topEntities": [ # Only if 'top_entities' parameter is greater than 0
    {
      "id": "ID of the linked Wikipedia resource",
      "uri": "URL of the entity on Wikipedia",
      "score": "The result of the ranking algorithm"
    }
  ]
}

For more information about status codes and error handling, please refer to the generic Dandelion API documentation. The cost of each request is reported in the response headers (see the X-DL-units headers in the example below).
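A minimal sketch for reading the unit headers. The header names and values are copied from the example response on this page; reading X-DL-units as the cost of the request, X-DL-units-left as the remaining budget and X-DL-units-reset as the reset time is an interpretation of their names. units_info is a hypothetical helper. Note that HTTP header names are case-insensitive: a plain dict lookup is not, while http.client's response headers object is.

```python
from datetime import datetime


def units_info(headers) -> dict:
    """Parse the X-DL-units* headers (names as in the example response)."""
    return {
        "cost": float(headers["X-DL-units"]),
        "left": float(headers["X-DL-units-left"]),
        "reset": datetime.strptime(headers["X-DL-units-reset"], "%Y-%m-%d %H:%M:%S %z"),
    }


# Header values from the example response on this page.
example = {
    "X-DL-units": "1",
    "X-DL-units-left": "999",
    "X-DL-units-reset": "2015-10-22 00:00:00 +0000",
}
info = units_info(example)
print(info["cost"], info["left"], info["reset"].isoformat())
```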

Example

Request

https://api.dandelion.eu/datatxt/nex/v1/?lang=en&text=The%20doctor%20says%20an%20apple%20is%20better%20than%20an%20orange&include=types%2Cabstract%2Ccategories%2Clod&token=<YOUR_TOKEN>
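The same GET request can be built with Python's standard library rather than by hand. This sketch passes quote_via=quote so that spaces become %20 as in the URL above (urlencode's default, quote_plus, would produce '+'). <YOUR_TOKEN> remains a placeholder and is percent-encoded here as well.

```python
from urllib.parse import quote, urlencode

params = {
    "lang": "en",
    "text": "The doctor says an apple is better than an orange",
    "include": "types,abstract,categories,lod",
    "token": "<YOUR_TOKEN>",
}
# quote_via=quote encodes spaces as %20 and commas as %2C.
url = "https://api.dandelion.eu/datatxt/nex/v1/?" + urlencode(params, quote_via=quote)
print(url)
```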

Response

Connection: keep-alive
Content-Length: 2748
Content-Type: application/json;charset=UTF-8
Date: Wed, 21 Oct 2015 16:29:37 GMT
Server: Apache-Coyote/1.1
X-DL-units: 1
X-DL-units-left: 999
X-DL-units-reset: 2015-10-22 00:00:00 +0000
{
  "timestamp": "2015-10-21T16:29:37",
  "time": 2,
  "lang": "en",
  "annotations": [
    {
      "abstract": "A physician is a professional who practices medicine, which is concerned with promoting, maintaining or restoring human health through the study, diagnosis, and treatment of disease, injury, and other physical and mental impairments. They may focus their practice on certain disease categories, types of patients, or methods of treatment \u2013 known as specialist medical practitioners \u2013 or assume responsibility for the provision of continuing and comprehensive medical care to individuals, families, and communities \u2013 known as general practitioners. Medical practice properly requires both a detailed knowledge of the academic disciplines (such as anatomy and physiology) underlying diseases and their treatment \u2013 the science of medicine \u2013 and also a decent competence in its applied practice \u2013 the art or craft of medicine.",
      "id": 23315,
      "title": "Physician",
      "start": 4,
      "categories": [
        "Physicians",
        "Healthcare occupations",
        "Occupations"
      ],
      "lod": {
        "wikipedia": "http://en.wikipedia.org/wiki/Physician",
        "dbpedia": "http://dbpedia.org/resource/Physician"
      },
      "label": "Physician",
      "types": [],
      "confidence": 0.438,
      "uri": "http://en.wikipedia.org/wiki/Physician",
      "end": 10,
      "spot": "doctor"
    },
    {
      "abstract": "The apple is the pomaceous fruit of the apple tree, species Malus domestica in the rose family (Rosaceae). It is one of the most widely cultivated tree fruits, and the most widely known of the many members of genus Malus that are used by humans. Apples grow on small, deciduous trees. The tree originated in Central Asia, where its wild ancestor, Malus sieversii, is still found today. Apples have been grown for thousands of years in Asia and Europe, and were brought to North America by European colonists. Apples have been present in the mythology and religions of many cultures, including Norse, Greek and Christian traditions. In 2010, the fruit's genome was decoded, leading to new understandings of disease control and selective breeding in apple production.",
      "id": 18978754,
      "title": "Apple",
      "start": 19,
      "categories": [
        "Apples",
        "Malus",
        "Plants described in 1803",
        "Sequenced genomes"
      ],
      "lod": {
        "wikipedia": "http://en.wikipedia.org/wiki/Apple",
        "dbpedia": "http://dbpedia.org/resource/Apple"
      },
      "label": "Apple",
      "types": [
        "http://dbpedia.org/ontology/Eukaryote",
        "http://dbpedia.org/ontology/Plant",
        "http://dbpedia.org/ontology/Species"
      ],
      "confidence": 0.7869,
      "uri": "http://en.wikipedia.org/wiki/Apple",
      "end": 24,
      "spot": "apple"
    },
    {
      "abstract": "The orange (specifically, the sweet orange) is the fruit of the citrus species Citrus × sinensis in the family Rutaceae. The fruit of the Citrus sinensis is called sweet orange to distinguish it from that of the Citrus aurantium, the bitter orange. The orange is a hybrid, possibly between pomelo (Citrus maxima) and mandarin (Citrus reticulata), cultivated since ancient times.",
      "id": 4984440,
      "title": "Orange (fruit)",
      "start": 43,
      "categories": [
        "Oranges",
        "Citrus hybrids",
        "Tropical agriculture",
        "Symbols of Florida",
        "Symbols of California",
        "United States state plants",
        "World Digital Library related"
      ],
      "lod": {
        "wikipedia": "http://en.wikipedia.org/wiki/Orange_(fruit)",
        "dbpedia": "http://dbpedia.org/resource/Orange_(fruit)"
      },
      "label": "Orange",
      "types": [
        "http://dbpedia.org/ontology/Eukaryote",
        "http://dbpedia.org/ontology/FloweringPlant",
        "http://dbpedia.org/ontology/Plant",
        "http://dbpedia.org/ontology/Species"
      ],
      "confidence": 0.7515,
      "uri": "http://en.wikipedia.org/wiki/Orange_(fruit)",
      "end": 49,
      "spot": "orange"
    }
  ]
}
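In this example the offsets are half-open: start is the index of the first character of the spot and end is the index just past its last character, so text[start:end] returns the spot. The sketch below checks this using the values above and then uses the offsets to mark each spot in the text. highlight is a hypothetical helper. When the html parameter is used, the offsets refer to the "text" field returned in the response, not to the original HTML. The server appears to be Java-based, but whether it counts characters as UTF-16 code units is not confirmed here. If it does, offsets could diverge from Python string indices for characters outside the Basic Multilingual Plane, such as emoji.

```python
text = "The doctor says an apple is better than an orange"

# start/end values copied from the example response above.
annotations = [
    {"spot": "doctor", "start": 4, "end": 10},
    {"spot": "apple", "start": 19, "end": 24},
    {"spot": "orange", "start": 43, "end": 49},
]

# Half-open offsets: a plain slice recovers each spot.
for ann in annotations:
    assert text[ann["start"]:ann["end"]] == ann["spot"]


def highlight(text: str, annotations: list, left: str = "[", right: str = "]") -> str:
    """Wrap every spot in markers, working right to left so earlier offsets stay valid."""
    for item in sorted(annotations, key=lambda a: a["start"], reverse=True):
        s, e = item["start"], item["end"]
        text = text[:s] + left + text[s:e] + right + text[e:]
    return text


print(highlight(text, annotations))  # The [doctor] says an [apple] is better than an [orange]
```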


SpazioDati, Via A. Olivetti 13, 38122 Trento (TN)


Company subject to management and coordination of Cerved Group S.p.A.


Contact Us

@dandelionapi

Need more info or a custom project? Write us: hello@dandelion.eu

About Us

We're a startup based in Italy, specialized in Semantics & Big Data.
Find out more about us at spaziodati.eu