dataTXT Find places, persons, brands, and events in documents and social media.

NEX Named Entity eXtraction

dataTXT-NEX is a named entity extraction and linking API that performs very well even on short texts, something many similar services cannot do. With this API you can automatically tag your texts, extract Wikipedia entities and enrich your data.
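
As a rough illustration of what calling an entity extraction REST API could look like, here is a minimal Python sketch. The endpoint URL, the `text` and `api_key` parameter names, and the response shape (an `annotations` list with `spot`, `title` and `uri` fields) are all assumptions made for this example, not taken from the dataTXT documentation. No network call is made; the sketch only builds the request URL and parses a sample response.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used for illustration only.
NEX_ENDPOINT = "https://api.example.com/datatxt/nex/v1"


def build_nex_url(text, api_key, endpoint=NEX_ENDPOINT):
    """Return a GET URL asking the service to annotate `text`.

    The parameter names `text` and `api_key` are assumptions.
    """
    query = urlencode({"text": text, "api_key": api_key})
    return f"{endpoint}?{query}"


def entity_titles(response_body):
    """Extract entity titles from an assumed JSON response shape."""
    payload = json.loads(response_body)
    return [annotation["title"] for annotation in payload.get("annotations", [])]


# Assumed response shape: one annotation per entity found in the text.
sample_response = json.dumps({
    "annotations": [
        {
            "spot": "Mona Lisa",
            "title": "Mona Lisa",
            "uri": "http://en.wikipedia.org/wiki/Mona_Lisa",
        }
    ]
})

url = build_nex_url("The Mona Lisa is in Paris", "YOUR_KEY")
titles = entity_titles(sample_response)
```

In a real integration, the URL would be fetched with an HTTP client and the actual response body passed to a parser written against the documented response format.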

Try NEX demo

SIM text SIMilarity

dataTXT-SIM is a semantic sentence similarity API optimized for short sentences. With this API you can compare two sentences and get a score for their semantic similarity. It works even if the two sentences have no words in common.
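
The comparison described above could be wired into an application roughly as follows. This is a sketch under stated assumptions: the endpoint URL, the `text1`, `text2` and `api_key` parameter names, the `similarity` response field, and the idea that the score lies between 0 and 1 are all hypothetical and not taken from the dataTXT documentation. The 0.5 threshold is an arbitrary choice for the example.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used for illustration only.
SIM_ENDPOINT = "https://api.example.com/datatxt/sim/v1"


def build_sim_url(text1, text2, api_key, endpoint=SIM_ENDPOINT):
    """Return a GET URL asking the service to compare two sentences.

    The parameter names are assumptions.
    """
    query = urlencode({"text1": text1, "text2": text2, "api_key": api_key})
    return f"{endpoint}?{query}"


def is_similar(response_body, threshold=0.5):
    """Decide whether two sentences count as similar.

    Assumes the response carries a numeric `similarity` field where
    higher means more similar; the threshold is application-specific.
    """
    score = float(json.loads(response_body)["similarity"])
    return score >= threshold


sim_url = build_sim_url(
    "The film was a box office hit",
    "The movie sold a lot of tickets",
    "YOUR_KEY",
)

# Assumed response bodies, for illustration only.
high = '{"similarity": 0.72}'
low = '{"similarity": 0.10}'
```

The threshold would normally be tuned on a sample of sentence pairs from the target domain rather than fixed in advance.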

Try SIM demo

CL CLassifier

dataTXT-CL classifies short sentences into a set of user-defined classes. We will soon publish an API to define your own categories in a few simple steps, no complex training needed! Build your own classifier in 15 minutes.

Try CL demo

Best on the market for short sentences

Most existing text analysis technologies rely on natural language processing (NLP), trying to guess the structure of sentences to resolve ambiguities in the meaning of words. This approach does not work when texts are short, contain mistakes or are grammatically malformed, which is often the case for tweets and other social content.

Massively scalable, cloud based, easy to integrate

dataTXT is offered as a cloud-based Software-as-a-Service platform that is easy to integrate into existing systems via a powerful REST API. The engine runs on a scalable infrastructure that can easily process millions of documents per day. We also offer on-premises integration for enterprise customers with special data protection requirements.

A door to the Web of Data & Linked Data

dataTXT not only identifies the entities present in the text, but also links them to the huge amount of additional data available in Dandelion's knowledge graph. In turn, the graph is linked to many external data sources. dataTXT transforms your unstructured data into Linked Data!

Analyze your Excel, CSV & XML files

You don't need programming skills to analyze your MS Excel, OpenOffice, CSV, XML and JSON documents. dataTXT is integrated into OpenRefine, a powerful framework to clean up and enrich tabular documents. Get the dataTXT OpenRefine extension (OpenRefine 2.6 only; see the installation instructions on the Free Your Metadata website).

Multilingual & Grammar Independent technology

Most other text analysis technologies are designed to work natively on English only. They support other languages via machine translation, leading to poor results. dataTXT uses an approach based on the graph topology of the underlying Dandelion knowledge base, without relying on NLP technologies. This makes dataTXT natively language independent.

Coming soon

Semantic Sentence Similarity

Compare the semantic similarity of two sentences, written in the same or in different languages.

Expected release: Autumn 2013 (done)

User-defined Classification

Define your own categories and classify your documents in minutes. No need to provide long and expensive training data.

Expected release: Early 2014

Webpage article extraction

Clean up web pages to extract the important content (text, images, video). Throw away ads, menus and boilerplate.

Expected release: Early 2014

Sentiment Analysis & Quantification

Identify and classify the polarity of opinions expressed in the text at different levels of granularity.

Expected release: Mid 2014

Custom entity vocabularies

Extend the knowledge base with your own domain-specific entities and concepts.

Expected release: Early 2014

Additional Languages on-demand

Got content in a different language? We can train the system to support any Latin language. Contact us for info.

Expected release: on-demand