API Documentation
Our REST API is a suite of artificial intelligence and blockchain-powered solutions for analyzing and extracting information from unstructured text data, videos and images.
This documentation helps you start working with the API and describes the available API methods and options.
Endpoint
The main endpoint for all API calls:
https://www.summarizebot.com/api/
API Key
To use our API you will need an API key. Please register to get a personal API key with a 14-day trial period.
Add your API key as a parameter to every request sent to the API:
[main endpoint]/[method]?apiKey=[api key]
Get Started
Once you have your personal API key, you can use the API in the following way:
Select the API method you are interested in from this documentation
Send HTTP GET or POST requests to the main endpoint, e.g. for a document summarization call the full URL would be:
https://www.summarizebot.com/api/summarize?[options]
You can also test-drive our API methods by importing the Postman Collection below. This is a quick and easy way to become familiar with the SummarizeBot API and how it works.
Usage Examples
URLs Processing
You can use the following Python code to process weblinks:
import requests
# API URL
# You can change 'summarize' to different endpoints: sentiment, keywords, etc.
api_url = "https://www.summarizebot.com/api/summarize?apiKey=YOUR_API_KEY&size=20&keywords=10&fragments=15&url=URL_FOR_PROCESSING"
r = requests.get(api_url)
json_res = r.json()
print(json_res)
cURL request:
curl -X GET "https://www.summarizebot.com/api/summarize?apiKey=YOUR_API_KEY&size=20&keywords=10&fragments=15&url=URL_FOR_PROCESSING"
Files Processing
To process files, use our POST API endpoints. The POST body should contain the file content in binary form, and the Content-Type header should be set to 'application/octet-stream'. In Python you can use the following code:
import requests
# Read binary data from the file
with open('test.txt', mode='rb') as file:
    post_body = file.read()
# API URL
# You can change 'summarize' to different endpoints: sentiment, keywords, etc.
api_url = "https://www.summarizebot.com/api/summarize?apiKey=your_API_key&size=20&keywords=10&fragments=15&filename=test.txt"
# HTTP header
header = {'Content-Type': "application/octet-stream"}
r = requests.post(api_url, headers=header, data=post_body)
json_res = r.json()
print(json_res)
cURL request:
curl -H "Content-Type: application/octet-stream" --data-binary @test.txt "https://www.summarizebot.com/api/summarize?apiKey=your_API_key&size=20&keywords=10&fragments=15&filename=test.txt"
Plain Text Processing
To process text strings, encode them as bytes and send the bytes as the POST body of a POST request. In Python you can use the following code:
import requests
# Text for processing in UTF-8 encoding
text_for_processing = u"Planet has only until 2030 to stem catastrophic climate change, experts warn."
# Create a bytes representation of the text
post_body = text_for_processing.encode('utf-8')
# API URL
# You can change 'summarize' to different endpoints: sentiment, keywords, etc.
api_url = "https://www.summarizebot.com/api/summarize?apiKey=your_API_key&size=20&keywords=10&fragments=15&filename=1.txt"
# HTTP header
header = {'Content-Type': "application/octet-stream"}
r = requests.post(api_url, headers=header, data=post_body)
json_res = r.json()
print(json_res)
cURL request:
curl -H "Content-Type: application/octet-stream" --data "Planet has only until 2030 to stem catastrophic climate change, experts warn." "https://www.summarizebot.com/api/summarize?apiKey=your_API_key&size=20&keywords=10&fragments=15&filename=1.txt"
Error Codes
The API methods may return the following errors:
400 - bad request
401 - API key is invalid or expired
402 - maximum file size limit is exceeded
403 - HTTP header isn't specified as 'application/octet-stream'
404 - HTTP header isn't specified as 'application/json'
429 - too many requests (rate limit exceeded)
500 - internal server error
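In Python, these codes can be checked before parsing the response body. The helper below is a minimal sketch: the error table comes from this section, while the function and message strings are our own.

```python
# Map the documented error codes to readable messages.
ERROR_MESSAGES = {
    400: "bad request",
    401: "API key is invalid or expired",
    402: "maximum file size limit is exceeded",
    403: "HTTP header isn't specified as 'application/octet-stream'",
    404: "HTTP header isn't specified as 'application/json'",
    429: "too many requests (rate limit exceeded)",
    500: "internal server error",
}

def check_response(response):
    """Raise a descriptive error for a known API error code,
    otherwise return the parsed JSON body.

    Works with any object exposing `status_code` and `json()`,
    such as a `requests.Response`.
    """
    if response.status_code in ERROR_MESSAGES:
        raise RuntimeError(
            "API error %d: %s"
            % (response.status_code, ERROR_MESSAGES[response.status_code])
        )
    return response.json()
```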
Language Support
Document summarization and keywords extraction features are available for almost every language, including English, Chinese, Russian, Japanese, Arabic, German, Spanish, French and Portuguese. Please see the full list here.
Sentiment analysis method supports English, French, German, Italian, Portuguese, Spanish and Russian languages.
Named entity recognition method supports major European and Asian languages including English, French, German, Italian, Portuguese, Spanish, Russian, Japanese, etc.
Fake news detection method supports English language only.
For audio recognition the API supports the following languages: English, Russian, Chinese, French, German, Italian, Spanish, Japanese, Swedish, Finnish, Arabic.
For text extraction from images our API supports the following languages: English, Latvian, French, German, Russian, Italian, Dutch, Spanish, Portuguese, Swedish, Finnish.
File Formats
The text analysis API methods support most of the text, image and audio formats: .html, .pdf, .doc, .docx, .csv, .eml, .epub, .gif, .jpg, .jpeg, .mp3, .msg, .odt, .ogg, .png, .pptx, .ps, .rtf, .tiff, .tif, .txt, .wav, .xlsx, .xls, .psv, .tsv, .tff, .aif, .aiff, .avr, .cdr, .wv, .au, .flac, .snd, .vox.
The article extraction and language detection methods can only process text files and scanned documents (e.g. PDF files with images).
The video identification and comments extraction features deal only with hypertext files (.html, .xml, etc.).
Summarization
The summarization method automatically extracts the most important information, keywords and key phrases from weblinks, documents, audio files and images. With the summarization API you can create general or topic-oriented summaries for different domains: add the 'domain' option with a specific value to your request, and the output summary will consist of the sentences most relevant to the given domain.
Supported Domains
Summarization API supports the following domains: accounting, agriculture, art, automotive, beauty, business, construction, culture, demographics, economics, education, electronics, energy, environment, european_union, finance, fisheries, foods, forestry, gardening, geography, healthcare, human_resources, industries, insurance, intellectual_property, international_organizations, international_relations, investments, it, legal, literature, management, marketing, parliament, pets, politics, production, religion, science, social_issues, sports, taxes, technology, trade, transportation_and_cargo, travel, weather.
Caution
The language of text documents will be detected automatically. For audio files and images it should be specified for each request. If the value for language is undefined, the default language for audio and image processing will be set to English.
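As a sketch of how the 'domain' option fits into a request, the helper below builds a topic-oriented summarization URL. The parameter names and defaults come from this documentation; the helper function itself is our own.

```python
from urllib.parse import urlencode

API_BASE = "https://www.summarizebot.com/api/summarize"

def summarize_url(api_key, url, domain=None, size=16, keywords=10):
    """Build a /summarize request URL; pass e.g. domain='finance'
    to bias the summary toward that domain."""
    params = {"apiKey": api_key, "url": url, "size": size, "keywords": keywords}
    if domain is not None:
        params["domain"] = domain
    return API_BASE + "?" + urlencode(params)
```

The resulting URL can then be fetched with `requests.get`, as in the Usage Examples section.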
Create a summary from weblinks
GET
/summarize
Summarize file from a given url.
Example URI
- apiKey
string
(required)API Key
- size
integer
(optional, default = 16)Summary length as percentage of original document
- url
string
(required)Article or web page url
- keywords
integer
(optional, default = 10)Maximum count of keywords to return
- fragments
integer
(optional, default = 15)Maximum count of key fragments to return
- domain
string
(optional)Domain identifier for topic-oriented summarization
- language
string
(optional for text files, required for audio files and images)The language of text files will be detected automatically. For audio files and images it should be specified from the list of supported languages, e.g. language=German.
- isocr
boolean
(optional, default = false)Use optical character recognition (OCR) to process PDF documents that contain images. If isocr is set to true, the document language should be specified from the list of supported languages, e.g. language=English (see the Language Support section for more details).
200
Headers
Content-Type: application/json
Schema
[
{
"summary" : [
{
"id" : 0,
"weight" : 2.43,
"sentence" : "Artificial intelligence (AI, also machine intelligence,
MI) is intelligence displayed by machines, in contrast with the
natural intelligence (NI) displayed by humans and other animals."
},
{
"id" : 1,
"weight" : 2.04,
"sentence" : "AI research is defined as the study of \"intelligent
agents\": any device that perceives its environment and takes
actions that maximize its chance of success at some goal."
}
]
},
{
"keywords" : [
{
"keyword" : "artificial intelligence",
"weight" : 0.87,
"ids" : [
1,
6
]
},
{
"keyword" : "machines",
"weight" : 0.71,
"ids" : [
0,
4
]
}
]
},
{
"fragments" : [
{
"fragment" : "optical character recognition",
"ids" : [
5
],
"weight" : 0.15
}
]
}
]
Create a summary from binary data
POST
/summarize
Summarize a file from binary data. The POST body should include the file content in binary form, and the Content-Type header should be set to 'application/octet-stream'.
Example URI
- apiKey
string
(required)API Key
- size
integer
(optional, default = 16)Summary length as percentage of original document
- filename
string
(required)Name of the file, e.g. filename=1.pdf
- keywords
integer
(optional, default = 10)Maximum count of keywords to return
- fragments
integer
(optional, default = 15)Maximum count of key fragments to return
- domain
string
(optional)Domain identifier for topic-oriented summarization
- language
string
(optional for text files, required for audio files and images)The language of text files will be detected automatically. For audio files and images it should be specified from the list of supported languages, e.g. language=German.
- isocr
boolean
(optional, default = false)Use optical character recognition (OCR) to process PDF documents that contain images. If isocr is set to true, the document language should be specified from the list of supported languages, e.g. language=English (see the Language Support section for more details).
200
Headers
Content-Type: application/json
Schema
[
{
"summary" : [
{
"id" : 0,
"weight" : 2.43,
"sentence" : "Artificial intelligence (AI, also machine intelligence,
MI) is intelligence displayed by machines, in contrast with the
natural intelligence (NI) displayed by humans and other animals."
},
{
"id" : 1,
"weight" : 2.04,
"sentence" : "AI research is defined as the study of \"intelligent
agents\": any device that perceives its environment and takes
actions that maximize its chance of success at some goal."
}
]
},
{
"keywords" : [
{
"keyword" : "artificial intelligence",
"weight" : 0.87,
"ids" : [
1,
6
]
},
{
"keyword" : "machines",
"weight" : 0.71,
"ids" : [
0,
4
]
}
]
},
{
"fragments" : [
{
"fragment" : "optical character recognition",
"ids" : [
5
],
"weight" : 0.15
}
]
}
]
Sentiment Analysis
The sentiment analysis method analyzes text and returns the sentiment as positive, negative or neutral. It also provides an overall score of the aggregate sentiment for the entire text and a list of aspects mentioned in the document (negative or positive words and phrases).
The sentiment analysis API identifies user sentiment not only at the document level but also at the sentence and object level. With its help you can detect concrete sentiment objects and opinion phrases and understand the meaning of user reviews.
Caution
The sentiment analysis method is available for English, French, German, Italian, Portuguese, Spanish and Russian languages.
Analyze sentiment from weblinks
GET
/sentiment
Analyze text for positive or negative sentiment from a given url.
Example URI
- apiKey
string
(required)API Key
- url
string
(required)Article or web page url
- language
string
(optional)Document language in the ISO 639-1 format. If the value for language is undefined, the document language will be detected automatically
200
Headers
Content-Type: application/json
Schema
[
{
"document sentiment": {
"polarity": "negative",
"weight": -1.99
}
},
{
"sentiment aspects": [
{
"features": [
{
"polarity": "negative",
"weight": -0.5,
"sentiment object": {
"start offset": 0,
"object": "The burger",
"end offset": 10
},
"end offset": 28,
"start offset": 15,
"phrase": "uncooked , raw"
},
{
"polarity": "negative",
"phrase": "left",
"end offset": 38,
"weight": -0.56,
"start offset": 34
},
{
"polarity": "negative",
"weight": -0.64,
"sentiment object": {
"start offset": 76,
"object": "person",
"end offset": 82
},
"end offset": 75,
"start offset": 71,
"phrase": "poor"
},
{
"polarity": "negative",
"phrase": "be severely poisoned",
"end offset": 114,
"weight": -0.5,
"start offset": 94
}
],
"sentence": "The burger was uncooked, raw, but left out in the sun waiting for some poor person to eat and be severely poisoned."
}
]
}
]
Analyze sentiment from binary data
POST
/sentiment
Analyze text for positive or negative sentiment from binary data. The POST body should include the file content in binary form, and the Content-Type header should be set to 'application/octet-stream'.
Example URI
- apiKey
string
(required)API Key
- filename
string
(required)Name of the file, e.g. filename=1.html
- language
string
(optional)Document language in the ISO 639-1 format. If the value for language is undefined, the document language will be detected automatically
200
Headers
Content-Type: application/json
Schema
[
{
"document sentiment": {
"polarity": "negative",
"weight": -1.99
}
},
{
"sentiment aspects": [
{
"features": [
{
"polarity": "negative",
"weight": -0.5,
"sentiment object": {
"start offset": 0,
"object": "The burger",
"end offset": 10
},
"end offset": 28,
"start offset": 15,
"phrase": "uncooked , raw"
},
{
"polarity": "negative",
"phrase": "left",
"end offset": 38,
"weight": -0.56,
"start offset": 34
},
{
"polarity": "negative",
"weight": -0.64,
"sentiment object": {
"start offset": 76,
"object": "person",
"end offset": 82
},
"end offset": 75,
"start offset": 71,
"phrase": "poor"
},
{
"polarity": "negative",
"phrase": "be severely poisoned",
"end offset": 114,
"weight": -0.5,
"start offset": 94
}
],
"sentence": "The burger was uncooked, raw, but left out in the sun waiting for some poor person to eat and be severely poisoned."
}
]
}
]
News Aggregation
The news aggregation method returns news headlines and searches for articles from over 50,000 sources. Retrieval results include details such as the main image of the news article, the article title and direct URL, the publication date, and a relevancy score for the search request.
News API endpoints support 100+ languages, specified in the ISO 639-1 format.
Thousands of news sources have been indexed and analyzed by our custom artificial intelligence modules to provide high search accuracy in natural language mode.
Return latest news for a specific language
GET
/news
Return live and top news for different languages.
Example URI
- apiKey
string
(required)API Key
- language
string
(optional, default=en)Language code in the ISO 639-1 format
- count
integer
(optional, default=10, maximum value=50)Maximum count of news to return
200
Headers
Content-Type: application/json
Schema
{
"results": [
{
"url": "https://www.theaustralian.com.au/sport/cricket/jaques-was-last-man-standing-but-a-nsw-pedigree-hard-to-go-past/news-story/86a3ed596aa5766bfb562f912dfa227e",
"publication_date": "2018-05-29 14:05:46",
"image_url": "https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcR-v0h1BL_w2ILuDVC07L926nHGIIxb8bGWYdwZAh8K6UJsu-DqnTJ7b9Z1cFLZRQqWHjGPXrNInQ",
"language": "en",
"title": "Jaques was last man standing but a NSW pedigree hard to go past"
},
{
"url": "https://gulfnews.com/sport/uae/football/own-goal-sinks-defending-champions-al-taher-1.2228752",
"publication_date": "2018-05-29 14:00:50",
"image_url": "https://static.gulfnews.com/polopoly_fs/1.2228830!/image/4040701382.jpg_gen/derivatives/box_460346/4040701382.jpg",
"language": "en",
"title": "Own goal sinks defending champions Al Taher"
},
{
"url": "https://www.forbes.com/sites/robinandrews/2018/05/29/this-is-why-han-solo-may-owe-his-life-to-a-polish-donut/",
"publication_date": "2018-05-29 14:00:00",
"image_url": "https://blogs-images.forbes.com/robinandrews/files/2018/05/PIA22085large-1200x675.jpg?width=0&height=600",
"language": "en",
"title": "This Is Why Han Solo May Owe His Life To A Polish Donut"
}
]
}
Search news articles based on a specific query for different languages
POST
/news
Returns a list of news articles relevant to the query. The POST body should include the query in JSON format, e.g. { "query" : "Donald Trump" }, and the Content-Type header should be set to 'application/json'.
Example URI
- apiKey
string
(required)API Key
- language
string
(optional, default=en)Language code in the ISO 639-1 format
- count
integer
(optional, default=10, maximum value=50)Maximum count of news to return
200
Headers
Content-Type: application/json
Schema
{
"results": [
{
"language": "en",
"title": "Diplomatic duels: What now for the Donald Trump-Kim Jong Un summit?",
"url": "https://economictimes.indiatimes.com/news/defence/diplomatic-duels-what-now-for-the-dinald-trump-kim-jong-un-summit/articleshow/64351498.cms",
"score": 13.17083740234375,
"image_url": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSv0-NPFkf98_pIa9-1aUMeCksBDD7GdPrN4RdWziokhu1kb1yk7EmtyRlozeQgOMT6bqRIq7yr_0U",
"publication_date": "2018-05-28 06:59:00"
},
{
"language": "en",
"title": "US Team In North Korea For Summit Talks, Says Donald Trump",
"url": "https://www.ndtv.com/world-news/us-team-in-north-korea-for-summit-talks-says-donald-trump-1858532",
"score": 12.415493965148926,
"image_url": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTDk0Idr_6tGHP5Ur7U1ZXZxICebGR0K-2kTcWtgJ589b_hLb1BvBIV7dJCbw_wLbgp8oXbyXUPUhU",
"publication_date": "2018-05-28 05:17:44"
},
{
"language": "en",
"title": "Donald Trump Jr is in high political demand – for now",
"url": "https://www.theguardian.com/us-news/2018/may/28/donald-trump-jr-high-demand-conservative-groups-wary",
"score": 12.384574890136719,
"image_url": "https://i.guim.co.uk/img/media/5394c2707b62a7a882047907cf3beab4a5e3d2a5/0_126_4200_2519/master/4200.jpg?w=140&q=55&auto=format&usm=12&fit=max&s=1697b507f5ae8b8f7eda9e3c91929d69",
"publication_date": "2018-05-28 05:00:45"
}
]
}
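Because the news search endpoint takes its query as a JSON body rather than binary file content, the request looks slightly different from the earlier examples. The sketch below assembles the URL, body and headers using the format described in this section; the helper function itself is our own.

```python
import json
from urllib.parse import urlencode

def news_search_request(api_key, query, language="en", count=10):
    """Assemble the URL, JSON body and headers for a /news search.

    The body format {"query": ...} and the 'language'/'count'
    options follow this documentation.
    """
    url = "https://www.summarizebot.com/api/news?" + urlencode(
        {"apiKey": api_key, "language": language, "count": count}
    )
    body = json.dumps({"query": query})
    headers = {"Content-Type": "application/json"}
    return url, body, headers

# url, body, headers = news_search_request("YOUR_API_KEY", "Donald Trump")
# r = requests.post(url, headers=headers, data=body)
```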
Fake News Detection
The fake news detection method analyzes news articles to identify whether they are likely to be real news. With the help of custom AI classifiers, it can detect different types of fake information, such as propaganda, conspiracy, pseudoscience, bias and irony.
The news analysis algorithm uses a wide range of components to solve the fake news detection problem: custom machine learning models trained on fake and biased articles, proprietary multi-language summarization technology to extract only important information and remove information noise, historical news data search to check story relevancy and misleading facts, and a database of trusted and biased websites created by our experts.
Detect fake news from weblinks
GET
/checkfake
Analyze news content and detect fake news from a given url.
Example URI
- apiKey
string
(required)API Key
- url
string
(required)Article or web page url
200
Headers
Content-Type: application/json
Schema
{
"predictions": [
{
"confidence": 0.36,
"type": "real"
},
{
"confidence": 0.64,
"type": "fake",
"categories": [
{
"confidence": 0.2,
"type": "bias"
},
{
"confidence": 0.1,
"type": "conspiracy"
},
{
"confidence": 0,
"type": "propaganda"
},
{
"confidence": 0.6,
"type": "pseudoscience"
},
{
"confidence": 0.1,
"type": "irony"
}
]
}
]
}
Linguistic Processor
The linguistic processor is a custom natural language processing solution for deep linguistic analysis of unstructured data. It supports 39+ languages, covering all European, major Asian and Arabic languages. It automatically detects tokens and sentences, identifies part-of-speech (PoS) tags, lemmas and noun phrases, and extracts semantic relations for each sentence.
The linguistic analysis API performs the following steps of text analysis:
- the sentence and word segmentation stage transforms a text into a list of sentences and words with punctuation marks;
- the lemmatization stage canonicalizes words to their initial forms;
- the part-of-speech (POS) tagger annotates each word with a unique part-of-speech tag from the Penn Treebank tagset. The tagger is based on state-of-the-art machine learning algorithms and provides a high level of accuracy for different languages;
- the chunker transforms the input sequence of tagged words into high-level word structures such as noun phrases, verb phrases, etc.;
- semantic relations extraction automatically extracts semantic relations between detected word chunks, such as subject-verb(action)-object relations.
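As an illustration of consuming the /syntax response format in this section, the sketch below collects subject-verb-object triples from the 'relations' field of each analyzed sentence. The field names follow the schema; the helper function itself is our own.

```python
def extract_svo(syntax_result):
    """Collect (subject, verb, object) phrase triples from a
    /syntax response (a list of per-sentence analyses)."""
    triples = []
    for sentence in syntax_result:
        for rel in sentence.get("relations", []):
            triples.append((
                rel["subject"]["phrase"],
                rel["verb"]["phrase"],
                rel["object"]["phrase"],
            ))
    return triples
```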
Extract linguistic analysis results from weblinks
GET
/syntax
Extract linguistic analysis results from a given url.
Example URI
- apiKey
string
(required)API Key
- url
string
(required)Article or web page url
- language
string
(optional)Document language in the ISO 639-1 format. If the value for language is undefined, the document language will be detected automatically
200
Headers
Content-Type: application/json
Schema
[
{
"tokens": [
{
"lemma": "culture",
"tag": "NNP",
"word": "Culture",
"end offset": 7,
"start offset": 0
},
{
"lemma": "minister",
"tag": "NNP",
"word": "Minister",
"end offset": 16,
"start offset": 8
},
{
"lemma": "alberto",
"tag": "NNP",
"word": "Alberto",
"end offset": 24,
"start offset": 17
},
{
"lemma": "bonisoli",
"tag": "NNP",
"word": "Bonisoli",
"end offset": 33,
"start offset": 25
},
{
"lemma": "describe",
"tag": "VBD",
"word": "described",
"end offset": 43,
"start offset": 34
},
{
"lemma": "the",
"tag": "DT",
"word": "the",
"end offset": 47,
"start offset": 44
},
{
"lemma": "finding",
"tag": "NN",
"word": "finding",
"end offset": 55,
"start offset": 48
},
{
"lemma": "as",
"tag": "IN",
"word": "as",
"end offset": 58,
"start offset": 56
},
{
"lemma": "a",
"tag": "DT",
"word": "a",
"end offset": 60,
"start offset": 59
},
{
"lemma": "discovery",
"tag": "NN",
"word": "discovery",
"end offset": 70,
"start offset": 61
},
{
"lemma": "that",
"tag": "WDT",
"word": "that",
"end offset": 75,
"start offset": 71
},
{
"lemma": "fill",
"tag": "VBZ",
"word": "fills",
"end offset": 81,
"start offset": 76
},
{
"lemma": "him",
"tag": "PRP",
"word": "him",
"end offset": 85,
"start offset": 82
},
{
"lemma": "with",
"tag": "IN",
"word": "with",
"end offset": 90,
"start offset": 86
},
{
"lemma": "pride",
"tag": "NN",
"word": "pride",
"end offset": 96,
"start offset": 91
},
{
"lemma": ".",
"tag": ".",
"word": ".",
"end offset": 97,
"start offset": 96
}
],
"chunks": [
{
"chunk": "Culture Minister Alberto Bonisoli",
"start index": 0,
"type": "NP",
"chunk head": "Bonisoli",
"end index": 4
},
{
"chunk": "described",
"start index": 4,
"type": "VP",
"chunk head": "described",
"end index": 5
},
{
"chunk": "the finding",
"start index": 5,
"type": "NP",
"chunk head": "finding",
"end index": 7
},
{
"chunk": "a discovery",
"start index": 8,
"type": "NP",
"chunk head": "discovery",
"end index": 10
},
{
"chunk": "fills",
"start index": 11,
"type": "VP",
"chunk head": "fills",
"end index": 12
},
{
"chunk": "him",
"start index": 12,
"type": "NP",
"chunk head": "him",
"end index": 13
},
{
"chunk": "pride",
"start index": 14,
"type": "NP",
"chunk head": "pride",
"end index": 15
}
],
"relations": [
{
"verb": {
"phrase": "described",
"start index": 4,
"end index": 5
},
"object": {
"phrase": "the finding",
"start index": 5,
"end index": 7
},
"subject": {
"phrase": "Culture Minister Alberto Bonisoli",
"start index": 0,
"end index": 4
}
}
],
"sentence": "Culture Minister Alberto Bonisoli described the finding as a discovery that fills him with pride."
}
]
Extract linguistic analysis results from binary data
POST
/syntax
Extract linguistic analysis results from binary data. The POST body should include the file content in binary form, and the Content-Type header should be set to 'application/octet-stream'.
Example URI
- apiKey
string
(required)API Key
- filename
string
(required)Name of the file, e.g. filename=1.pdf
- language
string
(optional)Document language in the ISO 639-1 format. If the value for language is undefined, the document language will be detected automatically
200
Headers
Content-Type: application/json
Schema
[
{
"tokens": [
{
"lemma": "culture",
"tag": "NNP",
"word": "Culture",
"end offset": 7,
"start offset": 0
},
{
"lemma": "minister",
"tag": "NNP",
"word": "Minister",
"end offset": 16,
"start offset": 8
},
{
"lemma": "alberto",
"tag": "NNP",
"word": "Alberto",
"end offset": 24,
"start offset": 17
},
{
"lemma": "bonisoli",
"tag": "NNP",
"word": "Bonisoli",
"end offset": 33,
"start offset": 25
},
{
"lemma": "describe",
"tag": "VBD",
"word": "described",
"end offset": 43,
"start offset": 34
},
{
"lemma": "the",
"tag": "DT",
"word": "the",
"end offset": 47,
"start offset": 44
},
{
"lemma": "finding",
"tag": "NN",
"word": "finding",
"end offset": 55,
"start offset": 48
},
{
"lemma": "as",
"tag": "IN",
"word": "as",
"end offset": 58,
"start offset": 56
},
{
"lemma": "a",
"tag": "DT",
"word": "a",
"end offset": 60,
"start offset": 59
},
{
"lemma": "discovery",
"tag": "NN",
"word": "discovery",
"end offset": 70,
"start offset": 61
},
{
"lemma": "that",
"tag": "WDT",
"word": "that",
"end offset": 75,
"start offset": 71
},
{
"lemma": "fill",
"tag": "VBZ",
"word": "fills",
"end offset": 81,
"start offset": 76
},
{
"lemma": "him",
"tag": "PRP",
"word": "him",
"end offset": 85,
"start offset": 82
},
{
"lemma": "with",
"tag": "IN",
"word": "with",
"end offset": 90,
"start offset": 86
},
{
"lemma": "pride",
"tag": "NN",
"word": "pride",
"end offset": 96,
"start offset": 91
},
{
"lemma": ".",
"tag": ".",
"word": ".",
"end offset": 97,
"start offset": 96
}
],
"chunks": [
{
"chunk": "Culture Minister Alberto Bonisoli",
"start index": 0,
"type": "NP",
"chunk head": "Bonisoli",
"end index": 4
},
{
"chunk": "described",
"start index": 4,
"type": "VP",
"chunk head": "described",
"end index": 5
},
{
"chunk": "the finding",
"start index": 5,
"type": "NP",
"chunk head": "finding",
"end index": 7
},
{
"chunk": "a discovery",
"start index": 8,
"type": "NP",
"chunk head": "discovery",
"end index": 10
},
{
"chunk": "fills",
"start index": 11,
"type": "VP",
"chunk head": "fills",
"end index": 12
},
{
"chunk": "him",
"start index": 12,
"type": "NP",
"chunk head": "him",
"end index": 13
},
{
"chunk": "pride",
"start index": 14,
"type": "NP",
"chunk head": "pride",
"end index": 15
}
],
"relations": [
{
"verb": {
"phrase": "described",
"start index": 4,
"end index": 5
},
"object": {
"phrase": "the finding",
"start index": 5,
"end index": 7
},
"subject": {
"phrase": "Culture Minister Alberto Bonisoli",
"start index": 0,
"end index": 4
}
}
],
"sentence": "Culture Minister Alberto Bonisoli described the finding as a discovery that fills him with pride."
}
]
Intent Analysis
The intent analysis method automatically classifies search keywords according to the user's search intention. The method identifies the following search intent categories: Transactional, Commercial (Opinion/Quality), Commercial (Comparison), Commercial (Reviews/Complain), Informational, Navigational.
Most solutions available on the market identify only three categories of search intent: Transactional, Navigational and Informational. But marketers need more detail to make the right decisions; that's why our intent analysis API identifies six categories instead of three.
We've implemented a custom artificial intelligence classifier based on semantic features extracted from search keywords. Unlike our competitors, we've trained the feature-based classification model on the output of our multilingual Linguistic Processor. Using the output of the semantically oriented Linguistic Processor as input for machine learning algorithms helped us significantly increase intent classification accuracy. Our feature-based AI classifier uses a wide range of linguistic (part-of-speech tags, lemmas), syntactic (lexical chunks), semantic (action-verb relations) and expert features (intent-relevant keywords and patterns: products, brand names, action words, sentiment/opinion keywords, specific keyword structures, etc.). It supports 35+ languages and achieves intent classification accuracy from 80% to 96% depending on the language.
Detect search intent of a keyword
POST
/intents
Detect the intent of search keywords. The POST body should include JSON data in the following format: { "keywords" : [ { "keyword" : "search keyword", "id" : "1" }, { "keyword" : "search keyword", "id" : "2" }, ... ] }, where "keyword" is the keyword text and "id" is a unique identifier. The maximum keyword count per request is 100. The Content-Type header should be set to 'application/json'.
Example URI
- apiKey
string
(required)API Key
- language
string
(optional)Keywords language in the ISO 639-1 format. If the value for language is undefined, the language of each keyword will be detected automatically with the help of our short text language identification method
200
Headers
Content-Type: application/json
Schema
{
"keywords": [
{
"category" : "Transactional",
"confidence" : 0.75,
"keyword" : "online shopping clothes pakistan",
"id" : "1",
"language" : "en"
},
{
"category" : "Commercial>Comparison",
"confidence" : 0.9,
"keyword" : "what is the best shampoo",
"id" : "2",
"language" : "en"
}
]
}
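The /intents POST body can be assembled programmatically. The sketch below follows the request format and the 100-keyword limit described in this section; the helper function itself is our own.

```python
import json

def intents_body(keywords):
    """Build the /intents POST body:
    {"keywords": [{"keyword": ..., "id": ...}, ...]}.

    Enforces the documented limit of 100 keywords per request
    and assigns sequential string ids.
    """
    if len(keywords) > 100:
        raise ValueError("maximum keywords count per request is 100")
    return json.dumps({
        "keywords": [
            {"keyword": kw, "id": str(i + 1)} for i, kw in enumerate(keywords)
        ]
    })

# body = intents_body(["what is the best shampoo"])
# r = requests.post(api_url, headers={"Content-Type": "application/json"}, data=body)
```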
Named Entity Recognition
The named entity extraction method automatically detects persons, companies, locations, organizations, addresses, phone numbers, emails, currencies, credit card numbers and various other types of entities in any kind of text. It supports URL (GET) and file (POST) processing endpoints.
Our named entity detection algorithm combines deep neural network models with linguistic rules optimized for identifying entities in documents. It supports 20+ languages and covers the major European and Asian languages.
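The start/end offsets in the /entities response index into the original document text. As a minimal sketch of recovering each mention, the helper below slices the text by those offsets; the field names follow the response schema, while the helper function itself is our own.

```python
def entity_mentions(text, entities_response):
    """Slice the original text by the start/end offsets returned by
    /entities, grouped by entity type (persons, locations, etc.)."""
    mentions = {}
    for entity_type, items in entities_response["entities"].items():
        for item in items:
            spans = [text[o["start"]:o["end"]] for o in item["offsets"]]
            mentions.setdefault(entity_type, []).append((item["entity"], spans))
    return mentions
```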
Extract named entities from weblinks
GET
/entities
Extract named entities from a given url.
Example URI
- apiKey
string
(required)API Key
- url
string
(required)Article or web page url
- language
string
(optional)Document language in the ISO 639-1 format. If the value for language is undefined, the document language will be detected automatically
200
Headers
Content-Type: application/json
Schema
{
"entities": {
"persons" : [
{
"entity" : "Adam Mount",
"offsets" : [
{
"start" : 55,
"end": 65
},
{
"start" : 223,
"end" : 226
}
]
},
{
"entity" : "Trump",
"offsets" : [
{
"start" : 0,
"end": 5
},
{
"start" : 1445,
"end" : 1450
},
{
"start" : 2658,
"end" : 2663
}
]
}
],
"locations" : [
{
"entity" : "Seoul",
"offsets" : [
{
"start" : 3942,
"end" : 3947
},
{
"start" : 4144,
"end" : 4149
}
]
}
]
}
}
Extract named entities from binary data
POST
/entities
Extract named entities from binary data. The POST body should include the file content in binary form, and the Content-Type header should be set to 'application/octet-stream'.
Example URI
- apiKey
string
(required)API Key
- filename
string
(required)Name of the file, e.g. filename=1.pdf
- language
string
(optional)Document language in the ISO 639-1 format. If the value for language is undefined, the document language will be detected automatically
200
Headers
Content-Type: application/json
Schema
{
"entities": {
"persons" : [
{
"entity" : "Adam Mount",
"offsets" : [
{
"start" : 55,
"end": 65
},
{
"start" : 223,
"end" : 226
}
]
},
{
"entity" : "Trump",
"offsets" : [
{
"start" : 0,
"end": 5
},
{
"start" : 1445,
"end" : 1450
},
{
"start" : 2658,
"end" : 2663
}
]
}
],
"locations" : [
{
"entity" : "Seoul",
"offsets" : [
{
"start" : 3942,
"end" : 3947
},
{
"start" : 4144,
"end" : 4149
}
]
}
]
}
}
Keywords Extraction
The keywords extraction method automatically extracts the most important keywords from weblinks, documents, audio files and images.
Caution
The language of text documents will be detected automatically. For audio files and images it should be specified for each request. If the value for language is undefined, the default language for audio and image processing will be set to English.
Extract keywords from weblinks
GET
/keywords
Extract keywords from a given url.
Example URI
- apiKey
string
(required)API Key
- url
string
(required)Article or web page url
- keywords
integer
(optional, default = 10)Maximum count of keywords to return
- language
string
(optional for text files, required for audio files and images) The language of text files will be detected automatically. For audio files and images it should be specified from the list of supported languages, e.g. language=German.
200
Headers
Content-Type: application/json
Schema
{
"keywords" : [
{
"keyword" : "artificial intelligence",
"weight" : 0.87,
"ids" : [
1,
6
]
},
{
"keyword" : "machines",
"weight" : 0.71,
"ids" : [
0,
4
]
}
]
}
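A minimal Python sketch for this request, plus a small helper for ranking the response (the function names are illustrative, not part of the API):

```python
import requests

API_URL = "https://www.summarizebot.com/api/keywords"

def extract_keywords(url, api_key, max_keywords=10):
    """GET keywords for a web page; max_keywords maps to the keywords option."""
    params = {"apiKey": api_key, "url": url, "keywords": max_keywords}
    r = requests.get(API_URL, params=params)
    r.raise_for_status()
    return r.json()

def top_keywords(response, n=5):
    """Return the n highest-weighted keyword strings from a /keywords response."""
    ranked = sorted(response["keywords"], key=lambda k: k["weight"], reverse=True)
    return [k["keyword"] for k in ranked[:n]]
```

The `ids` field of each keyword references the text fragments it occurs in, so the raw response is kept intact and ranking is done client-side.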
Extract keywords from binary data
POST
/keywords
Extract keywords from binary data. The POST body should contain the file content in binary form, and the Content-Type header should be set to 'application/octet-stream'.
Example URI
- apiKey
string
(required)API Key
- filename
string
(required)Name of the file, e.g. filename=1.pdf
- keywords
integer
(optional, default = 10)Maximum count of keywords to return
- language
string
(optional for text files, required for audio files and images) The language of text files will be detected automatically. For audio files and images it should be specified from the list of supported languages, e.g. language=German.
200
Headers
Content-Type: application/json
Schema
{
"keywords" : [
{
"keyword" : "artificial intelligence",
"weight" : 0.87,
"ids" : [
1,
6
]
},
{
"keyword" : "machines",
"weight" : 0.71,
"ids" : [
0,
4
]
}
]
}
Article Extraction
The article extraction method is used to extract clean article text from a file that you provide to the API. For hypertext documents it also identifies different metadata such as title, main article image, publish date, author, meta description, etc.
Caution
The article extraction method can handle only text files and scanned documents (e.g. PDF files with images).
Extract plain article text and metadata from weblinks
GET
/extract
Extract article text and metadata from a given url.
Example URI
- apiKey
string
(required)API Key
- url
string
(required)Article or web page url
- language
string
(optional for text files, required for scanned documents) For scanned documents (e.g. PDF files with images) the language should be specified from the list of supported languages, e.g. language=German.
- isocr
boolean
(optional, default = false) Use optical character recognition (OCR) for PDF document processing. If isocr is set to true, the document language should be specified from the list of supported languages, e.g. language=English (see the Language Support section for more details).
200
Headers
Content-Type: application/json
Schema
{
"text": "(CNN) Night has fallen on a toe-numbing English winter's day.
In a manor house, where spirits of aristocrats are rumored to roam
ancient hallways, are some of England's finest young athletes.\n\nIn a
dimly lit, oak-paneled room at Bisham Abbey, 30 miles west of London,
these 18 to twentysomethings have gathered for another chapter in
their learning.\n\n A grand-looking Victorian lady, framed in gold,
peers down on the assembled players and coaches. On these same dark walls
hang the works of Raphael.",
"article title": "How to build a rugby player -- Inside England's
Under-20s camp",
"meta information": {
"meta description": "England's Under-20s give CNN Sport exclusive access
as they prepare for the Under-20 Six Nations, a championship they have
won six times in 10 years.",
"publish date": "2018-02-03T10:28:00Z",
"image": "https://cdn.cnn.com/cnnnext/dam/assets/
180129105453-owen-farrell-super-tease.jpg",
"authors": [
"Aimee Lewis"
],
"meta keywords": "sport, Six Nations 2018, training camp"
}
}
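The isocr/language rule above is easy to get wrong, so a Python sketch can enforce it before sending the request (the helper names `build_params` and `extract_article` are illustrative, not part of the API):

```python
import requests

API_URL = "https://www.summarizebot.com/api/extract"

def build_params(url, api_key, language=None, isocr=False):
    """Assemble query parameters, enforcing the isocr/language rule."""
    if isocr and not language:
        raise ValueError("isocr=true requires an explicit language, e.g. 'English'")
    params = {"apiKey": api_key, "url": url}
    if language:
        params["language"] = language
    if isocr:
        params["isocr"] = "true"
    return params

def extract_article(url, api_key, language=None, isocr=False):
    r = requests.get(API_URL, params=build_params(url, api_key, language, isocr))
    r.raise_for_status()
    return r.json()  # contains "text", "article title", "meta information"
```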
Extract plain article text and metadata from binary data
POST
/extract
Extract article text and metadata from binary data. The POST body should contain the file content in binary form, and the Content-Type header should be set to 'application/octet-stream'.
Example URI
- apiKey
string
(required)API Key
- filename
string
(required)Name of the file, e.g. filename=1.html
- language
string
(optional for text files, required for scanned documents) For scanned documents (e.g. PDF files with images) the language should be specified from the list of supported languages, e.g. language=German.
- isocr
boolean
(optional, default = false) Use optical character recognition (OCR) for PDF document processing. If isocr is set to true, the document language should be specified from the list of supported languages, e.g. language=English (see the Language Support section for more details).
200
Headers
Content-Type: application/json
Schema
{
"text": "(CNN) Night has fallen on a toe-numbing English winter's day.
In a manor house, where spirits of aristocrats are rumored to roam
ancient hallways, are some of England's finest young athletes.\n\nIn a
dimly lit, oak-paneled room at Bisham Abbey, 30 miles west of London,
these 18 to twentysomethings have gathered for another chapter in
their learning.\n\n A grand-looking Victorian lady, framed in gold,
peers down on the assembled players and coaches. On these same dark walls
hang the works of Raphael.",
"article title": "How to build a rugby player -- Inside England's
Under-20s camp",
"meta information": {
"meta description": "England's Under-20s give CNN Sport exclusive access
as they prepare for the Under-20 Six Nations, a championship they have
won six times in 10 years.",
"publish date": "2018-02-03T10:28:00Z",
"image": "https://cdn.cnn.com/cnnnext/dam/assets/
180129105453-owen-farrell-super-tease.jpg",
"authors": [
"Aimee Lewis"
],
"meta keywords": "sport, Six Nations 2018, training camp"
}
}
Short Text Language Detection
The short text language detection method analyzes a short piece of text (search keywords, user messages, tweets, etc.) and accurately recognizes its language. The method returns the language code conforming to ISO 639-1 identifiers.
Most language detection solutions work well on full-text documents but fall short on short texts, especially search keywords, tweets and chat messages. Short texts are too brief to extract N-gram features reliably; they use "unnatural" language, contain misspellings and often mix words from multiple languages.
For short text language identification we have implemented an optimized version of a support-vector machine (SVM) classifier. Our classification algorithm takes into account many features specific to short texts, supports 70+ languages and achieves language detection accuracy on small messages from 91% to 98%, depending on the language.
Detect language of a short text
POST
/shortlang
Detect the language of short texts. The POST body should include JSON data in the following format: { "documents" : [ { "text" : "short message text", "id" : "1" }, { "text" : "short message text", "id" : "2" }, ... ]}, where "text" is the short text and "id" is a unique identifier. A maximum of 100 short texts can be sent per request. The Content-Type header should be set to 'application/json'.
Example URI
- apiKey
string
(required)API Key
200
Headers
Content-Type: application/json
Schema
{
"documents": [
{
"text" : "subaru xv prix",
"id" : "1",
"language" : "fr"
},
{
"text" : "vendita appartamento lago maggiore",
"id" : "2",
"language" : "it"
}
]
}
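Building the JSON payload described above can be sketched in Python as follows (the helper names are illustrative, not part of the API; requests' `json=` argument sets the application/json header automatically):

```python
import requests

API_URL = "https://www.summarizebot.com/api/shortlang"

def build_payload(texts):
    """Wrap short texts in the documents structure; at most 100 per request."""
    if len(texts) > 100:
        raise ValueError("maximum 100 short texts per request")
    return {"documents": [{"text": t, "id": str(i + 1)}
                          for i, t in enumerate(texts)]}

def detect_short_languages(texts, api_key):
    r = requests.post(API_URL, params={"apiKey": api_key},
                      json=build_payload(texts))
    r.raise_for_status()
    return r.json()["documents"]  # each item gains a "language" field
```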
Language Detection
The language detection method analyzes a fulltext document that you provide and recognizes the language of the text. The method returns the language code conforming to ISO 639-1 identifiers.
Detect language of a text from weblinks
GET
/language
Detect text language from a given url.
Example URI
- apiKey
string
(required)API Key
- url
string
(required)Article or web page url
200
Headers
Content-Type: application/json
Schema
{
"language": "en"
}
Detect language of a text from binary data
POST
/language
Detect text language from binary data. The POST body should contain the file content in binary form, and the Content-Type header should be set to 'application/octet-stream'.
Example URI
- apiKey
string
(required)API Key
- filename
string
(required)Name of the file, e.g. filename=1.html
200
Headers
Content-Type: application/json
Schema
{
"language": "en"
}
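A one-call Python sketch for the GET variant (the function name `detect_language` is illustrative, not part of the API):

```python
import requests

API_URL = "https://www.summarizebot.com/api/language"

def detect_language(url, api_key):
    """Return the ISO 639-1 code of the document behind the given url."""
    r = requests.get(API_URL, params={"apiKey": api_key, "url": url})
    r.raise_for_status()
    return r.json()["language"]  # e.g. "en"
```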
Face Detection
The face detection method analyzes an image file to find faces. The method returns a list of items, each of which contains the coordinates of a face that was detected in the file.
Caution
The face detection method processes only image files (.jpeg, .png, etc.).
Detect faces from image weblinks
GET
/faces
Detect faces from a given image url.
Example URI
- apiKey
string
(required)API Key
- url
string
(required)Image url
200
Headers
Content-Type: application/json
Schema
{
"faces": [
{
"y": "371",
"x": "370",
"height": "137",
"width": "137"
},
{
"y": "190",
"x": "474",
"height": "149",
"width": "149"
},
{
"y": "210",
"x": "598",
"height": "155",
"width": "155"
},
{
"y": "399",
"x": "706",
"height": "146",
"width": "146"
}
]
}
Detect faces from image binary data
POST
/faces
Detect faces from image binary data. The POST body should contain the file content in binary form, and the Content-Type header should be set to 'application/octet-stream'.
Example URI
- apiKey
string
(required)API Key
200
Headers
Content-Type: application/json
Schema
{
"faces": [
{
"y": "371",
"x": "370",
"height": "137",
"width": "137"
},
{
"y": "190",
"x": "474",
"height": "149",
"width": "149"
},
{
"y": "210",
"x": "598",
"height": "155",
"width": "155"
},
{
"y": "399",
"x": "706",
"height": "146",
"width": "146"
}
]
}
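Note that the coordinates in the schema above come back as strings. A Python sketch can fetch the faces and convert them to integer boxes in one step (the helper names are illustrative, not part of the API):

```python
import requests

API_URL = "https://www.summarizebot.com/api/faces"

def face_boxes(response):
    """Convert the string coordinates in a /faces response to integer
    (x, y, width, height) tuples."""
    return [(int(f["x"]), int(f["y"]), int(f["width"]), int(f["height"]))
            for f in response["faces"]]

def detect_faces(image_url, api_key):
    r = requests.get(API_URL, params={"apiKey": api_key, "url": image_url})
    r.raise_for_status()
    return face_boxes(r.json())
```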
Image Recognition
The image recognition method classifies the contents of an entire image into thousands of categories (e.g., "basketball", "lion", "shark"). It returns a list of tags (labels) for an image along with a confidence score which indicates how confident the system is about the assignment.
Recognize an image content from weblinks
GET
/images
Image recognition from a given url.
Example URI
- apiKey
string
(required)API Key
- url
string
(required)Image url
- tags
integer
(optional, default = 5)Maximum count of image tags to return
200
Headers
Content-Type: application/json
Schema
{
"tags": [
{
"confidence": 0.9,
"name": "great white shark, white shark"
},
{
"confidence": 0.05,
"name": "tiger shark"
},
{
"confidence": 0.03,
"name": "killer whale"
}
]
}
Recognize an image content from binary data
POST
/images
Image recognition from binary data. The POST body should contain the file content in binary form, and the Content-Type header should be set to 'application/octet-stream'.
Example URI
- apiKey
string
(required)API Key
- filename
string
(required)Name of the file, e.g. filename=1.jpg
- tags
integer
(optional, default = 5)Maximum count of image tags to return
200
Headers
Content-Type: application/json
Schema
{
"tags": [
{
"confidence": 0.9,
"name": "great white shark, white shark"
},
{
"confidence": 0.05,
"name": "tiger shark"
},
{
"confidence": 0.03,
"name": "killer whale"
}
]
}
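Since each tag carries a confidence score, a typical client picks the most confident label and discards low-confidence guesses. A Python sketch (the helper names and the 0.5 threshold are illustrative choices, not part of the API):

```python
import requests

API_URL = "https://www.summarizebot.com/api/images"

def best_tag(response, threshold=0.5):
    """Return the most confident tag name, or None if nothing clears threshold."""
    top = max(response["tags"], key=lambda t: t["confidence"])
    return top["name"] if top["confidence"] >= threshold else None

def recognize_image(image_url, api_key, tags=5):
    r = requests.get(API_URL, params={"apiKey": api_key, "url": image_url,
                                      "tags": tags})
    r.raise_for_status()
    return r.json()
```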
Stemming
The stemming method automatically reduces inflected words to their base or root form and removes stop words from text documents. It supports English, French, German, Spanish, Italian, Russian, Swedish, Danish, Finnish, Dutch, Hungarian, Norwegian, Portuguese and Romanian languages.
Stem and remove stop words from text data
POST
/stem
Stem and remove stop words from text data. The POST body should include the text in JSON format, e.g. { "text" : "document text"}. The Content-Type header should be set to 'application/json'. The language of text documents will be detected automatically.
Example URI
- apiKey
string
(required)API Key
200
Headers
Content-Type: application/json
Schema
{
"stemmed": "lawyer post video sign languag danger ponzi
scheme post went viral hundr deaf peopl got touch
legal troubl fraud domest violenc uncov huge communiti
need help tang shuai simpli tri improv legal knowledg among deaf
communiti post video china wechat messag app februari instant
hit mr tang flood mani friend request ask wechat boost friend
limit 5,000 10,000 strike chord answer goe way beyond legal
difficulti complex world sign languag china",
"language": "en"
}
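A Python sketch of the JSON POST described above (the function name `stem_text` is illustrative, not part of the API):

```python
import requests

API_URL = "https://www.summarizebot.com/api/stem"

def stem_text(text, api_key):
    """POST document text as JSON; the language is detected automatically."""
    r = requests.post(API_URL, params={"apiKey": api_key},
                      json={"text": text})  # sets Content-Type: application/json
    r.raise_for_status()
    return r.json()  # {"stemmed": "...", "language": "..."}
```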
Comments Extraction
The comments extraction method automatically structures and extracts reviews and comments from web pages.
Extract comments from weblinks
GET
/comments
Extract comments from a given url.
Example URI
- apiKey
string
(required)API Key
- url
string
(required)Article or web page url
200
Headers
Content-Type: application/json
Schema
{
"comments": [
"Well, the Hotel is very central, perfect
for shopping, sightseeing or nightlife.
Friendly welcome on arrival, a complimentary
birthday drink brought to us in the comfy lounge area.",
"Would definately stay at this hotel again
and recommended this to others.",
"Cleanliness of bedrooms is always very high.
Complimentary breakfast is a welcome feature.",
"Nice room on the second floor at the far end of the hall.
Very quiet room. Comfortable bed, nice shower
with hot, hot water.",
"the amazing breakfast! I cannot find a fault
with 5his new hotel it competes and is
better than most high end expensive hotels in the city!"
]
}
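The GET variant returns a plain list of comment strings, so the Python sketch is short (the function name `extract_comments` is illustrative, not part of the API):

```python
import requests

API_URL = "https://www.summarizebot.com/api/comments"

def extract_comments(url, api_key):
    """Return the list of comment strings extracted from a web page."""
    r = requests.get(API_URL, params={"apiKey": api_key, "url": url})
    r.raise_for_status()
    return r.json()["comments"]
```

Note that comments are returned verbatim, typos and all, as in the sample response above.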
Extract comments from binary data
POST
/comments
Extract comments from binary data. The POST body should contain the file content in binary form, and the Content-Type header should be set to 'application/octet-stream'.
Example URI
- apiKey
string
(required)API Key
- filename
string
(required)Name of the file, e.g. filename=1.html
200
Headers
Content-Type: application/json
Schema
{
"comments": [
"Well, the Hotel is very central, perfect
for shopping, sightseeing or nightlife.
Friendly welcome on arrival, a complimentary
birthday drink brought to us in the comfy lounge area.",
"Would definately stay at this hotel again
and recommended this to others.",
"Cleanliness of bedrooms is always very high.
Complimentary breakfast is a welcome feature.",
"Nice room on the second floor at the far end of the hall.
Very quiet room. Comfortable bed, nice shower
with hot, hot water.",
"the amazing breakfast! I cannot find a fault
with 5his new hotel it competes and is
better than most high end expensive hotels in the city!"
]
}
Video Identification
The video identification method automatically extracts detailed video information from hypertext pages: direct video url, video provider, video width and height.
Caution
The video identification method processes only hypertext files (.html, .xml, etc.).
Extract information about videos from weblinks
GET
/video
Extract video information from a given url.
Example URI
- apiKey
string
(required)API Key
- url
string
(required)Web page url
200
Headers
Content-Type: application/json
Schema
{
"video": [
{
"source": "https://www.youtube.com/embed/YqB50JG2aIE",
"height": "70%",
"width": "100%",
"provider": "youtube"
},
{
"source": "https://www.youtube.com/embed/XYPE7rZkYRg",
"height": null,
"width": "100%",
"provider": "youtube"
}
]
}
Extract information about videos from binary data
POST
/video
Extract video information from binary data. The POST body should contain the file content in binary form, and the Content-Type header should be set to 'application/octet-stream'.
Example URI
- apiKey
string
(required)API Key
- filename
string
(required)Name of the file, e.g. filename=1.html
200
Headers
Content-Type: application/json
Schema
{
"video": [
{
"source": "https://www.youtube.com/embed/YqB50JG2aIE",
"height": "70%",
"width": "100%",
"provider": "youtube"
},
{
"source": "https://www.youtube.com/embed/XYPE7rZkYRg",
"height": null,
"width": "100%",
"provider": "youtube"
}
]
}
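Because a page can embed videos from several providers, a small client-side filter is often useful alongside the request itself. A Python sketch (the helper names are illustrative, not part of the API; note that "height" may be null, as in the sample response):

```python
import requests

API_URL = "https://www.summarizebot.com/api/video"

def videos_by_provider(response, provider):
    """Filter a /video response down to one provider's direct video urls."""
    return [v["source"] for v in response["video"] if v["provider"] == provider]

def extract_videos(page_url, api_key):
    r = requests.get(API_URL, params={"apiKey": api_key, "url": page_url})
    r.raise_for_status()
    return r.json()
```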