Text and Image Analysis API Documentation | SummarizeBot

API Documentation

Our REST API is a package of artificial intelligence and blockchain-powered solutions for analyzing and extracting various kinds of information from unstructured text data, videos and images.

This documentation helps you start working with the API and describes the API methods and their options.

Endpoint

The main endpoint for all API calls:

https://www.summarizebot.com/api/

API Key

To use our API you need an API key. Please register to get a personal API key for a 14-day trial period.

You should add your API key as a parameter for every request sent to our API:

[main endpoint]/[method]?apiKey=[api key]
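As a minimal sketch of the pattern above, the helper below assembles a request URL from the main endpoint, a method name and the API key. The name `build_url` and the use of Python's standard `urllib.parse` module are illustrative choices, not part of the API, and `YOUR_API_KEY` is a placeholder.

```python
from urllib.parse import urlencode

# Main endpoint as given in this documentation.
API_ENDPOINT = "https://www.summarizebot.com/api/"


def build_url(method, api_key, **options):
    """Return [main endpoint]/[method]?apiKey=[api key]&[options].

    `build_url` is a hypothetical helper, not part of the SummarizeBot API.
    """
    params = {"apiKey": api_key}
    params.update(options)
    # urlencode percent-escapes values, e.g. a target link passed as `url`.
    return API_ENDPOINT + method + "?" + urlencode(params)


print(build_url("summarize", "YOUR_API_KEY", size=20))
```

Escaping the option values matters mostly for the `url` option, because an unescaped `&` inside the target link would otherwise be read as the start of a new parameter.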

Get Started

Once you have your personal API key, you can use the API in the following way:

  • Select the API method you are interested in from this documentation

  • Send HTTP GET or POST requests to the main endpoint, e.g. for a document summarization call the full URL would be:

    https://www.summarizebot.com/api/summarize?[options]

  • Also you can test-drive our API methods by importing the Postman Collection below. This is a quick and easy way to become more familiar with the SummarizeBot API and how it works

Usage Examples

URLs Processing

You can use the following Python code to process weblinks:

import requests
                    
# API URL
# You can change 'summarize' to different endpoints: sentiment, keywords, etc.
api_url = "https://www.summarizebot.com/api/summarize?apiKey=YOUR_API_KEY&size=20&keywords=10&fragments=15&url=URL_FOR_PROCESSING"
r = requests.get(api_url)
json_res = r.json()
print(json_res)

cURL request:

curl -X GET "https://www.summarizebot.com/api/summarize?apiKey=YOUR_API_KEY&size=20&keywords=10&fragments=15&url=URL_FOR_PROCESSING"

Files Processing

To process files, use our POST API endpoints. The Content-Type header should be set to 'application/octet-stream', and the POST body should contain the file content in binary form. In Python you can use the following code:

import requests

# Read binary data from the file
with open('test.txt', mode='rb') as file:
    post_body = file.read()

# API URL
# You can change 'summarize' to different endpoints: sentiment, keywords, etc.
api_url = "https://www.summarizebot.com/api/summarize?apiKey=your_API_key&size=20&keywords=10&fragments=15&filename=test.txt"
# HTTP header
header = {'Content-Type': "application/octet-stream"}
r = requests.post(api_url, headers = header, data = post_body)
json_res = r.json()
print(json_res)

cURL request:

curl -H "Content-Type:application/octet-stream" --data-binary @test.txt "https://www.summarizebot.com/api/summarize?apiKey=your_API_key&size=20&keywords=10&fragments=15&filename=test.txt"

Plain Text Processing

To process text strings, encode them as binary data (bytes) and send the bytes as the body of a POST request. In Python you can use the following code:

import requests

# Text for processing in UTF-8 encoding
text_for_processing = u"Planet has only until 2030 to stem catastrophic climate change, experts warn."
# Create bytes representation of the text
post_body = text_for_processing.encode('utf-8')

# API URL
# You can change 'summarize' to different endpoints: sentiment, keywords, etc.
api_url = "https://www.summarizebot.com/api/summarize?apiKey=your_API_key&size=20&keywords=10&fragments=15&filename=1.txt"
# HTTP header
header = {'Content-Type': "application/octet-stream"}
r = requests.post(api_url, headers = header, data = post_body)
json_res = r.json()
print(json_res)

cURL request:

curl -H "Content-Type:application/octet-stream" --data "Planet has only until 2030 to stem catastrophic climate change, experts warn." "https://www.summarizebot.com/api/summarize?apiKey=your_API_key&size=20&keywords=10&fragments=15&filename=1.txt"

Error Codes

The API methods may return the following errors:

  • 400 - bad request

  • 401 - API key is invalid or expired

  • 402 - maximum file size limit is exceeded

  • 403 - Content-Type header isn't specified as 'application/octet-stream'

  • 404 - Content-Type header isn't specified as 'application/json'

  • 429 - too many requests (rate limit exceeded)

  • 500 - internal server error
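The codes above could be handled on the client side roughly as follows. This is only a sketch: the names `ERROR_MESSAGES`, `describe_status` and `should_retry` are hypothetical, and the choice of which codes are worth retrying is an assumption rather than documented API behavior.

```python
# Error codes and meanings as listed in this documentation.
ERROR_MESSAGES = {
    400: "bad request",
    401: "API key is invalid or expired",
    402: "maximum file size limit is exceeded",
    403: "header isn't specified as 'application/octet-stream'",
    404: "header isn't specified as 'application/json'",
    429: "too many requests (rate limit exceeded)",
    500: "internal server error",
}


def describe_status(status_code):
    """Translate an HTTP status code into its documented meaning."""
    if status_code == 200:
        return "ok"
    return ERROR_MESSAGES.get(status_code, "undocumented status code")


def should_retry(status_code):
    # Assumption: only rate limiting and server errors may succeed on retry.
    return status_code in (429, 500)
```

Note that 403 and 404 have API-specific meanings here, so a generic HTTP client message such as "Not Found" would be misleading for these responses.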

Language Support

Document summarization and keyword extraction are available for almost every language, including English, Chinese, Russian, Japanese, Arabic, German, Spanish, French, Portuguese, etc. Please see the full list here.

The sentiment analysis method supports English, French, German, Italian, Portuguese, Spanish and Russian.

The fake news detection method supports English only.

For audio recognition the API supports the following languages: English, Russian, Chinese, French, German, Italian, Spanish, Japanese, Swedish, Finnish, Arabic.

For text extraction from images our API supports the following languages: English, Latvian, French, German, Russian, Italian, Dutch, Spanish, Portuguese, Swedish, Finnish.

File Formats

The text analysis API methods support most of the text, image and audio formats: .html, .pdf, .doc, .docx, .csv, .eml, .epub, .gif, .jpg, .jpeg, .mp3, .msg, .odt, .ogg, .png, .pptx, .ps, .rtf, .tiff, .tif, .txt, .wav, .xlsx, .xls, .psv, .tsv, .tff, .aif, .aiff, .avr, .cdr, .wv, .au, .flac, .snd, .vox.

The article extraction and language detection methods can only process text files.

The video identification and comments extraction features deal only with hypertext files (.html, .xml, etc.).
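Before uploading a file, a client might check its extension against the format list above. The sketch below assumes that matching is case-insensitive, which this documentation does not state; `is_supported` is a hypothetical helper name.

```python
import os

# Extensions copied from the format list in this documentation.
SUPPORTED_EXTENSIONS = frozenset("""
.html .pdf .doc .docx .csv .eml .epub .gif .jpg .jpeg .mp3 .msg .odt .ogg
.png .pptx .ps .rtf .tiff .tif .txt .wav .xlsx .xls .psv .tsv .tff .aif
.aiff .avr .cdr .wv .au .flac .snd .vox
""".split())


def is_supported(filename):
    """Return True if the file extension appears in the documented list."""
    # Assumption: extensions are compared without regard to letter case.
    extension = os.path.splitext(filename)[1].lower()
    return extension in SUPPORTED_EXTENSIONS
```

A check like this only avoids obviously unsupported uploads; individual methods may accept a narrower set, as noted for article extraction and language detection.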

Summarization

The summarization method automatically extracts the most important information, keywords and keyphrases from weblinks, documents, audio files and images. With the summarization API you can create general or topic-oriented summaries for different domains. Add the 'domain' option with a domain identifier to your request, and the output summary will consist of the sentences that are most relevant to that domain.

Supported Domains

The summarization API supports the following domains: accounting, agriculture, art, automotive, beauty, business, construction, culture, demographics, economics, education, electronics, energy, environment, european_union, finance, fisheries, foods, forestry, gardening, geography, healthcare, human_resources, industries, insurance, intellectual_property, international_organizations, international_relations, investments, it, legal, literature, management, marketing, parliament, pets, politics, production, religion, science, social_issues, sports, taxes, technology, trade, transportation_and_cargo, travel, weather.

Caution

The language of text documents will be detected automatically. For audio files and images it should be specified for each request. If the value for language is undefined, then the default language for audio and image processing will be set to English.

Create a summary from weblinks
GET /summarize

Summarize a file from a given URL.

Example URI

GET  https://www.summarizebot.com/api/summarize
URI Parameters
apiKey
string (required) 

API Key

size
integer (optional, default = 16) 

Summary length as percentage of original document

url
string (required) 

Article or web page url

keywords
integer (optional, default = 10) 

Maximum count of keywords to return

fragments
integer (optional, default = 15) 

Maximum count of key fragments to return

domain
string (optional) 

Domain identifier for topic-oriented summarization

language
string (optional for text files, required for audio files and images) 

The language of text files is detected automatically. For audio files and images it should be chosen from the list of supported languages, e.g. language=German.

Response  200
Headers
Content-Type: application/json
Schema
[  
   {  
      "summary" : [
         {
            "id" : 0,
            "weight" : 2.43,
            "sentence" : "Artificial intelligence (AI, also machine intelligence, MI) is intelligence displayed by machines, in contrast with the natural intelligence (NI) displayed by humans and other animals."
         },
         {
            "id" : 1,
            "weight" : 2.04,
            "sentence" : "AI research is defined as the study of \"intelligent agents\": any device that perceives its environment and takes actions that maximize its chance of success at some goal."
         }
      ]
   },
   {  
      "keywords" : [  
         {  
            "keyword" : "artificial intelligence",
            "weight" : 0.87,
            "ids" : [
               1,
               6
            ]
         },
         {  
            "keyword" : "machines",
            "weight" : 0.71,
            "ids" : [  
               0,
               4
            ]
         }
      ]
   },
   {  
      "fragments" : [  
         {  
            "fragment" : "optical character recognition",
            "ids" : [  
               5
            ],
            "weight" : 0.15
         }
      ]
   }
]
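The response above is a list of single-key objects ("summary", "keywords", "fragments"). One possible way to work with it, sketched here with hypothetical helper names and a shortened sample in the same shape, is to merge the sections into one dictionary and rank the sentences by weight.

```python
def flatten_sections(response):
    """Merge a list of single-key objects into one dictionary."""
    merged = {}
    for section in response:
        merged.update(section)
    return merged


def top_sentences(response, limit=3):
    """Return summary sentences ordered by descending weight."""
    summary = flatten_sections(response).get("summary", [])
    ranked = sorted(summary, key=lambda item: item["weight"], reverse=True)
    return [item["sentence"] for item in ranked[:limit]]


# Shortened, made-up sample that follows the shape of the schema above.
sample = [
    {"summary": [
        {"id": 1, "weight": 2.04, "sentence": "Second sentence."},
        {"id": 0, "weight": 2.43, "sentence": "First sentence."},
    ]},
    {"keywords": [{"keyword": "machines", "weight": 0.71, "ids": [0, 4]}]},
    {"fragments": []},
]
print(top_sentences(sample, limit=1))
```

Using `.get("summary", [])` keeps the helper from failing if a section is missing, which is a defensive assumption rather than documented behavior.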

Create a summary from binary data
POST /summarize

Summarize a file from binary data. The POST body should contain the file content in binary form. The Content-Type header should be set to 'application/octet-stream'.

Example URI

POST https://www.summarizebot.com/api/summarize
URI Parameters
apiKey
string (required) 

API Key

size
integer (optional, default = 16) 

Summary length as percentage of original document

filename
string (required) 

Name of the file, e.g. filename=1.pdf

keywords
integer (optional, default = 10) 

Maximum count of keywords to return

fragments
integer (optional, default = 15) 

Maximum count of key fragments to return

domain
string (optional) 

Domain identifier for topic-oriented summarization

language
string (optional for text files, required for audio files and images) 

The language of text files is detected automatically. For audio files and images it should be chosen from the list of supported languages, e.g. language=German.

Response  200
Headers
Content-Type: application/json
Schema
[  
   {  
      "summary" : [
         {
            "id" : 0,
            "weight" : 2.43,
            "sentence" : "Artificial intelligence (AI, also machine intelligence, MI) is intelligence displayed by machines, in contrast with the natural intelligence (NI) displayed by humans and other animals."
         },
         {
            "id" : 1,
            "weight" : 2.04,
            "sentence" : "AI research is defined as the study of \"intelligent agents\": any device that perceives its environment and takes actions that maximize its chance of success at some goal."
         }
      ]
   },
   {  
      "keywords" : [  
         {  
            "keyword" : "artificial intelligence",
            "weight" : 0.87,
            "ids" : [
               1,
               6
            ]
         },
         {  
            "keyword" : "machines",
            "weight" : 0.71,
            "ids" : [  
               0,
               4
            ]
         }
      ]
   },
   {  
      "fragments" : [  
         {  
            "fragment" : "optical character recognition",
            "ids" : [  
               5
            ],
            "weight" : 0.15
         }
      ]
   }
]

Sentiment Analysis

The sentiment analysis method analyzes text to return the sentiment as positive, negative or neutral. Additionally it provides an overall score of the aggregate sentiment for the entire text and a list of aspects that are mentioned in a document (negative or positive words and phrases).

The sentiment analysis API identifies sentiment not only at the document level but also at the sentence and object level. It detects concrete sentiment objects and opinion phrases, which helps you understand the meaning of user reviews.

Caution

The sentiment analysis method is available for English, French, German, Italian, Portuguese, Spanish and Russian languages.

Analyze sentiment from weblinks
GET /sentiment

Analyze text for positive or negative sentiment from a given url.

Example URI

GET  https://www.summarizebot.com/api/sentiment
URI Parameters
apiKey
string (required) 

API Key

url
string (required) 

Article or web page url

language
string (optional) 

Document language in the ISO 639-1 format. If the value for language is undefined the document language will be detected automatically

Response  200
Headers
Content-Type: application/json
Schema
[
    {
        "document sentiment": {
            "polarity": "negative",
            "weight": -1.99
        }
    },
    {
        "sentiment aspects": [
            {
                "features": [
                    {
                        "polarity": "negative",
                        "weight": -0.5,
                        "sentiment object": {
                            "start offset": 0,
                            "object": "The burger",
                            "end offset": 10
                        },
                        "end offset": 28,
                        "start offset": 15,
                        "phrase": "uncooked , raw"
                    },
                    {
                        "polarity": "negative",
                        "phrase": "left",
                        "end offset": 38,
                        "weight": -0.56,
                        "start offset": 34
                    },
                    {
                        "polarity": "negative",
                        "weight": -0.64,
                        "sentiment object": {
                            "start offset": 76,
                            "object": "person",
                            "end offset": 82
                        },
                        "end offset": 75,
                        "start offset": 71,
                        "phrase": "poor"
                    },
                    {
                        "polarity": "negative",
                        "phrase": "be severely poisoned",
                        "end offset": 114,
                        "weight": -0.5,
                        "start offset": 94
                    }
                ],
                "sentence": "The burger was uncooked, raw, but left out in the sun waiting for some poor person to eat and be severely poisoned."
            }
        ]
    }
]
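To illustrate how the response above could be consumed, the sketch below walks the "sentiment aspects" section and collects each opinion phrase with its polarity, weight and, when present, the sentiment object it refers to. The helper names are hypothetical, and treating "sentiment object" as optional is inferred from the sample, where some features omit it.

```python
def iter_features(response):
    """Yield (phrase, polarity, weight, object) for every sentiment feature."""
    for section in response:
        for aspect in section.get("sentiment aspects", []):
            for feature in aspect["features"]:
                # "sentiment object" is missing for some features in the sample.
                target = feature.get("sentiment object", {}).get("object")
                yield (feature["phrase"], feature["polarity"],
                       feature["weight"], target)


def document_polarity(response):
    """Return the document-level polarity, or None if it is absent."""
    for section in response:
        if "document sentiment" in section:
            return section["document sentiment"]["polarity"]
    return None
```

In the sample response the offsets appear to be character positions within the sentence (for example, characters 0 to 10 cover "The burger"), so the same loop could be extended to highlight phrases in the original text.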

Analyze sentiment from binary data
POST /sentiment

Analyze text for positive or negative sentiment from binary data. The POST body should contain the file content in binary form. The Content-Type header should be set to 'application/octet-stream'.

Example URI

POST https://www.summarizebot.com/api/sentiment
URI Parameters
apiKey
string (required) 

API Key

filename
string (required) 

Name of the file, e.g. filename=1.html

language
string (optional) 

Document language in the ISO 639-1 format. If the value for language is undefined the document language will be detected automatically

Response  200
Headers
Content-Type: application/json
Schema
[
    {
        "document sentiment": {
            "polarity": "negative",
            "weight": -1.99
        }
    },
    {
        "sentiment aspects": [
            {
                "features": [
                    {
                        "polarity": "negative",
                        "weight": -0.5,
                        "sentiment object": {
                            "start offset": 0,
                            "object": "The burger",
                            "end offset": 10
                        },
                        "end offset": 28,
                        "start offset": 15,
                        "phrase": "uncooked , raw"
                    },
                    {
                        "polarity": "negative",
                        "phrase": "left",
                        "end offset": 38,
                        "weight": -0.56,
                        "start offset": 34
                    },
                    {
                        "polarity": "negative",
                        "weight": -0.64,
                        "sentiment object": {
                            "start offset": 76,
                            "object": "person",
                            "end offset": 82
                        },
                        "end offset": 75,
                        "start offset": 71,
                        "phrase": "poor"
                    },
                    {
                        "polarity": "negative",
                        "phrase": "be severely poisoned",
                        "end offset": 114,
                        "weight": -0.5,
                        "start offset": 94
                    }
                ],
                "sentence": "The burger was uncooked, raw, but left out in the sun waiting for some poor person to eat and be severely poisoned."
            }
        ]
    }
]

News Aggregation

The news aggregation method returns news headlines and searches for articles from over 50,000 sources. Retrieval results include details such as the main image of the news article, the article title and direct URL, the publication date, and a relevance score for the search request.

The news API endpoints support 100+ languages, which are specified in the ISO 639-1 format.

Thousands of news sources have been indexed and analyzed by our custom artificial intelligence modules to provide high search accuracy in natural language mode.

Return latest news for a specific language
GET /news

Return live and top news for different languages.

Example URI

GET  https://www.summarizebot.com/api/news
URI Parameters
apiKey
string (required) 

API Key

language
string (optional, default=en) 

Language code in the ISO 639-1 format

count
integer (optional, default=10, maximum value=50) 

Maximum count of news to return

Response  200
Headers
Content-Type: application/json
Schema
{
  "results": [
    {
      "url": "https://www.theaustralian.com.au/sport/cricket/jaques-was-last-man-standing-but-a-nsw-pedigree-hard-to-go-past/news-story/86a3ed596aa5766bfb562f912dfa227e",
      "publication_date": "2018-05-29 14:05:46",
      "image_url": "https://encrypted-tbn1.gstatic.com/images?q=tbn:ANd9GcR-v0h1BL_w2ILuDVC07L926nHGIIxb8bGWYdwZAh8K6UJsu-DqnTJ7b9Z1cFLZRQqWHjGPXrNInQ",
      "language": "en",
      "title": "Jaques was last man standing but a NSW pedigree hard to go past"
    },
    {
      "url": "https://gulfnews.com/sport/uae/football/own-goal-sinks-defending-champions-al-taher-1.2228752",
      "publication_date": "2018-05-29 14:00:50",
      "image_url": "https://static.gulfnews.com/polopoly_fs/1.2228830!/image/4040701382.jpg_gen/derivatives/box_460346/4040701382.jpg",
      "language": "en",
      "title": "Own goal sinks defending champions Al Taher"
    },
    {
      "url": "https://www.forbes.com/sites/robinandrews/2018/05/29/this-is-why-han-solo-may-owe-his-life-to-a-polish-donut/",
      "publication_date": "2018-05-29 14:00:00",
      "image_url": "https://blogs-images.forbes.com/robinandrews/files/2018/05/PIA22085large-1200x675.jpg?width=0&height=600",
      "language": "en",
      "title": "This Is Why Han Solo May Owe His Life To A Polish Donut"
    }
  ]
}
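As a small illustration of working with the "results" list above, the sketch below sorts news items by publication date. The date format string is inferred from the sample values (e.g. "2018-05-29 14:05:46"); the time zone is not stated in this documentation, so the values are parsed as naive datetimes. `newest_first` is a hypothetical helper name.

```python
from datetime import datetime

# Format inferred from the sample values; not stated explicitly by the API.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def newest_first(response):
    """Return the news items sorted by publication date, newest first."""
    return sorted(
        response["results"],
        key=lambda item: datetime.strptime(item["publication_date"], DATE_FORMAT),
        reverse=True,
    )
```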

Search news articles based on a specific query for different languages
POST /news

Returns a list of news articles relevant to the query. The POST body should include the query in JSON format, e.g. { "query" : "Donald Trump"}. The Content-Type header should be set to 'application/json'.

Example URI

POST https://www.summarizebot.com/api/news
URI Parameters
apiKey
string (required) 

API Key

language
string (optional, default=en) 

Language code in the ISO 639-1 format

count
integer (optional, default=10, maximum value=50) 

Maximum count of news to return

Response  200
Headers
Content-Type: application/json
Schema
{
  "results": [
    {
      "language": "en",
      "title": "Diplomatic duels: What now for the Donald Trump-Kim Jong Un summit?",
      "url": "https://economictimes.indiatimes.com/news/defence/diplomatic-duels-what-now-for-the-dinald-trump-kim-jong-un-summit/articleshow/64351498.cms",
      "score": 13.17083740234375,
      "image_url": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSv0-NPFkf98_pIa9-1aUMeCksBDD7GdPrN4RdWziokhu1kb1yk7EmtyRlozeQgOMT6bqRIq7yr_0U",
      "publication_date": "2018-05-28 06:59:00"
    },
    {
      "language": "en",
      "title": "US Team In North Korea For Summit Talks, Says Donald Trump",
      "url": "https://www.ndtv.com/world-news/us-team-in-north-korea-for-summit-talks-says-donald-trump-1858532",
      "score": 12.415493965148926,
      "image_url": "https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTDk0Idr_6tGHP5Ur7U1ZXZxICebGR0K-2kTcWtgJ589b_hLb1BvBIV7dJCbw_wLbgp8oXbyXUPUhU",
      "publication_date": "2018-05-28 05:17:44"
    },
    {
      "language": "en",
      "title": "Donald Trump Jr is in high political demand – for now",
      "url": "https://www.theguardian.com/us-news/2018/may/28/donald-trump-jr-high-demand-conservative-groups-wary",
      "score": 12.384574890136719,
      "image_url": "https://i.guim.co.uk/img/media/5394c2707b62a7a882047907cf3beab4a5e3d2a5/0_126_4200_2519/master/4200.jpg?w=140&q=55&auto=format&usm=12&fit=max&s=1697b507f5ae8b8f7eda9e3c91929d69",
      "publication_date": "2018-05-28 05:00:45"
    }
  ]
}
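A search request for this method could be assembled as in the following sketch, which uses only Python's standard library and builds the request without sending it. The helper name `build_news_search` is hypothetical, and the lower bound of 1 for `count` is an assumption; this documentation only states the maximum of 50.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

NEWS_URL = "https://www.summarizebot.com/api/news"


def build_news_search(api_key, query, language="en", count=10):
    """Build (but do not send) a POST /news request for a search query."""
    # The documented maximum is 50; the minimum of 1 is an assumption.
    if not 1 <= count <= 50:
        raise ValueError("count must be between 1 and 50")
    params = urlencode({"apiKey": api_key, "language": language, "count": count})
    # The query travels in the JSON body, e.g. {"query": "climate change"}.
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(
        NEWS_URL + "?" + params,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_news_search("YOUR_API_KEY", "climate change", count=5)
print(request.get_method(), request.full_url)
```

The prepared request can then be passed to `urllib.request.urlopen` to perform the call. Items in the response carry a "score" field, so results can be ranked by relevance on the client if needed.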

Fake News Detection

The fake news detection method analyzes news articles to identify whether they are likely to be real news or not. With the help of custom AI classifiers, it can detect different types of fake information, such as propaganda, conspiracy, pseudoscience, bias and irony.

The news analysis algorithm combines several components to solve the fake news detection problem: custom machine learning models trained on fake and biased articles; proprietary multi-language summarization technology that extracts only the important information and removes noise; a search over historical news data to check story relevancy and misleading facts; and a database of trusted and biased websites created by our experts.

Detect fake news from weblinks
GET /checkfake

Analyze news content and detect fake news from a given url.

Example URI

GET  https://www.summarizebot.com/api/checkfake
URI Parameters
apiKey
string (required) 

API Key

url
string (required) 

Article or web page url

Response  200
Headers
Content-Type: application/json
Schema
{
  "predictions": [
    {
      "confidence": 0.36,
      "type": "real"
    },
    {
      "confidence": 0.64,
      "type": "fake",
      "categories": [
        {
          "confidence": 0.2,
          "type": "bias"
        },
        {
          "confidence": 0.1,
          "type": "conspiracy"
        },
        {
          "confidence": 0,
          "type": "propaganda"
        },
        {
          "confidence": 0.6,
          "type": "pseudoscience"
        },
        {
          "confidence": 0.1,
          "type": "irony"
        }
      ]
    }
  ]
}
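One way to reduce the response above to a single verdict is to take the prediction with the highest confidence and, if it lists categories, the strongest category. This is a sketch with hypothetical helper names; how ties or thresholds should be handled is not specified by this documentation.

```python
def most_likely(entries):
    """Return the entry with the highest confidence from a list."""
    return max(entries, key=lambda item: item["confidence"])


def summarize_verdict(response):
    """Return (type, confidence, top_category) for a /checkfake response."""
    top = most_likely(response["predictions"])
    # Only the "fake" prediction carries categories in the sample response.
    categories = top.get("categories")
    top_category = most_likely(categories)["type"] if categories else None
    return top["type"], top["confidence"], top_category
```

Applied to the sample response above, this yields the "fake" prediction with confidence 0.64 and "pseudoscience" as its strongest category.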

Linguistic Processor

The linguistic processor is a custom natural language processing solution for deep linguistic analysis of unstructured data. It supports 39+ languages, covering all European languages, major Asian languages and Arabic. It automatically detects tokens and sentences, identifies part-of-speech (PoS) tags, lemmas and noun phrases, and extracts semantic relations for each sentence.

Extract linguistic analysis results from weblinks
GET /syntax

Extract linguistic analysis results from a given url.

Example URI

GET  https://www.summarizebot.com/api/syntax
URI Parameters
apiKey
string (required) 

API Key

url
string (required) 

Article or web page url

language
string (optional) 

Document language in the ISO 639-1 format. If the value for language is undefined the document language will be detected automatically

Response  200
Headers
Content-Type: application/json
Schema
[
    {
        "tokens": [
            {
                "lemma": "culture",
                "tag": "NNP",
                "word": "Culture",
                "end offset": 7,
                "start offset": 0
            },
            {
                "lemma": "minister",
                "tag": "NNP",
                "word": "Minister",
                "end offset": 16,
                "start offset": 8
            },
            {
                "lemma": "alberto",
                "tag": "NNP",
                "word": "Alberto",
                "end offset": 24,
                "start offset": 17
            },
            {
                "lemma": "bonisoli",
                "tag": "NNP",
                "word": "Bonisoli",
                "end offset": 33,
                "start offset": 25
            },
            {
                "lemma": "describe",
                "tag": "VBD",
                "word": "described",
                "end offset": 43,
                "start offset": 34
            },
            {
                "lemma": "the",
                "tag": "DT",
                "word": "the",
                "end offset": 47,
                "start offset": 44
            },
            {
                "lemma": "finding",
                "tag": "NN",
                "word": "finding",
                "end offset": 55,
                "start offset": 48
            },
            {
                "lemma": "as",
                "tag": "IN",
                "word": "as",
                "end offset": 58,
                "start offset": 56
            },
            {
                "lemma": "a",
                "tag": "DT",
                "word": "a",
                "end offset": 60,
                "start offset": 59
            },
            {
                "lemma": "discovery",
                "tag": "NN",
                "word": "discovery",
                "end offset": 70,
                "start offset": 61
            },
            {
                "lemma": "that",
                "tag": "WDT",
                "word": "that",
                "end offset": 75,
                "start offset": 71
            },
            {
                "lemma": "fill",
                "tag": "VBZ",
                "word": "fills",
                "end offset": 81,
                "start offset": 76
            },
            {
                "lemma": "him",
                "tag": "PRP",
                "word": "him",
                "end offset": 85,
                "start offset": 82
            },
            {
                "lemma": "with",
                "tag": "IN",
                "word": "with",
                "end offset": 90,
                "start offset": 86
            },
            {
                "lemma": "pride",
                "tag": "NN",
                "word": "pride",
                "end offset": 96,
                "start offset": 91
            },
            {
                "lemma": ".",
                "tag": ".",
                "word": ".",
                "end offset": 97,
                "start offset": 96
            }
        ],
        "chunks": [
            {
                "chunk": "Culture Minister Alberto Bonisoli",
                "start index": 0,
                "type": "NP",
                "chunk head": "Bonisoli",
                "end index": 4
            },
            {
                "chunk": "described",
                "start index": 4,
                "type": "VP",
                "chunk head": "described",
                "end index": 5
            },
            {
                "chunk": "the finding",
                "start index": 5,
                "type": "NP",
                "chunk head": "finding",
                "end index": 7
            },
            {
                "chunk": "a discovery",
                "start index": 8,
                "type": "NP",
                "chunk head": "discovery",
                "end index": 10
            },
            {
                "chunk": "fills",
                "start index": 11,
                "type": "VP",
                "chunk head": "fills",
                "end index": 12
            },
            {
                "chunk": "him",
                "start index": 12,
                "type": "NP",
                "chunk head": "him",
                "end index": 13
            },
            {
                "chunk": "pride",
                "start index": 14,
                "type": "NP",
                "chunk head": "pride",
                "end index": 15
            }
        ],
        "relations": [
            {
                "verb": {
                    "phrase": "described",
                    "start index": 4,
                    "end index": 5
                },
                "object": {
                    "phrase": "the finding",
                    "start index": 5,
                    "end index": 7
                },
                "subject": {
                    "phrase": "Culture Minister Alberto Bonisoli",
                    "start index": 0,
                    "end index": 4
                }
            }
        ],
        "sentence": "Culture Minister Alberto Bonisoli described the finding as a discovery that fills him with pride."
    }
]
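In the sample above, the "start index" and "end index" of a chunk appear to be positions in the "tokens" list, with the end index exclusive (the chunk "the finding" spans indices 5 to 7 and covers tokens 5 and 6). Under that assumption, which is inferred from the sample rather than stated by the API, chunks can be mapped back to their tokens as sketched below; the helper names are hypothetical.

```python
def chunk_words(sentence_entry, chunk):
    """Return the token words covered by a chunk's [start, end) indices."""
    tokens = sentence_entry["tokens"]
    # Assumption: indices refer to token positions and the end is exclusive.
    covered = tokens[chunk["start index"]:chunk["end index"]]
    return [token["word"] for token in covered]


def noun_phrases(response):
    """Collect the text of every chunk whose type is "NP"."""
    return [
        chunk["chunk"]
        for entry in response
        for chunk in entry.get("chunks", [])
        if chunk["type"] == "NP"
    ]
```

The same index convention seems to apply to the "verb", "object" and "subject" entries under "relations", so `chunk_words` could be reused for them.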

Extract linguistic analysis results from binary data
POST /syntax

Extract linguistic analysis results from binary data. The POST body should contain the file content in binary form. The Content-Type header should be set to 'application/octet-stream'.

Example URI

POST https://www.summarizebot.com/api/syntax
URI Parameters
apiKey
string (required) 

API Key

filename
string (required) 

Name of the file, e.g. filename=1.pdf

language
string (optional) 

Document language in the ISO 639-1 format. If the value for language is undefined the document language will be detected automatically

Response  200
Headers
Content-Type: application/json
Schema
[
    {
        "tokens": [
            {
                "lemma": "culture",
                "tag": "NNP",
                "word": "Culture",
                "end offset": 7,
                "start offset": 0
            },
            {
                "lemma": "minister",
                "tag": "NNP",
                "word": "Minister",
                "end offset": 16,
                "start offset": 8
            },
            {
                "lemma": "alberto",
                "tag": "NNP",
                "word": "Alberto",
                "end offset": 24,
                "start offset": 17
            },
            {
                "lemma": "bonisoli",
                "tag": "NNP",
                "word": "Bonisoli",
                "end offset": 33,
                "start offset": 25
            },
            {
                "lemma": "describe",
                "tag": "VBD",
                "word": "described",
                "end offset": 43,
                "start offset": 34
            },
            {
                "lemma": "the",
                "tag": "DT",
                "word": "the",
                "end offset": 47,
                "start offset": 44
            },
            {
                "lemma": "finding",
                "tag": "NN",
                "word": "finding",
                "end offset": 55,
                "start offset": 48
            },
            {
                "lemma": "as",
                "tag": "IN",
                "word": "as",
                "end offset": 58,
                "start offset": 56
            },
            {
                "lemma": "a",
                "tag": "DT",
                "word": "a",
                "end offset": 60,
                "start offset": 59
            },
            {
                "lemma": "discovery",
                "tag": "NN",
                "word": "discovery",
                "end offset": 70,
                "start offset": 61
            },
            {
                "lemma": "that",
                "tag": "WDT",
                "word": "that",
                "end offset": 75,
                "start offset": 71
            },
            {
                "lemma": "fill",
                "tag": "VBZ",
                "word": "fills",
                "end offset": 81,
                "start offset": 76
            },
            {
                "lemma": "him",
                "tag": "PRP",
                "word": "him",
                "end offset": 85,
                "start offset": 82
            },
            {
                "lemma": "with",
                "tag": "IN",
                "word": "with",
                "end offset": 90,
                "start offset": 86
            },
            {
                "lemma": "pride",
                "tag": "NN",
                "word": "pride",
                "end offset": 96,
                "start offset": 91
            },
            {
                "lemma": ".",
                "tag": ".",
                "word": ".",
                "end offset": 97,
                "start offset": 96
            }
        ],
        "chunks": [
            {
                "chunk": "Culture Minister Alberto Bonisoli",
                "start index": 0,
                "type": "NP",
                "chunk head": "Bonisoli",
                "end index": 4
            },
            {
                "chunk": "described",
                "start index": 4,
                "type": "VP",
                "chunk head": "described",
                "end index": 5
            },
            {
                "chunk": "the finding",
                "start index": 5,
                "type": "NP",
                "chunk head": "finding",
                "end index": 7
            },
            {
                "chunk": "a discovery",
                "start index": 8,
                "type": "NP",
                "chunk head": "discovery",
                "end index": 10
            },
            {
                "chunk": "fills",
                "start index": 11,
                "type": "VP",
                "chunk head": "fills",
                "end index": 12
            },
            {
                "chunk": "him",
                "start index": 12,
                "type": "NP",
                "chunk head": "him",
                "end index": 13
            },
            {
                "chunk": "pride",
                "start index": 14,
                "type": "NP",
                "chunk head": "pride",
                "end index": 15
            }
        ],
        "relations": [
            {
                "verb": {
                    "phrase": "described",
                    "start index": 4,
                    "end index": 5
                },
                "object": {
                    "phrase": "the finding",
                    "start index": 5,
                    "end index": 7
                },
                "subject": {
                    "phrase": "Culture Minister Alberto Bonisoli",
                    "start index": 0,
                    "end index": 4
                }
            }
        ],
        "sentence": "Culture Minister Alberto Bonisoli described the finding as a discovery that fills him with pride."
    }
]
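
The sample above suggests, although it is not stated explicitly, that chunk "start index" / "end index" values are token positions with an exclusive end, and that token "start offset" / "end offset" values are character offsets into "sentence". Under that assumption, a minimal sketch of mapping chunks back to tokens could look like this (`token_text` and `chunk_tokens` are hypothetical helper names, not part of the API):

```python
# Trimmed copy of the sample response above (first four tokens, first chunk).
sentence = {
    "tokens": [
        {"word": "Culture", "start offset": 0, "end offset": 7},
        {"word": "Minister", "start offset": 8, "end offset": 16},
        {"word": "Alberto", "start offset": 17, "end offset": 24},
        {"word": "Bonisoli", "start offset": 25, "end offset": 33},
    ],
    "chunks": [
        {"chunk": "Culture Minister Alberto Bonisoli", "type": "NP",
         "start index": 0, "end index": 4},
    ],
    "sentence": "Culture Minister Alberto Bonisoli described the finding "
                "as a discovery that fills him with pride.",
}


def token_text(sent, token):
    """Slice the token out of the sentence using its character offsets."""
    return sent["sentence"][token["start offset"]:token["end offset"]]


def chunk_tokens(sent, chunk):
    """Return the tokens covered by a chunk (assumes an exclusive end index)."""
    return sent["tokens"][chunk["start index"]:chunk["end index"]]


for chunk in sentence["chunks"]:
    words = [token_text(sentence, t) for t in chunk_tokens(sentence, chunk)]
    print(chunk["type"], " ".join(words))
```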

Keywords Extraction

The keywords extraction method automatically extracts the most important keywords from weblinks, documents, audio files and images.

Caution

The language of text documents is detected automatically. For audio files and images it should be specified in each request; if no language is given, English is used by default for audio and image processing.

Extract keywords from weblinks
GET /keywords

Extract keywords from a given url.

Example URI

GET  https://www.summarizebot.com/api/keywords
URI Parameters
apiKey
string (required) 

API Key

url
string (required) 

Article or web page url

keywords
integer (optional, default = 10) 

Maximum count of keywords to return

language
string (optional for text files, required for audio files and images) 

The language of text files is detected automatically. For audio files and images it should be chosen from the list of supported languages, e.g. language=German.

Response  200
Headers
Content-Type: application/json
Schema
{  
   "keywords" : [  
      {  
         "keyword" : "artificial intelligence",
         "weight" : 0.87,
         "ids" : [
            1,
            6
         ]
      },
      {  
         "keyword" : "machines",
         "weight" : 0.71,
         "ids" : [  
            0,
            4
         ]
      }
   ]
}
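
As a rough sketch of how the request and response above fit together, the following Python 3 snippet builds the GET URL with escaped query parameters and ranks the returned keywords by weight. `build_keywords_url` and `top_keywords` are hypothetical helper names, and the page URL is a placeholder; the meaning of the "ids" field is not documented here, so the sketch leaves it untouched.

```python
from urllib.parse import urlencode

API_ROOT = "https://www.summarizebot.com/api/"


def build_keywords_url(api_key, page_url, keywords=10):
    """Build the GET /keywords request URL with properly escaped parameters."""
    query = urlencode({"apiKey": api_key, "url": page_url, "keywords": keywords})
    return API_ROOT + "keywords?" + query


def top_keywords(response, min_weight=0.0):
    """Return keyword strings sorted by descending weight."""
    items = [k for k in response["keywords"] if k["weight"] >= min_weight]
    items.sort(key=lambda k: k["weight"], reverse=True)
    return [k["keyword"] for k in items]


# Sample response copied from the schema above.
sample = {
    "keywords": [
        {"keyword": "artificial intelligence", "weight": 0.87, "ids": [1, 6]},
        {"keyword": "machines", "weight": 0.71, "ids": [0, 4]},
    ]
}

print(build_keywords_url("YOUR_API_KEY", "https://example.com/article?id=1"))
print(top_keywords(sample, min_weight=0.8))
```

Escaping the target URL matters because it usually contains characters such as "?" and "&" that would otherwise be read as part of the API query string.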

Extract keywords from binary data
POST /keywords

Extract keywords from binary data. The POST body should contain the file content in binary form, and the Content-Type header should be set to 'application/octet-stream'.

Example URI

POST https://www.summarizebot.com/api/keywords
URI Parameters
apiKey
string (required) 

API Key

filename
string (required) 

Name of the file, e.g. filename=1.pdf

keywords
integer (optional, default = 10) 

Maximum count of keywords to return

language
string (optional for text files, required for audio files and images) 

The language of text files is detected automatically. For audio files and images it should be chosen from the list of supported languages, e.g. language=German.

Response  200
Headers
Content-Type: application/json
Schema
{  
   "keywords" : [  
      {  
         "keyword" : "artificial intelligence",
         "weight" : 0.87,
         "ids" : [
            1,
            6
         ]
      },
      {  
         "keyword" : "machines",
         "weight" : 0.71,
         "ids" : [  
            0,
            4
         ]
      }
   ]
}
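
The binary upload described above can be sketched with the Python 3 standard library alone. This is an illustration under assumptions: it takes the 'application/octet-stream' header to be Content-Type, `build_file_request` is a hypothetical helper, and the file bytes are a placeholder. The request is only prepared here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_ROOT = "https://www.summarizebot.com/api/"


def build_file_request(method, api_key, filename, data, **options):
    """Prepare (but do not send) a POST request carrying raw file bytes."""
    params = {"apiKey": api_key, "filename": filename}
    params.update(options)
    url = API_ROOT + method + "?" + urlencode(params)
    return Request(
        url,
        data=data,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


# Placeholder bytes; in practice read them with open(path, "rb").read().
request = build_file_request(
    "keywords", "YOUR_API_KEY", "1.pdf", b"%PDF-1.4 ...", keywords=5
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

The same helper should work for the other POST methods that take a filename parameter by changing the first argument, e.g. "extract" or "language".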

Article Extraction

The article extraction method is used to extract clean article text from a file that you provide to the API. For hypertext documents it also identifies different metadata such as title, main article image, publish date, author, meta description, etc.

Caution

The article extraction method can handle only text files.

Extract plain article text and metadata from weblinks
GET /extract

Extract article text and metadata from a given url.

Example URI

GET  https://www.summarizebot.com/api/extract
URI Parameters
apiKey
string (required) 

API Key

url
string (required) 

Article or web page url

Response  200
Headers
Content-Type: application/json
Schema
{
    "text": "(CNN) Night has fallen on a toe-numbing English winter's day. 
      In a manor house, where spirits of aristocrats are rumored to roam 
      ancient hallways, are some of England's finest young athletes.\n\nIn a 
      dimly lit, oak-paneled room at Bisham Abbey, 30 miles west of London, 
      these 18 to twentysomethings have gathered for another chapter in 
      their learning.\n\n A grand-looking Victorian lady, framed in gold, 
      peers down on the assembled players and coaches. On these same dark walls 
      hang the works of Raphael.",
    "article title": "How to build a rugby player -- Inside England's
      Under-20s camp",
    "meta information": {
        "meta description": "England's Under-20s give CNN Sport exclusive access
        as they prepare for the Under-20 Six Nations, a championship they have 
        won six times in 10 years.",
        "publish date": "2018-02-03T10:28:00Z",
        "image": "https://cdn.cnn.com/cnnnext/dam/assets/
        180129105453-owen-farrell-super-tease.jpg",
        "authors": [
            "Aimee Lewis"
        ],
        "meta keywords": "sport, Six Nations 2018, training camp"
    }
}
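
Note that the response keys contain spaces ("article title", "meta information"), so they have to be read with dictionary lookups rather than attribute access. A minimal sketch of pulling out a few fields follows; it assumes that metadata fields may be missing for documents without them, and `summarize_article` is a hypothetical helper name.

```python
from datetime import datetime, timezone

# Abbreviated sample response; long strings are shortened here.
sample = {
    "text": "(CNN) Night has fallen on a toe-numbing English winter's day.",
    "article title": "How to build a rugby player -- Inside England's Under-20s camp",
    "meta information": {
        "publish date": "2018-02-03T10:28:00Z",
        "authors": ["Aimee Lewis"],
        "meta keywords": "sport, Six Nations 2018, training camp",
    },
}


def summarize_article(response):
    """Pull a few commonly used fields out of an /extract response."""
    meta = response.get("meta information", {})
    published = meta.get("publish date")
    if published:
        # The sample uses a UTC timestamp with a trailing "Z".
        published = datetime.strptime(published, "%Y-%m-%dT%H:%M:%SZ")
        published = published.replace(tzinfo=timezone.utc)
    raw_keywords = meta.get("meta keywords", "")
    keywords = [k.strip() for k in raw_keywords.split(",") if k.strip()]
    return {
        "title": response.get("article title"),
        "authors": meta.get("authors", []),
        "published": published,
        "keywords": keywords,
    }


info = summarize_article(sample)
print(info["title"])
print(info["published"].isoformat(), info["keywords"])
```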

Extract plain article text and metadata from binary data
POST /extract

Extract article text and metadata from binary data. The POST body should contain the file content in binary form, and the Content-Type header should be set to 'application/octet-stream'.

Example URI

POST https://www.summarizebot.com/api/extract
URI Parameters
apiKey
string (required) 

API Key

filename
string (required) 

Name of the file, e.g. filename=1.html

Response  200
Headers
Content-Type: application/json
Schema
{
    "text": "(CNN) Night has fallen on a toe-numbing English winter's day. 
      In a manor house, where spirits of aristocrats are rumored to roam 
      ancient hallways, are some of England's finest young athletes.\n\nIn a 
      dimly lit, oak-paneled room at Bisham Abbey, 30 miles west of London, 
      these 18 to twentysomethings have gathered for another chapter in 
      their learning.\n\n A grand-looking Victorian lady, framed in gold, 
      peers down on the assembled players and coaches. On these same dark walls 
      hang the works of Raphael.",
    "article title": "How to build a rugby player -- Inside England's
      Under-20s camp",
    "meta information": {
        "meta description": "England's Under-20s give CNN Sport exclusive access
        as they prepare for the Under-20 Six Nations, a championship they have 
        won six times in 10 years.",
        "publish date": "2018-02-03T10:28:00Z",
        "image": "https://cdn.cnn.com/cnnnext/dam/assets/
        180129105453-owen-farrell-super-tease.jpg",
        "authors": [
            "Aimee Lewis"
        ],
        "meta keywords": "sport, Six Nations 2018, training camp"
    }
}

Language Detection

The language detection method analyzes a document that you provide and recognizes the language of its text. The method returns a language code that conforms to ISO 639-1.

Detect language of a text from weblinks
GET /language

Detect text language from a given url.

Example URI

GET  https://www.summarizebot.com/api/language
URI Parameters
apiKey
string (required) 

API Key

url
string (required) 

Article or web page url

Response  200
Headers
Content-Type: application/json
Schema
{
    "language": "en"
}

Detect language of a text from binary data
POST /language

Detect text language from binary data. The POST body should contain the file content in binary form, and the Content-Type header should be set to 'application/octet-stream'.

Example URI

POST https://www.summarizebot.com/api/language
URI Parameters
apiKey
string (required) 

API Key

filename
string (required) 

Name of the file, e.g. filename=1.html

Response  200
Headers
Content-Type: application/json
Schema
{
    "language": "en"
}

Face Detection

The face detection method analyzes an image file to find faces. The method returns a list of items, each of which contains the coordinates of a face that was detected in the file.

Caution

The face detection method processes only image files (.jpeg, .png, etc.).

Detect faces from image weblinks
GET /faces

Detect faces from a given image url.

Example URI

GET  https://www.summarizebot.com/api/faces
URI Parameters
apiKey
string (required) 

API Key

url
string (required) 

Image url

Response  200
Headers
Content-Type: application/json
Schema
{
    "faces": [
        {
            "y": "371",
            "x": "370",
            "height": "137",
            "width": "137"
        },
        {
            "y": "190",
            "x": "474",
            "height": "149",
            "width": "149"
        },
        {
            "y": "210",
            "x": "598",
            "height": "155",
            "width": "155"
        },
        {
            "y": "399",
            "x": "706",
            "height": "146",
            "width": "146"
        }
    ]
}
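
The coordinates in the sample are returned as strings, so they need converting before any arithmetic. The sketch below turns each face into an integer bounding box; it assumes that "x" and "y" mark the top-left corner of the rectangle, which the documentation does not state, and `face_boxes` is a hypothetical helper name.

```python
# First two faces from the sample response above.
sample = {
    "faces": [
        {"y": "371", "x": "370", "height": "137", "width": "137"},
        {"y": "190", "x": "474", "height": "149", "width": "149"},
    ]
}


def face_boxes(response):
    """Convert each face to an integer (left, top, right, bottom) tuple.

    Assumes x/y mark the top-left corner of the face rectangle.
    """
    boxes = []
    for face in response["faces"]:
        left, top = int(face["x"]), int(face["y"])
        right = left + int(face["width"])
        bottom = top + int(face["height"])
        boxes.append((left, top, right, bottom))
    return boxes


print(face_boxes(sample))
```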

Detect faces from image binary data
POST /faces

Detect faces from image binary data. The POST body should contain the file content in binary form, and the Content-Type header should be set to 'application/octet-stream'.

Example URI

POST https://www.summarizebot.com/api/faces
URI Parameters
apiKey
string (required) 

API Key

Response  200
Headers
Content-Type: application/json
Schema
{
    "faces": [
        {
            "y": "371",
            "x": "370",
            "height": "137",
            "width": "137"
        },
        {
            "y": "190",
            "x": "474",
            "height": "149",
            "width": "149"
        },
        {
            "y": "210",
            "x": "598",
            "height": "155",
            "width": "155"
        },
        {
            "y": "399",
            "x": "706",
            "height": "146",
            "width": "146"
        }
    ]
}

Image Recognition

The image recognition method classifies the contents of an entire image into thousands of categories (e.g., "basketball", "lion", "shark"). It returns a list of tags (labels) for an image along with a confidence score which indicates how confident the system is about the assignment.

Recognize image content from weblinks
GET /images

Image recognition from a given url.

Example URI

GET  https://www.summarizebot.com/api/images
URI Parameters
apiKey
string (required) 

API Key

url
string (required) 

Image url

tags
integer (optional, default = 5) 

Maximum count of image tags to return

Response  200
Headers
Content-Type: application/json
Schema
{
    "tags": [
        {
            "confidence": 0.9, 
            "name": "great white shark, white shark"
        }, 
        {
            "confidence": 0.05,
            "name": "tiger shark"
        }, 
        {
            "confidence": 0.03, 
            "name": "killer whale"
        }
    ]
}
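
Because every tag carries a confidence score, a caller will typically keep only the tags above some cut-off. The following sketch shows one way to do that; the 0.5 default threshold is an arbitrary illustrative choice, `confident_labels` is a hypothetical helper name, and the splitting of comma-separated synonyms is based only on how the sample names look.

```python
# Sample response copied from the schema above.
sample = {
    "tags": [
        {"confidence": 0.9, "name": "great white shark, white shark"},
        {"confidence": 0.05, "name": "tiger shark"},
        {"confidence": 0.03, "name": "killer whale"},
    ]
}


def confident_labels(response, threshold=0.5):
    """Return the labels whose confidence is at least `threshold`.

    A tag name may list several comma-separated synonyms; each one is
    returned as its own label.
    """
    labels = []
    for tag in response["tags"]:
        if tag["confidence"] >= threshold:
            labels.extend(part.strip() for part in tag["name"].split(","))
    return labels


print(confident_labels(sample))
```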

Recognize image content from binary data
POST /images

Image recognition from binary data. The POST body should contain the file content in binary form, and the Content-Type header should be set to 'application/octet-stream'.

Example URI

POST https://www.summarizebot.com/api/images
URI Parameters
apiKey
string (required) 

API Key

filename
string (required) 

Name of the file, e.g. filename=1.jpg

tags
integer (optional, default = 5) 

Maximum count of image tags to return

Response  200
Headers
Content-Type: application/json
Schema
{
    "tags": [
        {
            "confidence": 0.9, 
            "name": "great white shark, white shark"
        }, 
        {
            "confidence": 0.05,
            "name": "tiger shark"
        }, 
        {
            "confidence": 0.03, 
            "name": "killer whale"
        }
    ]
}

Stemming

The stemming method automatically reduces inflected words to their base or root form and removes stop words from text documents. It supports English, French, German, Spanish, Italian, Russian, Swedish, Danish, Finnish, Dutch, Hungarian, Norwegian, Portuguese and Romanian languages.

Stem and remove stop words from text data
POST /stem

Stem and remove stop words from text data. The POST body should contain the text in JSON format, e.g. { "text" : "document text"}, and the Content-Type header should be set to 'application/json'. The language of the text is detected automatically.

Example URI

POST https://www.summarizebot.com/api/stem
URI Parameters
apiKey
string (required) 

API Key

Response  200
Headers
Content-Type: application/json
Schema
{
    "stemmed": "lawyer post video sign languag danger ponzi 
               scheme post went viral hundr deaf peopl got touch 
               legal troubl fraud domest violenc uncov huge communiti 
               need help tang shuai simpli tri improv legal knowledg among deaf
               communiti post video china wechat messag app februari instant
               hit mr tang flood mani friend request ask wechat boost friend 
               limit 5,000 10,000 strike chord answer goe way beyond legal
               difficulti complex world sign languag china",
    "language": "en"
}
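
Unlike the file-based methods, the stemming call sends a JSON body. A minimal standard-library sketch of preparing such a request is shown below; it assumes the 'application/json' header is Content-Type, `build_stem_request` is a hypothetical helper name, and the request is prepared without being sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_ROOT = "https://www.summarizebot.com/api/"


def build_stem_request(api_key, text):
    """Prepare (but do not send) a POST /stem request with a JSON body."""
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(
        API_ROOT + "stem?" + urlencode({"apiKey": api_key}),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_stem_request("YOUR_API_KEY", "Lawyers posted videos in sign language.")
print(request.full_url)
print(request.data.decode("utf-8"))

# In the sample response, "stemmed" is a whitespace-separated string of stems,
# so str.split() is enough to recover the individual stems.
stems = "lawyer post video sign languag".split()
print(stems)
```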

Comments Extraction

The comments extraction method automatically structures and extracts reviews and comments from web pages.

Extract comments from weblinks
GET /comments

Extract comments from a given url.

Example URI

GET  https://www.summarizebot.com/api/comments
URI Parameters
apiKey
string (required) 

API Key

url
string (required) 

Article or web page url

Response  200
Headers
Content-Type: application/json
Schema
{
    "comments": [
        "Well, the Hotel is very central, perfect 
        for shopping, sightseeing or nightlife. 
        Friendly welcome on arrival, a complimentary 
        birthday drink brought to us in the comfy lounge area.",
        "Would definately stay at this hotel again 
        and recommended this to others.",
        "Cleanliness of bedrooms is always very high. 
        Complimentary breakfast is a welcome feature.",
        "Nice room on the second floor at the far end of the hall. 
        Very quiet room. Comfortable bed, nice shower 
        with hot, hot water.",
        "the amazing breakfast! I cannot find a fault 
        with 5his new hotel it competes and is 
        better than most high end expensive hotels in the city!"
    ]
}
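
Extracted comments come from arbitrary page markup, so it may be worth normalizing whitespace before further processing. Whether real responses contain line breaks or runs of spaces inside a comment is an assumption here; the sketch below simply collapses them if they occur (`clean_comments` is a hypothetical helper name).

```python
def clean_comments(response):
    """Collapse whitespace inside each comment and drop empty entries."""
    cleaned = (" ".join(c.split()) for c in response.get("comments", []))
    return [c for c in cleaned if c]


# Illustrative input with embedded line breaks and a blank entry.
sample = {
    "comments": [
        "Well, the Hotel is very central, perfect \n        for shopping, sightseeing or nightlife.",
        "   ",
        "Cleanliness of bedrooms is always very high.",
    ]
}

print(clean_comments(sample))
```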

Extract comments from binary data
POST /comments

Extract comments from binary data. The POST body should contain the file content in binary form, and the Content-Type header should be set to 'application/octet-stream'.

Example URI

POST https://www.summarizebot.com/api/comments
URI Parameters
apiKey
string (required) 

API Key

filename
string (required) 

Name of the file, e.g. filename=1.html

Response  200
Headers
Content-Type: application/json
Schema
{
    "comments": [
        "Well, the Hotel is very central, perfect 
        for shopping, sightseeing or nightlife. 
        Friendly welcome on arrival, a complimentary 
        birthday drink brought to us in the comfy lounge area.",
        "Would definately stay at this hotel again 
        and recommended this to others.",
        "Cleanliness of bedrooms is always very high. 
        Complimentary breakfast is a welcome feature.",
        "Nice room on the second floor at the far end of the hall. 
        Very quiet room. Comfortable bed, nice shower 
        with hot, hot water.",
        "the amazing breakfast! I cannot find a fault 
        with 5his new hotel it competes and is 
        better than most high end expensive hotels in the city!"
    ]
}

Video Identification

The video identification method automatically extracts detailed video information from hypertext pages: direct video url, video provider, video width and height.

Caution

The video identification method processes only hypertext files (.html, .xml, etc.).

Extract information about videos from weblinks
GET /video

Extract video information from a given url.

Example URI

GET  https://www.summarizebot.com/api/video
URI Parameters
apiKey
string (required) 

API Key

url
string (required) 

Web page url

Response  200
Headers
Content-Type: application/json
Schema
{
    "video": [
        {
            "source": "https://www.youtube.com/embed/YqB50JG2aIE",
            "height": "70%",
            "width": "100%",
            "provider": "youtube"
        },
        {
            "source": "https://www.youtube.com/embed/XYPE7rZkYRg",
            "height": null,
            "width": "100%",
            "provider": "youtube"
        }
    ]
}
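
As the sample shows, "height" or "width" can be null and the values may be percentages rather than pixel counts, so client code should not assume numeric sizes. A small sketch of handling this defensively follows; `sources_by_provider` and `has_full_size` are hypothetical helper names.

```python
# Sample response copied from the schema above (null becomes None in Python).
sample = {
    "video": [
        {"source": "https://www.youtube.com/embed/YqB50JG2aIE",
         "height": "70%", "width": "100%", "provider": "youtube"},
        {"source": "https://www.youtube.com/embed/XYPE7rZkYRg",
         "height": None, "width": "100%", "provider": "youtube"},
    ]
}


def sources_by_provider(response):
    """Group video source URLs by provider name."""
    grouped = {}
    for video in response["video"]:
        grouped.setdefault(video["provider"], []).append(video["source"])
    return grouped


def has_full_size(video):
    """True only when both width and height were found on the page."""
    return video.get("width") is not None and video.get("height") is not None


print(sources_by_provider(sample))
print([has_full_size(v) for v in sample["video"]])
```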

Extract information about videos from binary data
POST /video

Extract video information from binary data. The POST body should contain the file content in binary form, and the Content-Type header should be set to 'application/octet-stream'.

Example URI

POST https://www.summarizebot.com/api/video
URI Parameters
apiKey
string (required) 

API Key

filename
string (required) 

Name of the file, e.g. filename=1.html

Response  200
Headers
Content-Type: application/json
Schema
{
    "video": [
        {
            "source": "https://www.youtube.com/embed/YqB50JG2aIE",
            "height": "70%",
            "width": "100%",
            "provider": "youtube"
        },
        {
            "source": "https://www.youtube.com/embed/XYPE7rZkYRg",
            "height": null,
            "width": "100%",
            "provider": "youtube"
        }
    ]
}