API Documentation
The scrapestack API was built to offer a simple REST API interface for scraping web pages at scale without having to programmatically deal with geolocations, IP blocks or CAPTCHAs. The API supports a series of features essential to web scraping, such as JavaScript rendering, custom HTTP headers, various geo-targets, POST/PUT requests and an option to use premium residential proxies instead of datacenter proxies.
In this documentation you will find detailed usage guides and code examples in different programming languages that will help you get up and running in the shortest time possible. If the articles below leave any questions unanswered, please feel free to contact our technical support team.
Getting Started
API Access Key & Authentication
After creating a scrapestack account, the account dashboard will reveal the unique API access key you can use to authenticate with the API. To do so, simply attach the access_key parameter to the API's base URL and set it to your API access key.
https://api.scrapestack.com/scrape?access_key=YOUR_ACCESS_KEY
Secure your key: To prevent unauthorized access to your scrapestack account, please make sure to store your API access key in a secure location and never include it in any public scripts or files.
256-bit HTTPS Encryption (Available on: Basic Plan and higher)
Customers subscribed to the Basic Plan or higher may connect to the scrapestack API using industry-standard 256-bit HTTPS (SSL) encryption by appending an s to the HTTP protocol, as in the example below.
Example API Request:
https://api.scrapestack.com/scrape
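A minimal Python sketch of the two points above: it reads the access key from an environment variable so the key never appears in a script, and uses https:// only when asked to, since HTTPS requires the Basic Plan or higher. The variable name SCRAPESTACK_ACCESS_KEY and the helper functions are hypothetical, chosen for illustration only.

```python
import os
from urllib.parse import urlencode

# Hypothetical environment variable name; use whatever your deployment provides.
ACCESS_KEY_ENV_VAR = "SCRAPESTACK_ACCESS_KEY"


def get_access_key() -> str:
    """Read the API access key from the environment instead of hard-coding it."""
    key = os.environ.get(ACCESS_KEY_ENV_VAR)
    if not key:
        raise RuntimeError(f"Set the {ACCESS_KEY_ENV_VAR} environment variable first")
    return key


def scrape_endpoint(use_https: bool = False) -> str:
    """Return the base endpoint; HTTPS requires the Basic Plan or higher."""
    scheme = "https" if use_https else "http"
    return f"{scheme}://api.scrapestack.com/scrape"


def authenticated_url(access_key: str, use_https: bool = False) -> str:
    """Attach the access_key parameter to the base endpoint."""
    query = urlencode({"access_key": access_key})
    return f"{scrape_endpoint(use_https)}?{query}"
```

For example, authenticated_url(get_access_key(), use_https=True) yields the HTTPS request URL shown above with your own key filled in.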
API Error Codes
If your request fails, the scrapestack API will return an error in JSON format. Below is an example of the error returned when the API fails to scrape the requested URL.
Example Error:
{ "success": false, "error": { "code": 213, "type": "scrape_request_failed" } }
Common API Errors:
Code | Type | Info |
---|---|---|
404 | 404_not_found | User requested a resource which does not exist. |
101 | missing_access_key | User did not supply an access key. |
101 | invalid_access_key | User supplied an invalid access key. |
102 | inactive_user | User account is inactive or blocked. |
103 | invalid_api_function | User requested a non-existent API function. |
104 | usage_limit_reached | User has reached their subscription's monthly request allowance. |
105 | function_access_restricted | The user's current subscription does not support this API function. |
105 | https_access_restricted | The user's current subscription plan does not support HTTPS. |
210 | missing_url | User has not specified a URL to scrape. |
211 | invalid_url | User has specified an invalid value in the URL parameter. |
212 | invalid_proxy_location | User has specified an invalid or unsupported proxy location. |
213 | scrape_request_failed | The current scraping request failed due to a technical issue. If this error occurs, please report it to technical support. |
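Because a failed request returns a JSON error object while a successful one returns the scraped page itself, client code needs some way to tell the two apart. The Python sketch below treats a body that parses as JSON with "success": false as an API error and everything else as scraped content. This is a heuristic assumed for illustration (a scraped JSON endpoint that itself returns "success": false would be misreported), and check_response and ScrapestackError are hypothetical names.

```python
import json
from typing import Optional


class ScrapestackError(Exception):
    """Raised when the API answers with a JSON error object instead of page content."""

    def __init__(self, code: Optional[int], error_type: Optional[str]):
        super().__init__(f"scrapestack error {code}: {error_type}")
        self.code = code
        self.error_type = error_type


def check_response(body: str) -> str:
    """Return the scraped content, or raise ScrapestackError if body is an API error."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Not JSON at all (e.g. raw HTML): treat it as a successful scrape.
        return body
    if isinstance(payload, dict) and payload.get("success") is False:
        error = payload.get("error")
        if not isinstance(error, dict):
            error = {}
        raise ScrapestackError(error.get("code"), error.get("type"))
    # Valid JSON that is not an error object, e.g. a scraped JSON endpoint.
    return body
```

Catching ScrapestackError and inspecting its code and error_type lets you react to the cases in the table, for example stopping a batch job on usage_limit_reached.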
API Features
Basic Request (Available on: All plans)
To scrape a web page using the scrapestack API, simply use the API's base endpoint and append the URL you would like to scrape as well as your API access key as GET parameters. There is also a series of optional parameters you can choose from. Below you will find an example request used to scrape the URL https://apple.com.
Example API Request:
https://api.scrapestack.com/scrape?access_key=YOUR_ACCESS_KEY&url=https://apple.com
Request Parameters:
Parameter | Description |
---|---|
access_key | [Required] Specify your unique API access key to authenticate with the API. Your API access key can be found in your account dashboard. |
url | [Required] Specify the URL of the web page you would like to scrape. |
render_js | [Optional] Set to 0 (off, default) or 1 (on) to control whether JavaScript is rendered on the target web page. JavaScript rendering is done using a headless Google Chrome browser. |
keep_headers | [Optional] Set to 0 (off, default) or 1 (on) to control whether the HTTP headers of your API request are sent to the target URL and returned along with your API response. |
proxy_location | [Optional] Specify the 2-letter code of the country you would like to use as the proxy geolocation for your scraping request. Supported countries differ by proxy type; please refer to the Proxy Locations section for details. |
premium_proxy | [Optional] Set to 0 (off, default) or 1 (on) to control whether premium residential proxies are used for your scraping request. Please note that a single premium proxy API request is counted as 25 API requests. |
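The parameters above can be assembled into a request URL as in the Python sketch below. build_scrape_url is a hypothetical helper: it sends each optional switch only when it is turned on (relying on the documented default of 0 otherwise) and percent-encodes the target URL, which is a general query-string precaution rather than something this documentation requires. Lowercasing proxy_location follows the lowercase au used in the examples here; whether the API also accepts uppercase codes is not stated.

```python
import re
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://api.scrapestack.com/scrape"  # use https:// on the Basic Plan or higher


def build_scrape_url(
    access_key: str,
    url: str,
    *,
    render_js: bool = False,
    keep_headers: bool = False,
    proxy_location: Optional[str] = None,
    premium_proxy: bool = False,
) -> str:
    """Build a scrape request URL from the documented parameters."""
    params = {"access_key": access_key, "url": url}
    # Switches are sent as 1 (on) only when enabled; omitted ones keep the default 0 (off).
    for name, enabled in (
        ("render_js", render_js),
        ("keep_headers", keep_headers),
        ("premium_proxy", premium_proxy),
    ):
        if enabled:
            params[name] = 1
    if proxy_location is not None:
        if not re.fullmatch(r"[A-Za-z]{2}", proxy_location):
            raise ValueError("proxy_location must be a 2-letter country code")
        params["proxy_location"] = proxy_location.lower()
    # urlencode percent-encodes the target URL so its own "?" and "&" cannot
    # be mistaken for parameters of the API request itself.
    return f"{BASE_URL}?{urlencode(params)}"
```

For example, build_scrape_url("YOUR_ACCESS_KEY", "https://apple.com", render_js=True) produces a request equivalent to the JavaScript rendering example in this documentation, over plain HTTP.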
Example API Response:
If your scraping request is successful, the API will respond with the raw HTML of your target web page. If you have enabled keep_headers, your API response will also contain the HTTP headers sent along with your original API request.
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
<head>
[...] // 44 lines skipped
</head>
<body>
[...] // 394 lines skipped
</body>
</html>
API response shortened: Please note that the API response above has been shortened for readability purposes. To see the entire API response, run the example request above with your own API access key, or sign up for an access key.
JavaScript Rendering (Available on: Basic Plan and higher)
Some web pages render essential page elements using JavaScript, which means that some content is not present (and therefore not scrapable) on the initial page load. With the render_js parameter enabled, the scrapestack API accesses the target web page using a headless browser (Google Chrome) and allows JavaScript page elements to render before delivering the final scraping result.
To enable JavaScript rendering, simply append the render_js HTTP GET parameter to your API request URL and set it to 1. By default, this parameter is set to 0 (off).
Example API Request:
https://api.scrapestack.com/scrape?access_key=YOUR_ACCESS_KEY&url=https://apple.com&render_js=1
Example API Response:
To see an API response, run the request above with your own API access key, or sign up for an API access key.
HTTP Headers (Available on: All plans)
The scrapestack API will accept HTTP headers and pass them through to the target web page and the final API response if the keep_headers HTTP GET parameter is set to 1. By default, this parameter is set to 0 (off).
Below you will find an example API request (Bash using the "curl" command) that contains an HTTP X-Header. If this request is executed, the specified header will be sent to the target web page and returned in the final API response.
Example Bash (curl) Request with HTTP header:
curl --header "X-AnyHeader: Test" \
"https://api.scrapestack.com/scrape?access_key=YOUR_ACCESS_KEY&url=https://apple.com&keep_headers=1"
Unsupported HTTP Headers: Although most HTTP headers are supported by the API, there are some that cannot be processed. Please find a list of unsupported HTTP headers below:
content-encoding
content-length
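The same header pass-through can be prepared with Python's standard urllib.request module. In this sketch, build_header_request is a hypothetical helper that sets keep_headers=1 and refuses the two headers listed as unsupported above; rejecting them, rather than silently dropping them, is a design choice assumed here, not documented API behavior.

```python
import urllib.request
from urllib.parse import urlencode

# Headers this documentation lists as unsupported by the API.
UNSUPPORTED_HEADERS = {"content-encoding", "content-length"}


def build_header_request(access_key: str, url: str, headers: dict) -> urllib.request.Request:
    """Prepare (but do not send) a GET request whose headers scrapestack should forward."""
    rejected = [name for name in headers if name.lower() in UNSUPPORTED_HEADERS]
    if rejected:
        raise ValueError(f"Unsupported header(s): {', '.join(rejected)}")
    # keep_headers=1 asks the API to pass the headers through to the target page.
    query = urlencode({"access_key": access_key, "url": url, "keep_headers": 1})
    return urllib.request.Request(
        f"http://api.scrapestack.com/scrape?{query}",
        headers=headers,
        method="GET",
    )
```

To actually send the request, pass it to urllib.request.urlopen() and read the response body.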
Proxy Locations (Available on: Basic Plan and higher)
The scrapestack API uses a pool of more than 35 million IP addresses worldwide. By default, the API automatically rotates IP addresses so that the same IP address is never used twice in a row.
Across both standard and premium proxies, the scrapestack API supports more than 100 global geolocations from which your scraping request can be sent. Using the API's proxy_location HTTP GET parameter, you can choose a specific country by its 2-letter country code. The example API request below specifies au (for Australia) as the proxy location.
Example API Request:
https://api.scrapestack.com/scrape?access_key=YOUR_ACCESS_KEY&url=https://apple.com&proxy_location=au
Example API Response:
To see an API response, run the request above with your own API access key, or sign up for an API access key.
Standard Proxies - Supported Countries:
For standard (datacenter) proxies, the API currently supports a total of 77 global geolocations. You can download a full list of supported countries and 2-letter country codes using the following link: locations-standard-proxy.csv
Premium Proxies - Supported Countries:
For premium (residential) proxies, the API currently supports a total of 38 global geolocations. You can download a full list of supported countries and 2-letter country codes using the following link: locations-premium-proxy.csv
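If you download one of the CSV files above, a small Python sketch like the following can check a proxy_location value locally before sending a request, and estimate how much of your monthly allowance a batch will use. The column layout of the CSV files is not described in this documentation, so the name of the country-code column is passed in as a parameter; the "code" column in the test data is hypothetical. The 25-request weight comes from the premium_proxy entry in the parameter table.

```python
import csv
import io
from typing import Set

# Per the parameter table, one premium proxy request counts as 25 API requests.
PREMIUM_REQUEST_WEIGHT = 25


def load_country_codes(csv_text: str, code_column: str) -> Set[str]:
    """Collect lowercase 2-letter codes from a locations CSV (column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[code_column].strip().lower() for row in reader if row.get(code_column)}


def check_location(code: str, supported: Set[str]) -> str:
    """Return the normalized code, or raise if it is not in the supported list."""
    normalized = code.strip().lower()
    if normalized not in supported:
        raise ValueError(f"{code!r} is not a supported proxy location")
    return normalized


def request_cost(count: int, premium_proxy: bool) -> int:
    """Number of API requests counted against your allowance for `count` scrapes."""
    return count * (PREMIUM_REQUEST_WEIGHT if premium_proxy else 1)
```

Keeping separate code sets for standard and premium proxies matters because, as noted above, the two proxy types support different countries.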
HTTP POST/PUT Requests (Available on: All plans)
The scrapestack API also offers a way of scraping forms or API endpoints directly by supporting API requests via HTTP POST and PUT. The examples below use POST; to send a PUT request instead, replace -X POST with -X PUT.
Example Request - HTTP POST:
curl -d 'foo=bar' \
  -X POST \
  "https://api.scrapestack.com/scrape?access_key=YOUR_ACCESS_KEY&url=https://apple.com"
Example Request - HTTP POST with Form Data:
curl -H 'Content-Type: application/x-www-form-urlencoded' \
  -d 'foo=bar' \
  -X POST \
  "https://api.scrapestack.com/scrape?access_key=YOUR_ACCESS_KEY&url=https://apple.com"
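For comparison, here is a Python sketch of the form-encoded request above using only the standard library. build_form_request is a hypothetical helper; it builds the request object without sending it, and switching its method argument to "PUT" mirrors replacing -X POST with -X PUT in curl.

```python
import urllib.request
from urllib.parse import urlencode


def build_form_request(access_key: str, url: str, form: dict,
                       method: str = "POST") -> urllib.request.Request:
    """Prepare (but do not send) a form-encoded POST or PUT scrape request."""
    if method not in ("POST", "PUT"):
        raise ValueError("method must be 'POST' or 'PUT'")
    query = urlencode({"access_key": access_key, "url": url})
    return urllib.request.Request(
        f"http://api.scrapestack.com/scrape?{query}",
        data=urlencode(form).encode("utf-8"),  # e.g. {"foo": "bar"} -> b"foo=bar"
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method=method,
    )
```

Passing the returned object to urllib.request.urlopen() sends it; the body the target receives is the same foo=bar payload as in the curl examples.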
Code Examples
Below you will find sample scraping requests in the following programming languages: PHP, Python, Node.js, jQuery, Go and Ruby.
Code Example - PHP
<?php
// Build the query string; http_build_query URL-encodes each value.
$queryString = http_build_query([
    'access_key' => 'YOUR_ACCESS_KEY',
    'url' => 'http://scrapestack.com',
]);
$ch = curl_init(sprintf('%s?%s', 'http://api.scrapestack.com/scrape', $queryString));
// Return the response as a string instead of printing it directly.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$website_content = curl_exec($ch);
curl_close($ch);
echo $website_content;
Code Example - Python
import requests
params = {
    'access_key': 'YOUR_ACCESS_KEY',
    'url': 'http://scrapestack.com'
}
api_result = requests.get('http://api.scrapestack.com/scrape', params=params)
# .text decodes the body to a string; .content would return raw bytes.
website_content = api_result.text
print(website_content)
Code Example - Node.js
const axios = require('axios');
const params = {
  access_key: 'YOUR_ACCESS_KEY',
  url: 'http://scrapestack.com'
};
axios.get('http://api.scrapestack.com/scrape', { params })
  .then(response => {
    const websiteContent = response.data;
    console.log(websiteContent);
  })
  .catch(error => {
    console.log(error);
  });
Code Example - jQuery
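// Note: code running in a browser exposes your access key to anyone who loads the page.
// This request uses HTTPS, which requires the Basic Plan or higher.
$.get('https://api.scrapestack.com/scrape', {
  access_key: 'YOUR_ACCESS_KEY',
  url: 'http://scrapestack.com'
}, function (websiteContent) {
  console.log(websiteContent);
});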
Code Example - Go
package main
import (
	"fmt"
	"io"
	"net/http"
)
func main() {
	httpClient := http.Client{}
	req, err := http.NewRequest("GET", "http://api.scrapestack.com/scrape", nil)
	if err != nil {
		panic(err)
	}
	// Add the access key and target URL as encoded query parameters.
	q := req.URL.Query()
	q.Add("access_key", "YOUR_ACCESS_KEY")
	q.Add("url", "http://scrapestack.com")
	req.URL.RawQuery = q.Encode()
	res, err := httpClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()
	if res.StatusCode == http.StatusOK {
		// io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+).
		bodyBytes, err := io.ReadAll(res.Body)
		if err != nil {
			panic(err)
		}
		websiteContent := string(bodyBytes)
		fmt.Println(websiteContent)
	} else {
		fmt.Println("unexpected status:", res.Status)
	}
}
Code Example - Ruby
require 'net/http'
params = {
  :access_key => "YOUR_ACCESS_KEY",
  :url => "http://scrapestack.com"
}
uri = URI('http://api.scrapestack.com/scrape')
# encode_www_form URL-encodes the parameters into a query string.
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
puts website_content