SiteTruth

SiteTruth application program interface - Version 2

alpha test

SiteTruth has an Application Program Interface, or API, intended for use by "AJAX" applications. The API allows programs to request a rating for a URL or set of URLs. We are offering this interface for open use as part of our alpha test.

Change log

  • Version 1: XML output only.
  • Version 2: "format" argument and JSON output added. Backwards compatible.

Requests

A request for site ratings is submitted with the following fields encoded in the URL:

  • url, url, url, ... - URLs to be rated. These are separate fields, all beginning with "url". Either a domain name or a full URL can be submitted. Only the domain name part of the URL is used, so, for privacy reasons, it is best to send only the domain name. When multiple url fields are provided, the entries are handled in the order given. At least 20 url fields can be used in a single query; the actual limit is higher.
  • priority - request priority. When making multiple requests to rate domains returned from a search result, use priority 1 for the first search result, priority 2 for the second, and so forth. This ensures that the top search results are rated first. This field is optional; the minimum value is 0 and the default value is 1.
  • key - user key. This identifies the application using the API. Currently, a key of guest should be used. Invalid keys will be rejected with an HTTP error 403.
  • format - output format. Options are xml or json. The default is xml.

A typical URL is of the form

http://www.sitetruth.com/fcgi/rateapiv2.fcgi?url=ftc.gov&key=guest
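A minimal Python sketch of building such a request URL follows. It also reduces each URL to its host name before sending, as recommended above for privacy. The page only says the url field names all begin with "url", so numbering them url1, url2, ... is a hypothetical scheme, as are the helper names domain_only and build_request_url.

```python
from urllib.parse import urlencode, urlsplit

# Endpoint taken from the example URL above.
API_ENDPOINT = "http://www.sitetruth.com/fcgi/rateapiv2.fcgi"


def domain_only(url):
    """Reduce a full URL to its host name, since only the domain is used."""
    # urlsplit() only fills in the host when the string contains "//",
    # so bare domains such as "ftc.gov" get a "//" prefix first.
    parts = urlsplit(url if "//" in url else "//" + url)
    return parts.hostname or url


def build_request_url(urls, priority=1, key="guest", fmt="json"):
    """Build one rating request for one or more URLs.

    ASSUMPTION: field names url1, url2, ... are hypothetical; the page
    only states that the field names begin with "url".
    """
    if priority < 0:
        raise ValueError("priority must be >= 0")
    fields = [(f"url{i}", domain_only(u)) for i, u in enumerate(urls, start=1)]
    fields += [("priority", str(priority)), ("key", key), ("format", fmt)]
    return API_ENDPOINT + "?" + urlencode(fields)
```

For search results, one would pass priority=1 for the request covering the first result, priority=2 for the second, and so on.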

 

Replies

The reply to each request is in XML or JSON. An XML reply is simply a sequence of elements of the form

<sitetruth:rating url="url" rating="letter" ratinginfo="message" status="200">error text</sitetruth:rating>

In JSON format, the result is an array of JSON objects, one per domain:

[{domain: "domain", rating: "letter", ratinginfo: "message", status: "200", err: "error text"}]
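The sketch below turns either reply format into a list of Python dicts with the same keys. It assumes the XML form is exactly the element sequence shown above, so it uses regular expressions rather than a full XML parser (the "sitetruth:" prefix's namespace declaration is not documented here). It also assumes the JSON reply is standard JSON with quoted keys, which the abbreviated example above does not show. The JSON "domain" key is renamed "url" so both formats match, and the status string is converted to an integer. Function names are hypothetical.

```python
import html
import json
import re

# A <sitetruth:rating ...>error text</sitetruth:rating> element, or a
# self-closing <sitetruth:rating .../> (accepting the latter is an assumption).
_ITEM_RE = re.compile(
    r"<sitetruth:rating\b([^>]*?)(?:/>|>(.*?)</sitetruth:rating>)", re.S)
# name="value" pairs inside the start tag.
_ATTR_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def _normalize(item):
    # status arrives as text such as "200"; make it an int for comparisons.
    item["status"] = int(item.get("status", 0))
    return item


def parse_xml_reply(text):
    """One dict per sitetruth:rating element (regex sketch, not a full XML parser)."""
    items = []
    for attrs, body in _ITEM_RE.findall(text):
        item = {k: html.unescape(v) for k, v in _ATTR_RE.findall(attrs)}
        item["err"] = html.unescape(body.strip())
        items.append(_normalize(item))
    return items


def parse_json_reply(text):
    """One dict per rating object; "domain" is renamed "url" to match the XML form."""
    items = json.loads(text)
    for item in items:
        if "domain" in item:
            item["url"] = item.pop("domain")
        _normalize(item)
    return items
```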


The HTTP reply status is normally 200. A 4xx or 5xx value means the request could not be processed, for one of the reasons listed below. The error text field contains internal diagnostic information only.

HTTP reply status

Status   Meaning            Notes
200      OK                 Good results are attached.
403      Forbidden          The key value was not accepted.
410      Gone               This API version has been discontinued. Please upgrade.
414      Request too long   The request is unreasonably large.
502      Overload           Too many requests pending from this IP address.
5xx      Server error       Other server problems.

With an HTTP reply status other than 200, a human-readable HTML document will be returned instead of XML or JSON.
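Because a non-200 reply carries an HTML page rather than ratings, a client should check the HTTP status before parsing the body. The sketch below maps each documented status to a short explanation; the exception class and the wording of the messages are hypothetical.

```python
class SiteTruthHTTPError(Exception):
    """Raised for a non-200 HTTP reply; the body is HTML, not XML or JSON."""


# Advice for each HTTP status in the table above (wording is this sketch's own).
_HTTP_ADVICE = {
    403: "key rejected; check the key field",
    410: "API version discontinued; upgrade",
    414: "request too long; send fewer url fields per request",
    502: "too many outstanding requests from this IP; wait for replies",
}


def check_http_status(code):
    """Return normally for 200; otherwise raise with a short explanation."""
    if code == 200:
        return
    if code in _HTTP_ADVICE:
        raise SiteTruthHTTPError(f"HTTP {code}: {_HTTP_ADVICE[code]}")
    if 500 <= code <= 599:
        raise SiteTruthHTTPError(f"HTTP {code}: server error; try again later")
    raise SiteTruthHTTPError(f"HTTP {code}: unexpected status")
```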

Field values

SiteTruth rating codes

Rating   Icon                    Meaning
A        Green checkmark         Site ownership verified.
Q        Yellow question mark    Site ownership identified but not verified.
X        Red "do not enter"      Site ownership unknown or questionable.
U        Grey circle             Not rated.
W        Rotating wait icon      Ask again later. (See "Retries" below.)

The rating letter is always in upper case. The icons above may be used in conjunction with SiteTruth ratings.

Additional information

ratinginfo          Meaning
""                  (No message)
"error"             An internal error occurred in the rating system.
"no_domain"         The domain name is not valid.
"no_website"        No web site was found at the domain.
"blocked"           Access to the web site was refused (by password or "robots.txt" file).
"no_location"       No street address could be associated with the web site.
"negative_info"     Negative information about this site was found.
"non_commercial"    The site appears to be non-commercial.
"unverified"        Reserved for future use.
"bad_url"           The url field has invalid syntax or contains an IP address rather than a domain name.

These enumeration values will appear exactly as shown, to allow for translation to multiple languages in the client. For simple English display, convert underscores to spaces and make the first letter upper-case.
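The conversion just described, together with the rating-code table, fits in a few lines. In this sketch the function names and the fallback text for an unknown rating letter are hypothetical. The first letter is upper-cased by hand because Python's str.capitalize() would also lower-case the rest of the string.

```python
# Meanings copied from the rating-code table above.
RATING_MEANINGS = {
    "A": "Site ownership verified.",
    "Q": "Site ownership identified but not verified.",
    "X": "Site ownership unknown or questionable.",
    "U": "Not rated.",
    "W": "Ask again later.",
}


def ratinginfo_to_english(info):
    """Convert e.g. "no_website" to "No website" for simple English display."""
    text = info.replace("_", " ")
    # Upper-case only the first letter; leave the rest unchanged.
    return text[:1].upper() + text[1:]


def describe(rating, info=""):
    """Combine the rating meaning with any additional ratinginfo message."""
    # The fallback for unknown letters is this sketch's own choice.
    meaning = RATING_MEANINGS.get(rating.upper(), "Unknown rating code.")
    extra = ratinginfo_to_english(info)
    return f"{meaning} ({extra})" if extra else meaning
```

For example, describe("X", "no_location") returns "Site ownership unknown or questionable. (No location)".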

Status codes

Status   Meaning                 Notes
200      OK                      Normal completion.
202      Accepted                Sent with "W" rating: site not yet rated, queued for rating. See below.
500      Internal server error   Internal error, try again later.

See below for how to handle a 202 status. Note that this is the status value in the XML or JSON reply, not the HTTP reply status.

Retries and flow control

If a requested domain is in the SiteTruth database, the rating will be returned immediately. If the domain has not been previously rated, it will be queued for rating, and usually rated within a minute or two. When a site is queued for rating, a rating of "W" and a status of "202" are returned. The request should be retried every 5 seconds, for up to two minutes.

A single XML reply with multiple sitetruth:rating items may contain both 200 and 202 status items. Completed items (status 200) are done, and should not be retried.
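A sketch of the retry rule follows: poll every 5 seconds for up to two minutes, and re-request only items that are not yet final. It treats only status 200 as done, so 202 (and 500, "try again later") items stay pending. It assumes each reply item's "url" echoes the submitted domain. The fetch and sleep functions are passed in so the loop can be tried without network access; fetch is a hypothetical callable returning parsed reply items with an integer "status".

```python
import time


def rate_with_retries(domains, fetch, interval=5.0, timeout=120.0, sleep=time.sleep):
    """Poll until every domain has a status-200 reply or the timeout passes.

    fetch(list_of_domains) -> list of dicts with "url", "rating" and an
    integer "status". Returns (results by domain, domains still pending).
    """
    results = {}
    pending = list(domains)
    waited = 0.0
    while pending:
        for item in fetch(pending):
            if item["status"] == 200:  # completed: never retried
                results[item["url"]] = item
        pending = [d for d in pending if d not in results]
        if not pending or waited + interval > timeout:
            break
        sleep(interval)  # no faster than once every `interval` seconds
        waited += interval
    return results, pending
```

Domains still pending after the timeout can simply be shown with the "W" or "U" icon.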

Retrying the same request more rapidly than once every 5 seconds may result in blocking of the client's IP address. Clients should cache SiteTruth replies to avoid making the same request repeatedly. A cache expiration time of at least one hour is suggested. We use an expiration time of one week in our demo applications.
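A small expiring cache along those lines is sketched below. The class name is hypothetical, the default lifetime of one week mirrors the demo applications mentioned above, and the clock is injectable for testing. It stores only final (status 200) results, so queued "W" replies are still retried.

```python
import time


class RatingCache:
    """Remember final replies so the same domain is not requested repeatedly."""

    def __init__(self, ttl=7 * 24 * 3600, clock=time.monotonic):
        self.ttl = ttl  # seconds; at least one hour is suggested
        self.clock = clock
        self._entries = {}  # domain -> (expiry time, reply item)

    def get(self, domain):
        entry = self._entries.get(domain)
        if entry is None:
            return None
        expires, result = entry
        if self.clock() >= expires:
            del self._entries[domain]  # expired: force a fresh request
            return None
        return result

    def put(self, domain, result):
        # Only completed ratings are cached; 202 items must be asked again.
        if result.get("status") == 200:
            self._entries[domain] = (self.clock() + self.ttl, result)
```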

SiteTruth applies "fair queuing" to requests. Multiple simultaneous requests from a single IP address are permitted but will not yield faster responses. If more than 100 requests are outstanding from a single IP address, further requests will be rejected with an HTTP reply code of 502 (Overload). This error indicates that the querying program is defective and is making requests without waiting for previous requests to complete. (This means you, "fwvplab.elet.polimi.it".)
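One way to stay well under that limit is to bound the number of requests in flight on the client side. The sketch below uses a thread pool whose worker count caps concurrency; further batches wait until an earlier request completes. The function name and the default of 4 workers are hypothetical choices. Since fair queuing means extra parallelism does not speed things up, a small number is reasonable.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(batches, fetch, max_outstanding=4):
    """Run fetch(batch) for each batch with at most max_outstanding in flight.

    Results are returned in the same order as batches.
    """
    with ThreadPoolExecutor(max_workers=max_outstanding) as pool:
        return list(pool.map(fetch, batches))
```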

Obtaining rating details

The application program interface above provides basic information about the site. More detailed information, in the form of a pop-up web page, is available by using URLs of the following form:

http://www.sitetruth.com/fcgi/ratingsummary.fcgi?url=www.ftc.gov

 

This is best displayed as a pop-up page opened in a new window. We suggest opening a browser window with the properties:

'height=600,width=700,toolbar=no,menubar=no,scrollbars=yes,resizable=no,location=no,status=no'

 

This provides a summary page with basic information about the business behind the web site. When available, this includes the SiteTruth rating, the business name and address, annual revenue, number of employees, and an aerial photo of the company's location. Buttons that display detailed information from other data sources, such as the U.S. Securities and Exchange Commission, may also appear. The page also contains a link to a larger SiteTruth page with full details, more information than most users will want.

If a SiteTruth rating icon is displayed by a program using the API, it should be made a clickable link of the form above. This allows users to easily obtain the information behind the rating.
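As an illustration, a server-side page generator could wrap each rating icon as sketched below. The link keeps a normal href, so it still works without scripts, and an onclick handler opens the summary page with the window properties suggested above. The function name and the window name "sitetruth" are hypothetical.

```python
import html
from urllib.parse import urlencode

DETAILS_ENDPOINT = "http://www.sitetruth.com/fcgi/ratingsummary.fcgi"
POPUP_FEATURES = ("height=600,width=700,toolbar=no,menubar=no,scrollbars=yes,"
                  "resizable=no,location=no,status=no")


def details_link(domain, icon_html):
    """Wrap a rating icon in a link that opens the summary page in a pop-up."""
    href = DETAILS_ENDPOINT + "?" + urlencode({"url": domain})
    # onclick opens the pop-up; "return false" stops the normal navigation.
    onclick = f"window.open(this.href, 'sitetruth', '{POPUP_FEATURES}'); return false;"
    # html.escape makes both values safe inside double-quoted attributes.
    return (f'<a href="{html.escape(href)}" target="_blank" '
            f'onclick="{html.escape(onclick)}">{icon_html}</a>')
```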

This feature is currently available only in English.

The older URL form is still supported:

http://www.sitetruth.com/cgi-bin/ratingdetails.cgi?url=www.ftc.gov

 

Terms of use

This service is provided at no charge on a "best-efforts" basis. Ratings reflect the automated opinion of SiteTruth®. This is an alpha test. We reserve the right to modify or discontinue this service. We retain copyright in SiteTruth ratings. This service may not be sold, resold, or used in a commercial product without our express written permission. Use of this service in free software (as defined by the Free Software Foundation) is encouraged.

SiteTruth. Search, with less evil.

Another service from the publishers of Downside