Gini Accounting API Documentation v2.0
Introduction
Gini provides an information extraction system for analyzing documents such as invoices or contracts. The system can extract the document sender, the amount to pay and many other types of specific information from various invoice formats.
The API supports PDF, GIF, JPEG, PNG, and TIFF documents.
The information extraction starts when the document is sent to the extraction system. There it first gets verified and then classified as being native or scanned.
There is a difference between native and scanned PDF files. Native PDFs are created using Microsoft Word, Excel, Illustrator or other software that generates PDF files from source code. Scanned PDFs are created by scanning devices from the actual paper documents.
Native PDF documents already contain their text and layout information in the document source code and are processed accordingly. Scanned documents, however, have no source code and therefore do not directly provide information that the system can read. For these documents the extraction system has to apply Optical Character Recognition (OCR) and various computer vision techniques to obtain the document contents.
Once the layout and the textual contents become available for the uploaded document, the system starts extracting document semantic information such as the document sender (name, address) and meta information such as the document type (invoice, contract).
The system may occasionally be unable to extract the information correctly. This is most likely to happen because of OCR errors caused by a poor-quality scan, incomplete textual data or an unusual document design. In such cases it is still possible to correct the extractions, for example by manually selecting the correct amount to pay on the document and submitting it back to the API. The extraction system receives this feedback, which helps us improve its self-learning algorithms over time.
If you have any questions about the Gini Accounting API and the functionality it provides, please contact us via api@gini.net.
Getting started
Welcome! In order to process your first document with the Gini Accounting API, you will have to perform the following easy steps:
- register your application
- obtain an access token
- upload a document
- check the document status information
- retrieve the extractions
- send feedback
For general information about the Gini Accounting API, see overview.
Register Your Application
Before you can use the Gini Accounting API in your application, you need a valid client ID and a client secret. If you don't have the client ID and the client secret already, please contact your sales representative.
Obtain an Access Token
obtain an access token
curl -v -X POST --data-urlencode 'username=random@example.org' \
  --data-urlencode 'password=geheim' \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -H 'Accept: application/json' \
  -u 'client-id:client-secret' \
  'https://user.gini.net/oauth/token?grant_type=password'
the JSON response will look similar to
{
"access_token":"6c470ffa-abf1-41aa-b866-cd3be0ee84f4",
"token_type":"bearer",
"expires_in":3599
}
6c470ffa-abf1-41aa-b866-cd3be0ee84f4 is the access token which can be used for API requests.
All requests to the Gini Accounting API are made on behalf of the user authorized by the access token. For now, let's assume that you've already created an anonymous user. If not, please read Direct Communication between Client Devices and the Gini Accounting API for details on how to do so.
In order to get an access token for the Gini account, run the example command above (don't forget to replace random@example.org with your username, geheim with your password, client-id with your client ID and client-secret with your client secret).
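For applications that do not shell out to curl, the same token request can be sketched in Python with the standard library only. The URL, headers and form fields are taken from the curl example; the function names `build_token_request` and `access_token_from_response` are hypothetical helpers introduced for illustration, not part of any Gini SDK.

```python
import base64
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://user.gini.net/oauth/token?grant_type=password"


def build_token_request(client_id, client_secret, username, password):
    """Build (but do not send) the password-grant token request."""
    # HTTP Basic Authentication with the client ID and client secret,
    # equivalent to curl's -u 'client-id:client-secret'.
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    # Form-encoded body, equivalent to the two --data-urlencode arguments.
    body = urllib.parse.urlencode(
        {"username": username, "password": password}
    ).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
    )


def access_token_from_response(body):
    """Extract the access token from the JSON response body."""
    return json.loads(body)["access_token"]
```

To actually send the request, pass the result to `urllib.request.urlopen` and feed the response body to `access_token_from_response`.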
Upload a Document
upload a document
curl -v -X POST --data-binary '@/path/to/your/document.pdf' \
  -H 'Accept: application/vnd.gini.v2+json' \
  -H 'Content-Type: application/octet-stream' \
  -H 'Authorization: BEARER 6c470ffa-abf1-41aa-b866-cd3be0ee84f4' \
  'https://accounting-api.gini.net/documents'
Now that you have the access token, you can upload your first document by sending an API request.
The request must contain the corresponding Gini API version number and a valid Content-Type header.
For our first document, we will use Gini Accounting API version v2.
The command above sends a request against that version of the API.
the response (in case the document was Accepted)
HTTP/1.1 201 Created
X-Request-Id: 7b5a7f79-ae7c-4040-b6cf-25cde58ad937
Location: https://accounting-api.gini.net/documents/b4bd3e80-7bd1-11e4-95ab-000000000000
Content-Type: application/vnd.gini.v2+json
If the file was accepted by the Gini Accounting API (i.e. its file format is supported), the extraction system automatically starts to process the document and responds with the HTTP status code 201 as well as the document location URL in the Location header.
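As a rough Python equivalent of the upload command, the sketch below builds the POST request and derives the document ID from the returned Location header. Taking the last path segment of the Location URL as the ID is an assumption based on the example response; `build_upload_request` and `document_id_from_location` are hypothetical names.

```python
import urllib.request

DOCUMENTS_URL = "https://accounting-api.gini.net/documents"


def build_upload_request(document_bytes, access_token):
    """Build (but do not send) a binary upload request for API version v2."""
    return urllib.request.Request(
        DOCUMENTS_URL,
        data=document_bytes,
        method="POST",
        headers={
            # The Accept header selects the API version.
            "Accept": "application/vnd.gini.v2+json",
            "Content-Type": "application/octet-stream",
            "Authorization": f"BEARER {access_token}",
        },
    )


def document_id_from_location(location):
    """Return the last path segment of the Location URL (assumed to be the ID)."""
    return location.rstrip("/").rsplit("/", 1)[-1]
```

After sending the request with `urllib.request.urlopen`, the Location header is available as `response.headers["Location"]`; keep the full URL, since it is the address used for status checks.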
Check the Document Status Information
check the document status information
curl -v -H 'Accept: application/vnd.gini.v2+json' \
  -H 'Authorization: BEARER 6c470ffa-abf1-41aa-b866-cd3be0ee84f4' \
  'https://accounting-api.gini.net/documents/b4bd3e80-7bd1-11e4-95ab-000000000000'
the response body will look similar to
{
"_links": {
"processed": "https:\/\/accounting-api.gini.net\/documents\/b4bd3e80-7bd1-11e4-95ab-000000000000\/processed",
"layout": "https:\/\/accounting-api.gini.net\/documents\/b4bd3e80-7bd1-11e4-95ab-000000000000\/layout",
"extractions": "https:\/\/accounting-api.gini.net\/documents\/b4bd3e80-7bd1-11e4-95ab-000000000000\/extractions",
"document": "https:\/\/accounting-api.gini.net\/documents\/b4bd3e80-7bd1-11e4-95ab-000000000000"
},
"sourceClassification": "NATIVE",
"origin": "UPLOAD",
"progress": "PENDING",
"creationDate": 1417710133864,
"pages": [
{
"images": {
"1280x1810": "https:\/\/accounting-api.gini.net\/documents\/b4bd3e80-7bd1-11e4-95ab-000000000000\/pages\/1\/1280x1810",
"750x900": "https:\/\/accounting-api.gini.net\/documents\/b4bd3e80-7bd1-11e4-95ab-000000000000\/pages\/1\/750x900"
},
"pageNumber": 1
}
],
"pageCount": 1,
"name": "Document",
"id": "b4bd3e80-7bd1-11e4-95ab-000000000000"
}
Document processing takes some time, so in order to get the extractions you need to check the status of the document periodically. The status is reported in the progress field of the response: PENDING means that the document is still being analyzed, and COMPLETED means that the analysis is complete and the extractions are ready for retrieval. Check the current status by sending a GET request to the URL that you received in the Location header when the document was uploaded. Once the status changes to COMPLETED, you can retrieve the extractions.
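The periodic status check can be sketched as a small polling loop. The polling interval and the maximum number of attempts are assumptions, since the documentation does not prescribe them; `wait_for_completion` is a hypothetical helper, and `fetch_document` stands for whatever function your application uses to GET the document URL and decode the JSON body.

```python
import time
from typing import Callable


def wait_for_completion(
    fetch_document: Callable[[], dict],
    interval_seconds: float = 1.0,  # assumed polling interval
    max_attempts: int = 30,         # assumed upper bound on polls
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the document's progress is no longer PENDING."""
    for attempt in range(max_attempts):
        document = fetch_document()
        # Any state other than PENDING is treated as final, so the
        # caller still has to check that the result is COMPLETED.
        if document["progress"] != "PENDING":
            return document
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError("document is still PENDING after polling")
```

Passing `sleep` as a parameter keeps the loop easy to exercise in unit tests without real waiting.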
Retrieve the Extractions
retrieve the extractions
curl -v -H 'Accept: application/vnd.gini.v2+json' \
  -H 'Authorization: BEARER 6c470ffa-abf1-41aa-b866-cd3be0ee84f4' \
  'https://accounting-api.gini.net/documents/b4bd3e80-7bd1-11e4-95ab-000000000000/extractions'
The document extractions represent various document contents that the extraction system was able to understand and retrieve.
In order to get all the extractions, send the request shown above (note the API version v2).
example response
{
"extractions": {
"docType": {
"value": "Invoice",
"entity": "doctype",
"confidence": 0.923
},
"amountToPay": {
"candidates": "amounts",
"box": {
"page": 1,
"height": 9.0,
"width": 30.870000000000005,
"left": 524.13,
"top": 357.89
},
"value": "12.00:EUR",
"entity": "amount"
},
"customerId": {
"candidates": "customerIds",
"box": {
"page": 1,
"height": 7.0,
"width": 31.139999999999986,
"left": 470.0,
"top": 152.89
},
"confidence": 0.821,
"value": "20980000",
"entity": "customerid"
},
"invoiceId": {
"candidates": "invoiceIds",
"box": {
"page": 1,
"height": 7.0,
"width": 38.920000000000016,
"left": 470.0,
"top": 143.89
},
"confidence": 0.971,
"value": "3113805926",
"entity": "invoiceid"
},
"senderName": {
"candidates": "senderNames",
"box": {
"page": 1,
"height": 7.0,
"width": 52.56000000000001,
"left": 41.87,
"top": 88.84
},
"value": "Deutsche Post AG",
"entity": "companyname"
}
},
"compoundExtractions": {
"lineItems": [
{
"sumNet": {
"entity": "amount",
"value": "12.00:EUR",
"box": {
"top": 355.83,
"left": 525.17,
"width": 38.92000000000007,
"height": 10.0,
"page": 1
}
},
"taxRate": {
"entity": "text",
"value": "19 %",
"box": {
"top": 355.83,
"left": 388.18,
"width": 20.00999999999999,
"height": 10.0,
"page": 1
}
}
},
{
"artNumber": {
"entity": "text",
"value": "10101",
"box": {
"top": 388.43,
"left": 82.05,
"width": 20.0,
"height": 10.0,
"page": 1
}
}
}
]
},
"candidates": {
"amounts": [
{
"box": {
"page": 1,
"height": 9.0,
"width": 30.870000000000005,
"left": 524.13,
"top": 357.89
},
"value": "12.00:EUR",
"entity": "amount"
},
{
"box": {
"page": 1,
"height": 12.0,
"width": 40.89999999999998,
"left": 138.02,
"top": 413.09
},
"value": "12.00:EUR",
"entity": "amount"
}
],
"senderNames": [
{
"box": {
"page": 1,
"height": 7.0,
"width": 52.56000000000001,
"left": 41.87,
"top": 88.84
},
"value": "Deutsche Post AG",
"entity": "companyname"
}
],
"customerIds": [
{
"box": {
"page": 1,
"height": 7.0,
"width": 31.139999999999986,
"left": 470.0,
"top": 152.89
},
"value": "20980000",
"entity": "customerid"
}
],
"invoiceIds": [
{
"box": {
"page": 1,
"height": 7.0,
"width": 38.920000000000016,
"left": 470.0,
"top": 143.89
},
"value": "3113805926",
"entity": "invoiceid"
}
]
}
}
The returned object contains specific extractions (a value with some specific semantic property), compound extractions (a group of values with some specific semantic property) as well as candidates (a list of values for some semantic property).
The example response (shortened) is an invoice (see docType) issued by Deutsche Post AG with invoice number 3113805926 (see invoiceId).
The receiver of the invoice has to pay 12€ (see amountToPay).
It contains line item data (see lineItems under compoundExtractions) with an article number 10101, a tax rate of 19% and an amount of 12€.
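To illustrate how a client might read such a response, the following sketch splits an amount value into number and currency and looks up the candidate list that a specific extraction refers to. The `number:currency` format is inferred from the example value `12.00:EUR` and is an assumption, as are the helper names `parse_amount` and `candidates_for`.

```python
from decimal import Decimal
from typing import List, Tuple


def parse_amount(value: str) -> Tuple[Decimal, str]:
    """Split a value such as '12.00:EUR' into a Decimal and a currency code."""
    number, currency = value.split(":", 1)
    return Decimal(number), currency


def candidates_for(response: dict, name: str) -> List[dict]:
    """Return the candidate list referenced by the named specific extraction."""
    extraction = response.get("extractions", {}).get(name, {})
    candidate_key = extraction.get("candidates")
    if candidate_key is None:
        return []
    return response.get("candidates", {}).get(candidate_key, [])


# Trimmed-down version of the example response.
example = {
    "extractions": {
        "docType": {"value": "Invoice", "entity": "doctype"},
        "amountToPay": {
            "candidates": "amounts",
            "value": "12.00:EUR",
            "entity": "amount",
        },
    },
    "candidates": {
        "amounts": [
            {"value": "12.00:EUR", "entity": "amount"},
            {"value": "12.00:EUR", "entity": "amount"},
        ]
    },
}

amount, currency = parse_amount(example["extractions"]["amountToPay"]["value"])
alternatives = [c["value"] for c in candidates_for(example, "amountToPay")]
```

Using `Decimal` rather than `float` avoids rounding surprises when amounts are compared or summed.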
Send Feedback and Get Even Better Extractions Next Time
Feedback is an API request containing the correct extractions that you can send us in order to improve the future extraction accuracy of the system. Your application should always send at least some feedback. The more complete and accurate the feedback is, the sooner the extraction system learns what is correct and what is not. Feedback is critical to us and important to you because there is no other way for us to know in real time whether the extraction system is delivering the best possible quality for your application.
To tell the system whether an extraction was correct or incorrect, send back the correct value in the feedback request. The value must be exactly as it appears on the actual document (not calculated or inferred). Once the feedback is received, it is compared to the extracted value; the result is used in reports and fed into the self-learning mechanism of the Gini extraction system.
Overview of the Gini Accounting API
This section provides general information about the Gini Accounting API. If you want a step-by-step guide on how to upload your first document and retrieve its semantic content, have a look at the getting started guide.
IPv6 Compatibility
IPv6 compatibility example
$ host accounting-api.gini.net
accounting-api.gini.net has address 46.245.182.123
accounting-api.gini.net has IPv6 address 2a00:14e0:600:1500:d0c5::7
$ host user.gini.net
user.gini.net has address 46.245.182.124
user.gini.net has IPv6 address 2a00:14e0:600:1500:d0c5::2
The Gini Accounting API and the User Center are accessible from both legacy IPv4 and IPv6 networks. If both protocols are enabled, the protocol precedence depends on your operating system and configuration.
Media Types
the media types consumed and produced by the Gini Accounting API look like this
application/vnd.gini.<version>+json
Custom media types are used in the API to let consumers choose the version of the data format they wish to receive.
This is done by adding one or more of these media types to the Accept header when a request is made.
Media types are specific to resources, allowing them to change independently and to support formats that other resources don't.
API Versions
Currently there is one stable version of the Gini Accounting API.
Future versions can be requested using a specific Accept header.
This is primarily for testing new extractions but may affect other parts of the API as well.
Version 2 (v2)
v2 media type
Accept: application/vnd.gini.v2+json
Gini Accounting API v2 is stable and will remain backwards compatible. Please contact us via api@gini.net if you have any problems.
Developers are strongly encouraged to explicitly specify the required version of the Gini Accounting API using the HTTP/1.1 Accept header (see the example) because by default requests are treated as requests to version 1 (v1) of the API.
Authentication
Only authenticated users are allowed to make API requests. The Gini API uses the OAuth 2.0 protocol with bearer tokens for authentication.
In order to use the API in your application, you first have to register your application with Gini. Afterwards your application should request an access token from the Gini Authorization Server and use it to access the Gini Accounting API.
Security
The Gini Accounting API is only accessible over HTTPS. Please make sure that your application validates the relevant X.509 certificates (e.g. common name matches hostname, issuing CA is trusted, etc.).
Client Errors
HTTP response codes
The API uses idiomatic HTTP status codes to indicate if a request was successful or not and whether it should be repeated.
Code | Description |
---|---|
2xx | The request was successful. |
4xx | The request was not successful. See the response body for details. Retrying with the same arguments will not work. |
5xx | Some error occurred while processing the request. Please try again. |
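The retry guidance in the table can be expressed as a small decision helper. This is a minimal sketch; the function names and the returned labels are hypothetical and chosen only for illustration.

```python
def classify_status(status_code: int) -> str:
    """Map an HTTP status code to the categories used in the table."""
    if 200 <= status_code < 300:
        return "success"
    if 400 <= status_code < 500:
        return "client_error"  # retrying with the same arguments will not work
    if 500 <= status_code < 600:
        return "server_error"  # the request may be repeated
    return "other"


def should_retry(status_code: int) -> bool:
    """Only 5xx responses are worth repeating unchanged."""
    return classify_status(status_code) == "server_error"
```

For example, a 503 response would be retried, while a 400 or 401 response would be surfaced to the caller for correction.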
Error Entity
error entity response
{
"message": "Validation of the request entity failed",
"requestId": "8896f9dc-260d-4133-9848-c54e5715270f"
}
In case of an error, the Gini Accounting API always returns a JSON object with further information about the error that occurred. The JSON object consists of the following properties:
Name | Type | Description |
---|---|---|
message | string | Human consumable error description (not intended for application end-users) |
requestId | string | Unique ID identifying the request. Please provide this when contacting our support. |
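A client might turn the error entity into an exception so that the request ID is kept for support requests. The sketch below assumes only the two documented properties; the class `GiniApiError` and the function `error_from_response` are hypothetical names.

```python
import json


class GiniApiError(Exception):
    """Carries the documented error entity fields plus the HTTP status."""

    def __init__(self, status_code, message, request_id):
        super().__init__(
            f"HTTP {status_code}: {message} (request ID {request_id})"
        )
        self.status_code = status_code
        self.message = message
        self.request_id = request_id


def error_from_response(status_code, body):
    """Build a GiniApiError from a JSON error entity (bytes or str)."""
    payload = json.loads(body)
    return GiniApiError(
        status_code,
        payload.get("message", ""),
        payload.get("requestId"),
    )
```

Since the message is not intended for application end-users, it is better suited to logs than to the user interface.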
Managing Anonymous Gini Accounts
In order to achieve best results, the Gini Accounting API must be able to track the requests down to individual users for the following reasons:
- Users might have their own opinions about the document type or the correctness of a certain value.
- The feedback correctness is decided based on the number of similar values submitted by the users.
Gini offers various ways to perform requests on behalf of the individual users without requiring physical interaction. Depending on your use case and your product's architecture, the following API authentication methods are available:
trusted
Your application communicates with the Gini Accounting API via a common backend that runs on a trusted device. Each request states which user the request is made for via the application's user identifier. No account management is required.
untrusted
On the first API usage, your application creates an anonymous Gini user in the background and uses those account credentials for subsequent requests. This works best for (mobile) applications where the app communicates directly with the Gini Accounting API on an untrusted device.
Communicating with the Gini Accounting API via Backend / Gateway
request the list of user1's documents
curl -v -H 'Accept: application/vnd.gini.v2+json' \
  -u 'client-id:client-secret' \
  -H 'X-User-Identifier: user1' \
  https://accounting-api.gini.net/documents
This authentication scheme is based on HTTP Basic Authentication: your application authenticates itself with the Gini Accounting API using its client ID and client secret.
Additionally, a header called X-User-Identifier is sent together with the Authorization header in the same request.
The API uses this header to identify individual users.
Your application is free to choose whatever value it wants for the header, as long as the following constraints are met:
- Each user's identifier must be unique.
- Once set for a user, the identifier must remain the same.
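On a trusted backend, the headers for such a request could be assembled as follows. This is a sketch under the assumption that the user identifier is already a stable, unique string managed by your application; `gateway_headers` is a hypothetical helper name.

```python
import base64


def gateway_headers(client_id, client_secret, user_identifier):
    """Headers for a backend request made on behalf of one application user."""
    if not user_identifier:
        raise ValueError("a non-empty user identifier is required")
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    return {
        # HTTP Basic Authentication with the client credentials.
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        # Must be unique per user and must not change for that user.
        "X-User-Identifier": user_identifier,
        "Accept": "application/vnd.gini.v2+json",
    }
```

A database primary key or another immutable internal ID is a natural candidate for the identifier, whereas an email address that users can change is not.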
Direct Communication between Client Devices and the Gini Accounting API
Gini offers the User Center API (UC API) to work with the Gini users. Here is a quick step-by-step guide that outlines how to create and use a new anonymous Gini account. Each step links to the corresponding section in the UC API where you can read more details about it.
- obtain the client token
- create a new user
- log in as a new user
- make API requests with the access token
Authenticate the Client
obtain the client token
curl -v -H 'Accept: application/json' \
  -u 'client-id:client-secret' \
  'https://user.gini.net/oauth/token?grant_type=client_credentials'
the successful response will have HTTP status 200 and the client access token 1eb7ca49-d99f-40cb-b86d-8dd689ca2345 will be returned
{
"access_token":"1eb7ca49-d99f-40cb-b86d-8dd689ca2345",
"token_type":"bearer","expires_in":43199,"scope":"read"
}
Before you are able to use the UC API, you need to obtain a client access token. The client access token authorizes your client (i.e. your application) against the UC API and allows you to create a new user.
At this point it is assumed that you already have the client ID client-id and the client secret client-secret.
These authorize your client (with HTTP Basic Authentication) to obtain the client access token; see the example above.
For more details see the corresponding UC API section.
Create a New User
create a new user
curl -v -X POST --data '{"email":"random@example.org", "password":"geheim"}' \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -H 'Authorization: BEARER 1eb7ca49-d99f-40cb-b86d-8dd689ca2345' \
  'https://user.gini.net/api/users'
the above command creates a new user random@example.org with password geheim. If the creation was successful, the HTTP response has status 201 and contains the Location header pointing to the new user. Your client is now allowed to create a new user authorized by the client access token.
Once the client access token is successfully obtained, it's time to create a new user. To do so we require two more values: a username and a password. The username must be represented by a correct email address whose domain part is easily linkable to your application. For example, if your company is called Example Inc. then app.example.org would be a good domain name to use for your application's user accounts.
For more details see the corresponding UC API section.
Authenticate on behalf of a New User
log in as a new user
curl -v -X POST --data-urlencode 'username=random@example.org' \
  --data-urlencode 'password=geheim' \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -H 'Accept: application/json' -u 'client-id:client-secret' \
  'https://user.gini.net/oauth/token?grant_type=password'
After the new user is created, you can log in. Note that the login request uses HTTP Basic Authentication with the client ID as the username and the client secret as the password. It does not require a client access token. The response will contain an access token that can be used to make API requests on behalf of the new user.
For more details see the corresponding UC API section.
Make API Requests with the Access Token
use the access token you obtained to make API requests
GET /documents HTTP/1.1
Host: accounting-api.gini.net
Authorization: BEARER 760822cb-2dec-4275-8da8-fa8f5680e8d4
Accept: application/vnd.gini.v2+json
Connection: close
In order to make API requests, send the access token as a bearer token in the Authorization request header.
Documents
As the key aspect of the Gini Accounting API is to provide information extraction for analyzing documents, the API is mainly built around the concept of documents. A document can be any written representation of information such as invoices, reminders, contracts and so on.
The main idea is that you submit a document in the form of an electronic file to Gini. After the document has been analyzed by Gini you can get the information that is extracted from the document by querying the API.
The following documentation explains those actions in more detail.
Submitting Files
documents can be submitted by sending a POST request to the /documents resource.
POST /documents
In order to extract document information, the document source file must first be submitted to Gini.
Submitting documents is as easy as sending a POST request to the /documents resource path.
After successful submission the location of the new document is returned in the Location header.
The Gini Accounting API currently supports two different variants of uploads, one optimized for web applications running in a web browser and the other for all other types of clients.
The first variant, optimized for web applications, expects the documents to be uploaded using the multipart/form-data encoding method.
The second variant, for all other clients, simply uses the request body as the document. The content type needs to be compatible with binary upload requests. Please note that application/x-www-form-urlencoded is not suitable for binary data and will be rejected with the HTTP status 400.
Supported File Formats
Gini currently supports document files in PDF, GIF (non-animated), PNG, JPEG, TIFF as well as plain text formats. You can use native documents (PDF only) as well as scanned documents (all other supported formats).
Note that there are certain limitations though:
- document file size must be less than 10 MiB
- PDF files must not have any security restrictions such as password protection
- scanned documents should have a resolution of at least 300 dpi in order for the OCR to return optimal results
- image files must have dimensions bigger than 100x100 pixels
- plain text documents have to be encoded in UTF-8 and the source size must be smaller than 512 KiB
- only the first 10 pages of a document are processed
- only document content in the German language is recognized sufficiently well
The above applies both to single page documents as well as to each page in a multi-page document.
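Some of these limits can be checked on the client before uploading, which saves a round trip for files that would be rejected. The sketch below covers only the size, encoding and image dimension rules; how the image dimensions are obtained is left to your imaging library, and `check_upload` is a hypothetical helper name.

```python
from typing import List, Optional, Tuple

MAX_FILE_BYTES = 10 * 1024 * 1024  # file size must be less than 10 MiB
MAX_TEXT_BYTES = 512 * 1024        # plain text must be smaller than 512 KiB
MIN_IMAGE_SIDE = 100               # image dimensions must exceed 100x100


def check_upload(
    data: bytes,
    is_plain_text: bool = False,
    image_size: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """Return a list of problems found; an empty list means no known issue."""
    problems = []
    if len(data) >= MAX_FILE_BYTES:
        problems.append("file must be smaller than 10 MiB")
    if is_plain_text:
        if len(data) >= MAX_TEXT_BYTES:
            problems.append("plain text must be smaller than 512 KiB")
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            problems.append("plain text must be encoded in UTF-8")
    if image_size is not None:
        width, height = image_size
        if width <= MIN_IMAGE_SIDE or height <= MIN_IMAGE_SIDE:
            problems.append("image must be larger than 100x100 pixels")
    return problems
```

An empty result does not guarantee acceptance, because rules such as PDF password protection or the page limit are not checked here.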
Document Type Hints
In many cases the type of a document is known to the client application.
If you provide the doctype parameter with a valid type, Gini can optimize the processing of the document in various ways.
Document Uploading Schemes
The Gini API allows you to upload a document as a single file or in parts, page by page.
Upload a Single File Document
This is the standard way of uploading a document to the Gini extraction system.
A PDF document can contain a single page or multiple pages.
JPEG and PNG documents are also accepted. Please provide a valid Content-Type header that is suitable for binary uploads. application/octet-stream is recommended, especially if you do not know the exact MIME type of the uploaded file. Requests with invalid content types like application/x-www-form-urlencoded will be rejected.
Once uploaded, the document is processed by the system normally and without any adjustments to its structure.
Request
upload a document
variant for web applications running in a web browser with access token:
curl -H 'Authorization: BEARER <token>' \
  --form 'file=@file.pdf' \
  -H 'Accept: application/vnd.gini.v2+json' \
  -i https://accounting-api.gini.net/documents
variant for all other types of applications:
curl -H 'Authorization: BEARER <token>' \
  --data-binary '@file.pdf' \
  -H 'Accept: application/vnd.gini.v2+json' \
  -H 'Content-Type: application/octet-stream' \
  -i 'https://accounting-api.gini.net/documents?filename=file.pdf'
or with X-User-Identifier (see how to set up X-User-Identifier):
curl -H 'X-User-Identifier: user1' \
  --form 'file=@file.pdf' \
  -H 'Accept: application/vnd.gini.v2+json' \
  -u 'client-id:client-secret' \
  -i https://accounting-api.gini.net/documents
Headers
Header | Value |
---|---|
Content-Type | multipart/form-data; boundary=... or one of application/octet-stream, image/jpeg, image/png, application/pdf |
Accept | application/vnd.gini.v2+json |
x-document-metadata-branchid | some customer specific id |
x-document-metadata-deviceclass | desktop_device or mobile_device |
The x-document-metadata-* headers can be used to add meta information to the upload. This information is used in analytics and reporting.
- branchid is used to group uploads from a specific client (e.g. branches for banks).
- deviceclass specifies which kind of device the upload comes from, e.g. a desktop computer or a smartphone.
Request Query Parameters
If the upload is performed without multipart/form-data, you can optionally provide a file name for the submitted document with a query parameter:
Name | Type | Description |
---|---|---|
filename | string | (Optional) File name of the submitted document. |
doctype | string | (Optional) Type of the submitted document. See document types for possible values. |
Body
Only in case of Content-Type: multipart/form-data (applications running in a web browser):
Key | Description |
---|---|
Content-Disposition | form-data |
file | File contents of document. |
Please make sure that you do not declare a character encoding for the multipart upload - multipart uploads with character encoding will be rejected.
Response
Headers
Status Code | Description |
---|---|
201 (Created) | Operation is successful. |
Header | Value |
---|---|
Content-Type | application/vnd.gini.v2+json |
Location | Absolute URI (created document URI). Can be used to check progress and receive document information. |
Errors
Status Code | Description |
---|---|
400 (Bad Request) | Returned when a file is sent in an invalid format or using an invalid content-type. |
401 (Unauthorized) | Authorization credentials are either missing, wrong or outdated. |
415 | Content type is not supported. |
503 | Service unavailable. Please retry later. |
Upload a Document Page by Page
A partial upload is performed in two steps: a partial document upload and a composite document upload. You must complete Step 1 before moving on to Step 2.
Step 1 (Upload Each Page as a Partial Document)
Pages that are part of the document are referred to as partial documents.
If you want to upload a page (or a picture of a page) that belongs to the document, your request should additionally include a Content-Type header with the value application/vnd.gini.v2.partial+png, or application/vnd.gini.v2.partial+pdf in case of a PDF page.
Request
upload a partial document
variant for web applications running in a web browser with access token:
curl -H 'Authorization: BEARER <token>' \
  --form 'file=@file.JPEG' \
  -H 'Accept: application/vnd.gini.v2+json' \
  -H 'Content-Type: application/vnd.gini.v2.partial+png' \
  -i https://accounting-api.gini.net/documents
variant for other types of applications with access token:
curl -H 'Authorization: BEARER <token>' \
  --data-binary '@file.JPEG' \
  -H 'Accept: application/vnd.gini.v2+json' \
  -H 'Content-Type: application/vnd.gini.v2.partial+png' \
  -i 'https://accounting-api.gini.net/documents?filename=file.JPEG'
or with X-User-Identifier (see how to set up X-User-Identifier):
curl -H 'X-User-Identifier: user1' \
  --form 'file=@file.JPEG' \
  -H 'Accept: application/vnd.gini.v2+json' \
  -H 'Content-Type: application/vnd.gini.v2.partial+png' \
  -u 'client-id:client-secret' \
  -i https://accounting-api.gini.net/documents
In order to upload a page picture or a partial document, specify a different content type:
Headers
Header | Value |
---|---|
Content-Type | application/vnd.gini.v2.partial+png or application/vnd.gini.v2.partial+pdf |
Accept | application/vnd.gini.v2+json |
Request Query Parameters
Name | Type | Description |
---|---|---|
filename | string | (Optional) File name of the submitted document. |
doctype | string | (Optional) Type of the submitted document. See document types for possible values. |
Body
File contents of the document.
Response
Headers
Status Code | Description |
---|---|
201 (Created) | Operation is successful. |
Header | Value |
---|---|
Content-Type | application/vnd.gini.v2+json |
Location | Absolute URI (created partial document URI) which should be referenced by a composite document. |
Errors
Status Code | Description |
---|---|
400 (Bad Request) | Returned when the sent file has an invalid format. |
401 (Unauthorized) | Authorization credentials are either missing, wrong or outdated. |
415 | Content type is not supported. |
503 | Service unavailable. Please retry later. |
Step 2 (Upload JSON as a Composite Document)
After successfully uploading all pages as partial documents, you should announce their locations to the extraction system. This is done by uploading a composite document which is a simple JSON file with corresponding locations of partial documents.
Request
upload a composite document
variant for web applications running in a web browser with access token:
curl -H 'Authorization: BEARER <token>' \
  -H 'Accept: application/vnd.gini.v2+json' \
  -H 'Content-Type: application/vnd.gini.v2.composite+json' \
  -X POST -d '{...}' \
  -i https://accounting-api.gini.net/documents
or
curl -H 'Authorization: BEARER <token>' \
  --data-binary '@data.json' \
  -H 'Accept: application/vnd.gini.v2+json' \
  -H 'Content-Type: application/vnd.gini.v2.composite+json' \
  -i https://accounting-api.gini.net/documents
or with X-User-Identifier (see how to set up X-User-Identifier):
curl -H 'X-User-Identifier: user1' \
  -H 'Accept: application/vnd.gini.v2+json' \
  -H 'Content-Type: application/vnd.gini.v2.composite+json' \
  -u 'client-id:client-secret' \
  -X POST -d '{...}' \
  -i https://accounting-api.gini.net/documents
with post body of
{
"partialDocuments": [
{
"rotationDelta": 0,
"document": "location of partial doc 1"
},
{
"rotationDelta": 0,
"document": "location of partial doc 2"
},
...
]
}
where each location has the form
https://accounting-api.gini.net/documents/e8606210-56ed-11ea-b823-b351b84ae4b3
In order to upload a composite document which aggregates one or multiple partial documents, a different content type needs to be specified.
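The composite document body can be generated from the Location values returned in Step 1. The sketch below mirrors the JSON structure of the example; the meaning of rotationDelta is not explained in this section, so the value 0 from the example is used as a default, and `composite_body` is a hypothetical helper name.

```python
import json
from typing import List, Optional


def composite_body(
    partial_locations: List[str],
    rotation_deltas: Optional[List[int]] = None,
) -> bytes:
    """Serialize the composite document JSON for the given partial documents."""
    if not partial_locations:
        raise ValueError("at least one partial document location is required")
    if rotation_deltas is None:
        # Default taken from the example body; semantics are an assumption.
        rotation_deltas = [0] * len(partial_locations)
    if len(rotation_deltas) != len(partial_locations):
        raise ValueError("one rotation delta per partial document is required")
    payload = {
        "partialDocuments": [
            {"rotationDelta": delta, "document": location}
            for location, delta in zip(partial_locations, rotation_deltas)
        ]
    }
    return json.dumps(payload).encode("utf-8")
```

The resulting bytes are sent as the request body with the Content-Type application/vnd.gini.v2.composite+json, in the order in which the pages should appear.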
Headers
Header | Value |
---|---|
Content-Type | application/vnd.gini.v2.composite+json |
Accept | application/vnd.gini.v2+json |
Request Query Parameters
If the upload is performed without multipart/form-data, you can optionally provide a file name for the submitted document with a query parameter:
Name | Type | Description |
---|---|---|
filename | string | (Optional) File name of the submitted document. |
Body
Raw bytes of the composite JSON.
Key | Description |
---|---|
partialDocuments | A list of partial documents (the location is returned after the partial documents are successfully uploaded). |
Response
Headers
Status Code | Description |
---|---|
201 (Created) | Operation is successful. |
Header | Value |
---|---|
Content-Type | application/vnd.gini.v2+json |
Location | Absolute URI of the created document (document URI). Can be used to check progress and retrieve document information. |
Errors
Status Code | Description |
---|---|
400 (Bad Request) | Returned when a file in an invalid format is sent. |
401 (Unauthorized) | Authorization credentials are either missing, wrong or outdated. |
415 | Content type is not supported. |
503 | Service unavailable. Please retry later. |
Checking Processing Status and Getting Document Information
document information can be retrieved by sending a GET request to the document URI.
GET /documents/{id}
Once the document is submitted, you can check its processing status by examining the document information.
It can be retrieved with a GET request to the document URI.
When the document has been processed you can retrieve its extractions and layout.
Request
get document processing status
curl -H 'Authorization: BEARER <token>' \
     -X GET -H 'Accept: application/vnd.gini.v2+json' \
     -i https://accounting-api.gini.net/documents/c292af40-d06a-11e2-9a2f-000000000000
Headers
Header | Value |
---|---|
Accept | application/vnd.gini.v2+json |
Response
request response
{
"id": "626626a0-749f-11e2-bfd6-000000000000",
"creationDate": 1360623867402,
"name": "scanned.jpg",
"progress": "COMPLETED",
"origin": "UPLOAD",
"sourceClassification": "SCANNED",
"pageCount": 1,
"_links": {
"extractions": "https://accounting-api.gini.net/documents/626626a0-749f-11e2-bfd6-000000000000/extractions",
"layout": "https://accounting-api.gini.net/documents/626626a0-749f-11e2-bfd6-000000000000/layout",
"document": "https://accounting-api.gini.net/documents/626626a0-749f-11e2-bfd6-000000000000",
"processed": "https://accounting-api.gini.net/documents/626626a0-749f-11e2-bfd6-000000000000/processed"
}
}
Headers
Status Code | Description |
---|---|
200 (OK) | Operation is successful. |
Header | Value |
---|---|
Content-Type | application/vnd.gini.v2+json |
Body (application/vnd.gini.v2+json)
Key | Child Key | Type | Description |
---|---|---|---|
id | | string | Document unique identifier (such as UUID Version 1). |
name | | string | Document name (as stated in upload). |
pageCount | | number | Number of pages. |
creationDate | | number | Document creation Unix timestamp (in milliseconds). |
origin | | string | Document source channel: UPLOAD (if uploaded via Gini Accounting API) or UNKNOWN. |
progress | | string | Document processing status: PENDING, COMPLETED or ERROR. |
sourceClassification | | string | Classification of the source file: SCANNED, SANDWICH, NATIVE or TEXT. |
pageNumber | | number | Document page number. |
images | | object | Pre-rendered page image URIs. |
_links | | object | Links to related resources, e.g. found extractions or document layout. |
| extractions | string | Document extractions URI. |
| layout | string | Document layout URI. |
| processed | string | Processed document URI. |
| document | string | Document URI. |
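The status check described above can be sketched in Python as follows. This is an illustrative helper, not part of the API: the name `summarize_document` is hypothetical, and treating both COMPLETED and ERROR as final states is an assumption based on the three documented `progress` values.

```python
from datetime import datetime, timezone


def summarize_document(doc):
    """Interpret a parsed document information response.

    Assumption: PENDING means processing is still running, while COMPLETED
    and ERROR are final states.
    """
    progress = doc["progress"]
    # creationDate is a Unix timestamp in milliseconds.
    created = datetime.fromtimestamp(doc["creationDate"] / 1000, tz=timezone.utc)
    completed = progress == "COMPLETED"
    return {
        "finished": progress in ("COMPLETED", "ERROR"),
        "succeeded": completed,
        "created": created,
        # Extractions are only worth fetching once processing succeeded.
        "extractions_uri": doc["_links"]["extractions"] if completed else None,
    }


example = {
    "id": "626626a0-749f-11e2-bfd6-000000000000",
    "creationDate": 1360623867402,
    "progress": "COMPLETED",
    "_links": {
        "extractions": "https://accounting-api.gini.net/documents/"
        "626626a0-749f-11e2-bfd6-000000000000/extractions"
    },
}
summary = summarize_document(example)
```

A client would typically repeat the GET request until `finished` is true and only then follow `extractions_uri`.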
Errors
Status Code | Description |
---|---|
404 (Not Found) | Returned when no document can be found under the specified URI. |
Retrieving Extractions
Extractions can be retrieved by performing a GET request to the extractions URI:
GET /documents/{id}/extractions
Once the document is processed, the document extractions become available for retrieval. See document extractions for more details.
Request
get extractions
curl -H 'Authorization: BEARER <token>' \
     -X GET -H 'Accept: application/vnd.gini.v2+json' \
     -i https://accounting-api.gini.net/documents/c292af40-d06a-11e2-9a2f-000000000000/extractions
Headers
Header | Value |
---|---|
Accept | application/vnd.gini.v2+json |
Response
example response
{
"extractions": {
"amountToPay": {
"box": {
"height": 9.0,
"left": 516.0,
"page": 1,
"top": 588.0,
"width": 42.0
},
"confidence": 0.715,
"entity": "amount",
"value": "24.99:EUR",
"candidates": "amounts"
}
},
"candidates": {
"amounts": [
{
"box": {
"height": 9.0,
"left": 516.0,
"page": 1,
"top": 588.0,
"width": 42.0
},
"entity": "amount",
"value": "24.99:EUR"
},
{
"box": {
"height": 9.0,
"left": 241.0,
"page": 1,
"top": 588.0,
"width": 42.0
},
"entity": "amount",
"value": "21.0:EUR"
}
]
...
}
}
Headers
Status Code | Description |
---|---|
200 (OK) | Operation is successful. |
Header | Value |
---|---|
Content-Type | application/vnd.gini.v2+json |
Body (application/vnd.gini.v2+json)
A detailed explanation of the response format can be found in the document extractions section.
Name | Type | Description |
---|---|---|
extractions | object | Labels to extractions mapping (i.e. specific-extractions). |
candidates | object | A mapping of labels to a list of extraction-candidates. |
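To illustrate how the two mappings relate, here is a small Python sketch that resolves the `candidates` reference of an extraction into the corresponding candidate list. The helper name `candidates_for` is hypothetical; the data is a shortened copy of the example response.

```python
def candidates_for(response, label):
    """Return the candidate list referenced by the extraction `label`.

    Returns an empty list when the label was not extracted or when the
    extraction carries no `candidates` reference.
    """
    extraction = response.get("extractions", {}).get(label)
    if extraction is None:
        return []
    reference = extraction.get("candidates")
    if reference is None:
        return []
    return response.get("candidates", {}).get(reference, [])


response = {
    "extractions": {
        "amountToPay": {"entity": "amount", "value": "24.99:EUR", "candidates": "amounts"}
    },
    "candidates": {
        "amounts": [
            {"entity": "amount", "value": "24.99:EUR"},
            {"entity": "amount", "value": "21.0:EUR"},
        ]
    },
}
alternatives = [c["value"] for c in candidates_for(response, "amountToPay")]
```

Such a list could be offered to the user as alternative values when the extracted one is wrong.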
Errors
Status Code | Description |
---|---|
404 (Not Found) | Requested entity couldn't be found. |
Submitting Feedback on Extractions
You should always submit feedback on extractions to help the system improve its extraction quality.
Gini employs various machine learning techniques to learn from feedback automatically, so it is equally important for Gini to receive feedback on correct and on incorrect extractions. There are currently two ways to submit feedback. The first and most common is to submit the complete feedback in one request. This is the easiest way if your frontend (application) displays the extractions on screen in an editable form, where a user can modify the extractions before pressing the confirmation button. The second (a rare use case) applies when a final approval signal (button click) is not possible. In that case you can send the feedback on one label per request.
There are three different types of feedback:
- positive feedback: (Our preferred type) the extraction was correct and confirmed by the user without modification.
- complementary feedback: The extraction system extracted nothing, e.g. the response does not contain the requested label. The user manually entered the correct value.
- negative feedback: The extraction was incomplete or erroneous. The user manually corrected the extraction.
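The three feedback types can be told apart by comparing what the system extracted with what the user finally confirmed. The following Python sketch shows one way to do this on the client side; the function `classify_feedback` and its plain label-to-value dictionaries are illustrative assumptions, since the API itself does not require you to label the feedback type.

```python
def classify_feedback(extracted, confirmed):
    """Classify each confirmed label as positive, complementary or negative.

    extracted: label -> value as returned by the extraction system.
    confirmed: label -> value as confirmed or entered by the user.
    """
    result = {}
    for label, value in confirmed.items():
        if label not in extracted:
            # The system extracted nothing; the user supplied the value.
            result[label] = "complementary"
        elif extracted[label] == value:
            # The user accepted the extraction unchanged.
            result[label] = "positive"
        else:
            # The user corrected the extraction.
            result[label] = "negative"
    return result


kinds = classify_feedback(
    extracted={"amountToPay": "5.60:EUR", "paymentReference": "ReNr 123, KdNr 32"},
    confirmed={
        "amountToPay": "5.60:EUR",
        "paymentReference": "ReNr 1735, KdNr 37",
        "paymentRecipient": "Zalando SE",
    },
)
```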
Please see detailed examples next.
Submitting Feedback on Extractions
submitting feedback on extractions
PUT /documents/{id}/extractions
POST /documents/{id}/extractions
The Gini Accounting API allows you to submit feedback on multiple extractions for a single document with a single request. It is strongly recommended that you submit your feedback in this way for two reasons. First, the total number of round trips is reduced to one and the feedback is handled internally as a batch, so the update is more efficient for multiple extractions than submitting the feedback in separate requests (see single feedback). Second, Gini's machine learning training techniques can benefit from the feedback on multiple extractions, since Gini will be aware that the individual parts of the submitted feedback belong together.
Request
Example
We show a more elaborate example here in order to explain the different types of feedback. The example scenario is as follows: the user uploads a document where the labels amountToPay, paymentReference and iban are extracted. Unfortunately, the label paymentRecipient could not be extracted. The response to the extractions request is as follows:
{
"candidates": {
},
"extractions": {
"amountToPay": {
"box": {
"height": 8.0,
"left": 545.0,
"page": 1,
"top": 586.0,
"width": 17.0
},
"candidates": "amounts",
"entity": "amount",
"value": "5.60:EUR"
},
"iban": {
"box": {
"height": 7.0,
"left": 447.0,
"page": 1,
"top": 746.0,
"width": 100.0
},
"candidates": "ibans",
"entity": "iban",
"value": "DE68130300000017850360"
},
"paymentReference": {
"entity": "reference",
"value": "ReNr 123, KdNr 32"
}
},
"compoundExtractions": {
"lineItems": [
{
"artNumber": {
"value": "10101",
"entity": "text" ,
"box": {
"height": 7.0,
"left": 55.0,
"page": 1,
"top": 546.0,
"width": 100.0
}
},
"quantity": {
"value": "12",
"entity": "numeric"
}
},
{
"artNumber": {
"value": "10103",
"entity": "text"
},
"quantity": {
"value": "3",
"entity": "numeric"
}
}
],
"taxItems": [
{
"taxRate": {
"value": "19.0%",
"entity": "text"
},
"taxAmount": {
"value": "30.00:EUR",
"entity": "amount"
}
}
]
}
}
The user adds the missing paymentRecipient value (complementary feedback), corrects the paymentReference to "ReNr 1735, KdNr 37" (negative feedback), and corrects the quantity of one line item from 12 to 17. The iban, amountToPay, taxItems and the remaining part of lineItems are correct (positive feedback). The document is not shown, so we can leave out the boxes. The resulting feedback request is as follows:
{
"extractions": {
"amountToPay": {
"value": "5.60:EUR"
},
"iban": {
"value": "DE68130300000017850360"
},
"paymentReference": {
"value": "ReNr 1735, KdNr 37"
},
"paymentRecipient": {
"value": "Zalando SE"
}
},
"compoundExtractions": {
"lineItems": [
{
"artNumber": {
"value": "10101"
},
"quantity": {
"value": "17"
}
},
{
"artNumber": {
"value": "10103"
},
"quantity": {
"value": "3"
}
}
],
"taxItems": [
{
"taxRate": {
"value": "19.0%",
"entity": "text"
},
"taxAmount": {
"value": "30.00:EUR",
"entity": "amount"
}
}
]
}
}
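A feedback body of this shape can be derived from the extractions response plus the user's edits. The Python sketch below covers only the atomic `extractions` part and drops everything except `value`, as in the example above; the helper name `build_feedback` and the `corrections` dictionary are assumptions for illustration, and compound extractions would need to be handled analogously.

```python
import json


def build_feedback(extractions, corrections):
    """Merge user corrections into the extracted values.

    extractions: the "extractions" object of the extractions response.
    corrections: label -> value entered or changed by the user.
    Only the "value" of each extraction is kept (boxes are left out).
    """
    merged = {label: {"value": item["value"]} for label, item in extractions.items()}
    for label, value in corrections.items():
        merged[label] = {"value": value}
    return {"extractions": merged}


feedback = build_feedback(
    extractions={
        "amountToPay": {"entity": "amount", "value": "5.60:EUR", "candidates": "amounts"},
        "iban": {"entity": "iban", "value": "DE68130300000017850360"},
        "paymentReference": {"entity": "reference", "value": "ReNr 123, KdNr 32"},
    },
    corrections={
        "paymentReference": "ReNr 1735, KdNr 37",
        "paymentRecipient": "Zalando SE",
    },
)
request_body = json.dumps(feedback).encode("utf-8")
```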
Give feedback on multiple specific extractions, correcting or verifying them, with a single PUT or POST request to the document extractions URI.
The labels must correspond to the names of the extraction types, e.g. amountToPay.
See available specific extractions for possible values.
Headers
Header | Value |
---|---|
Content-Type | application/vnd.gini.v2+json |
Body
Key | Type | Description |
---|---|---|
extractions | object | Feedback on atomic extractions. |
compoundExtractions | object | Feedback on compound extractions. |
Response
Status Code | Description |
---|---|
204 (No Content) | The feedback was successfully processed. |
404 (Not Found) | The document or the label could not be found. |
422 (Unprocessable Entity) | At least one value was not valid regarding entity validation rules of the label. |
Submitting Feedback on Invalid Extractions
submitting feedback on invalid extractions
DELETE /documents/{id}/extractions/{label}
Request
In case an extraction was erroneously found (i.e. it is not present in the source document), you can delete it by issuing a DELETE request to the extraction URI.
Response
Status Code | Description |
---|---|
204 (No Content) | Label removal was successful. |
404 (Not Found) | The document or the label could not be found. |
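As a hedged sketch, the DELETE request for a single label could be prepared in Python with the standard library as shown below. The helper name `build_delete_extraction_request` is hypothetical; the URL pattern and the `Authorization: BEARER <token>` header follow the examples in this documentation. The request is only constructed here, not sent.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://accounting-api.gini.net"


def build_delete_extraction_request(document_id, label, token):
    """Prepare DELETE /documents/{id}/extractions/{label} without sending it."""
    url = (
        f"{BASE_URL}/documents/{quote(document_id, safe='')}"
        f"/extractions/{quote(label, safe='')}"
    )
    return Request(url, method="DELETE", headers={"Authorization": f"BEARER {token}"})


request = build_delete_extraction_request(
    "c292af40-d06a-11e2-9a2f-000000000000", "amountToPay", "<token>"
)
# To actually send it: urllib.request.urlopen(request); expect 204 on success.
```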
Retrieving Document Pages
retrieving document pages
GET /documents/{id}/pages
The Gini Accounting API renders preview images of the document pages.
In order to retrieve a list of pages for a document, issue a GET request to the pages sub-resource of the document.
Request
Path Parameters
Name | Value |
---|---|
id | Document ID |
retrieve document pages
curl -H 'Authorization: BEARER <token>' \
     -X GET -H 'Accept: application/vnd.gini.v2+json' \
     -i https://accounting-api.gini.net/documents/c292af40-d06a-11e2-9a2f-000000000000/pages
Headers
Header | Value |
---|---|
Accept | application/vnd.gini.v2+json |
Response
The response will be a list of pages.
[
{
"images" : {
"1280x1810" : "https://accounting-api.gini.net/documents/c292af40-d06a-11e2-9a2f-000000000000/pages/1/1280x1810",
"750x900" : "https://accounting-api.gini.net/documents/c292af40-d06a-11e2-9a2f-000000000000/pages/1/750x900"
},
"pageNumber" : 1
},
{
"pageNumber" : 2,
"images" : {
"1280x1810" : "https://accounting-api.gini.net/documents/c292af40-d06a-11e2-9a2f-000000000000/pages/2/1280x1810",
"750x900" : "https://accounting-api.gini.net/documents/c292af40-d06a-11e2-9a2f-000000000000/pages/2/750x900"
}
}
]
Headers
Status Code | Description |
---|---|
200 (OK) | The request was successful. |
404 (Not Found) | The requested document does not exist. |
Body
Name | Type | Description |
---|---|---|
pages | array | All pages in the current result page. |
A page is an entity with the following fields:
Key | Child key | Type | Description |
---|---|---|---|
documentId | | string | UUID of the document which the page belongs to. |
pagenum | | number | Page number. |
_links | | object | Links to related resources. |
| document | string | Link to the document to which the page belongs. |
| pages | string | Link to the pages of the document. |
_images | | object | Links to pre-rendered page images in different resolutions. |
| image resolution in pixels | string | Link to a pre-rendered image of the page. |
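Since each page offers its preview in several resolutions, a client usually has to pick one. The Python sketch below selects the largest available image of a page. It is written against the keys used in the example response (`images`, `pageNumber`) and assumes resolution keys of the form `<width>x<height>`; the helper name `largest_image` is hypothetical.

```python
def largest_image(page):
    """Return (resolution, uri) of the largest pre-rendered image of a page."""

    def area(resolution):
        width, height = resolution.split("x")
        return int(width) * int(height)

    best = max(page["images"], key=area)
    return best, page["images"][best]


page = {
    "images": {
        "1280x1810": "https://accounting-api.gini.net/documents/"
        "c292af40-d06a-11e2-9a2f-000000000000/pages/1/1280x1810",
        "750x900": "https://accounting-api.gini.net/documents/"
        "c292af40-d06a-11e2-9a2f-000000000000/pages/1/750x900",
    },
    "pageNumber": 1,
}
resolution, uri = largest_image(page)
```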
Retrieving the Layout of a Document
The layout of the document describes the textual content of a document with positional information, based on the processed document.
Request
retrieving a layout of the document
GET /documents/{id}/layout
Example
curl -H 'Authorization: BEARER <token>' \
     -X GET -H 'Accept: application/vnd.gini.v2+json' \
     -i https://accounting-api.gini.net/documents/c292af40-d06a-11e2-9a2f-000000000000/layout
The layout of the document can be retrieved with a GET request to the layout URI.
Headers
Header | Value |
---|---|
Accept | application/vnd.gini.v2+json |
Response
layout example
{
"pages": [
{
"number": 1,
"sizeX": 595.3,
"sizeY": 841.9,
"textZones": [
{
"paragraphs": [
{
"l": 54.0,
"t": 158.76,
"w": 190.1,
"h": 36.55000000000001,
"lines": [
{
"l": 54.0,
"t": 158.76,
"w": 190.1,
"h": 10.810000000000002,
"wds": [
{
"l": 54.0,
"t": 158.76,
"w": 18.129999999999995,
"h": 9.900000000000006,
"fontSize": 9.9,
"fontFamily": "Arial-BoldMT",
"bold":false,
"text": "Ihre"
},
{
"l": 74.86,
"t": 158.76,
"w": 83.91000000000001,
"h": 9.900000000000006,
"fontSize": 9.9,
"fontFamily": "Arial-BoldMT",
"bold":false,
"text": "Vorgangsnummer"
},
{
"l": 158.76,
"t": 158.76,
"w": 3.3000000000000114,
"h": 9.900000000000006,
"fontSize": 9.9,
"fontFamily": "Arial-BoldMT",
"bold":false,
"text": ":"
},
[...]
]
},
[...]
]
}
]
}
],
"regions": [
{
"l": 20.0,
"t": 240.1,
"w": 190.0,
"h": 150.3,
"type": "RemittanceSlip"
},
[...]
]
},
[...]
]
}
Headers
Status Code | Description |
---|---|
200 (OK) | Operation is successful. |
Header | Value |
---|---|
Content-Type | application/vnd.gini.v2+json |
Body (application/vnd.gini.v2+json)
Key | Type | Description |
---|---|---|
pages | array | Array of page objects. |
Page Object
Key | Type | Description |
---|---|---|
number | number | Number of the page starting with 1. |
sizeX | number | Width of the page. |
sizeY | number | Height of the page. |
textZones | array | Array of textzone objects. |
regions | array | Array of region objects. |
TextZone Object
Key | Type | Description |
---|---|---|
paragraphs | array | Array of paragraph objects. |
Paragraph Object
Key | Type | Description |
---|---|---|
w | number | Width of the paragraph. |
h | number | Height of the paragraph. |
t | number | Distance of the paragraph from the upper edge of the page. |
l | number | Distance of the paragraph from the left edge of the page. |
lines | array | Array of line objects. |
Line Object
Key | Type | Description |
---|---|---|
w | number | Width of the line. |
h | number | Height of the line. |
t | number | Distance of the line from the upper edge of the page. |
l | number | Distance of the line from the left edge of the page. |
wds | array | Array of word objects. |
Word Object
Key | Type | Description |
---|---|---|
h | number | Height of the word. |
w | number | Width of the word. |
l | number | Distance of the word from the left edge of the page. |
t | number | Distance of the word from the upper edge of the page. |
fontSize | number | Font size of the word in points. |
fontFamily | string | Name of the font family of the word. |
bold | boolean | Indicates bold font style. |
text | string | Text of word. |
Region Object
Key | Type | Description |
---|---|---|
h | number | Height of the region of interest. |
w | number | Width of the region of interest. |
l | number | Distance of the region from the left edge of the page. |
t | number | Distance of the region from the upper edge of the page. |
type | string | Type of the region of interest, e.g. RemittanceSlip. |
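The nesting of pages, text zones, paragraphs, lines and words can be walked with a few loops. The following Python sketch reconstructs the text of each line from its words; the helper name `layout_lines` is hypothetical, and joining words with a single space is a simplifying assumption (the positional fields would allow a more faithful reconstruction).

```python
def layout_lines(layout):
    """Return a list of (page number, line text) tuples from a layout response."""
    lines = []
    for page in layout.get("pages", []):
        for zone in page.get("textZones", []):
            for paragraph in zone.get("paragraphs", []):
                for line in paragraph.get("lines", []):
                    words = [word["text"] for word in line.get("wds", [])]
                    lines.append((page["number"], " ".join(words)))
    return lines


layout = {
    "pages": [
        {
            "number": 1,
            "textZones": [
                {
                    "paragraphs": [
                        {
                            "lines": [
                                {
                                    "wds": [
                                        {"text": "Ihre"},
                                        {"text": "Vorgangsnummer"},
                                        {"text": ":"},
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ],
            "regions": [],
        }
    ]
}
text_lines = layout_lines(layout)
```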
Errors
Status Code | Description |
---|---|
404 (Not Found) | The requested layout is invalid. |
Retrieving the Processed Document
Request
retrieve the processed document
GET /documents/{id}/processed
Before Gini tries to extract the information, it preprocesses the document, performing page deskewing, homography transformation, etc.
The processed document can be retrieved with a GET request to the processed document URI.
Path parameters
Name | Value |
---|---|
id | Document ID |
Response
Headers
Status Code | Description |
---|---|
200 (OK) | Operation is successful. |
Body
The version of the uploaded document file after preprocessing (color corrected, deskewed) which has been used for all layout and semantic extractions. In case of native PDF documents it is identical to the original document file.
Errors
Status Code | Description |
---|---|
404 (Not Found) | The requested document does not exist. |
Deleting Documents
delete documents
DELETE /documents/{id}
To delete a document, send a DELETE request to the document URI. When the document is deleted, all associated resources (extractions, layout) are deleted as well.
Request
delete request
curl -H 'Authorization: BEARER <token>' \
     -X DELETE -i https://accounting-api.gini.net/documents/c292af40-d06a-11e2-9a2f-000000000000
Delete the document by sending a DELETE request to the document URI.
Response
Headers
Status Code | Description |
---|---|
204 (No Content) | Operation is successful. |
Errors
Status Code | Description |
---|---|
404 (Not Found) | Returned when no document can be found under the specific URI. |
Getting a List of All Documents
get a list of all documents
GET /documents
In order to get the list of all documents, send a GET request to the /documents resource.
The response will contain a paginated list of all documents.
Request Query Parameters
Name | Type | Description |
---|---|---|
limit | number | (Optional) Maximum number of documents to return (default 20). |
offset | number | (Optional) Starting offset (default 0). |
example request
curl -H 'Authorization: BEARER <token>' \
     -H 'Accept: application/vnd.gini.v2+json' \
     -X GET -i 'https://accounting-api.gini.net/documents?limit=50'
Response
example response
{
"documents": [
{...},
{...},
...
]
}
Headers
Status Code | Description |
---|---|
200 (OK) | Operation is successful. |
Body
The response entity has the following fields:
Name | Type | Description |
---|---|---|
documents | array | All documents of the current result page. |
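Paging through the list with `limit` and `offset` can be sketched in Python as follows. The helper names are hypothetical, and stopping when a page contains fewer documents than `limit` is an assumption, since the response shown above carries no total count.

```python
from urllib.parse import urlencode

DOCUMENTS_URL = "https://accounting-api.gini.net/documents"


def page_url(page_index, limit=20):
    """Build the URL of the zero-based result page `page_index`."""
    query = urlencode({"limit": limit, "offset": page_index * limit})
    return f"{DOCUMENTS_URL}?{query}"


def is_last_page(response, limit=20):
    """Assumed stop rule: a page shorter than `limit` is the last one."""
    return len(response.get("documents", [])) < limit


third_page = page_url(2, limit=50)
```

A client would request `page_url(0)`, `page_url(1)` and so on until `is_last_page` returns true.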
Document Extractions
Structured documents contain a lot of valuable information. For instance, invoices or remittance slips contain amounts to be paid, bank data, receiver, sender, address etc. Doctor prescriptions contain drug amounts, names and descriptions as well as other sensitive information. Receipts contain line-items with corresponding prices and tax values.
The Gini extraction system is able to extract this information and provide it in a structured form accessible through its Accounting API. From now on we shall refer to such data as the document extractions. Some extractions are shared between certain documents (e.g. amount to pay, sender, date, line-item), while others are quite unique (e.g. medical treatment, time of receipt issue, invoice id).
The extraction system however does a little bit more than just information extraction. For example, certain invoices contain blobs of text with due date information without a specific date value or tag but an explanation of when the invoice is supposed to be paid. The Gini system is capable of inferring this data and converting it to the actual date value. We shall refer to this kind of information as specific extractions.
Additionally, the Gini system groups various semantically related terms into compound extractions.
For example, IBAN and BIC belong to the single compound bankData, tax rate and tax amount belong to taxItems, and a group of items for purchase comprises lineItems.
Extractions
Extraction
{
"entity": "date",
"value": "2012-06-20",
"box": { ... },
"confidence": 0.997
}
An extraction contains an entity, which describes the general semantic type of the extraction (e.g. a date, an address, an amount).
The entity also determines the format of the value containing text information.
There may be an optional box element describing the position of the extraction value on the document.
We refer to it as the bounding box.
In most cases, extractions without a bounding box are considered to be meta information, such as doctype.
Additionally, we provide the confidence of extraction correctness.
Name | Type | Description |
---|---|---|
entity | string | Key (primary identification) of an entity type (e.g. banknumber). See available extraction entities for possible values. |
value | string | A normalized textual representation of the text/information provided by the extraction value (e.g. bank number without spaces between the digits). |
box | bounding-box | (Optional) Bounding box containing the position of the extraction value on the document. |
confidence | float | Confidence of the extraction being correct. |
Specific Extractions
specific extractions
{
"paymentDueDate": {
"entity": "date",
"value": "2012-06-20",
"box": { ... },
"candidates": "dates"
}
}
A specific extraction assigns a semantic property to the extraction.
It also has an additional candidates field:
Name | Type | Description |
---|---|---|
candidates | string | (Optional) A reference to extraction candidates. See available extraction candidates for possible values. |
Available Specific Extractions
Name | Description | Entity | Candidates |
---|---|---|---|
amountToPay | The amount which is yet to be paid. | amount | amounts |
bankAccountNumber | The account number of a payment recipient. | bankaccount | bankAccountNumbers |
bankNumber | The bank number of a payment recipient. | banknumber | bankNumbers |
bic | The bic of a payment recipient. | bic | bics |
branchId | The branch id of a receipt. Note: This extraction is only available if the document is a Receipt. See Document Type Hints for details. | text | n/a |
companyRegisterId | The Commercial Registry number of a document sender. | companyregisterid | companyRegisterIds |
customerId | The customer Id of a document recipient. | customerid | customerIds |
deliveryDate | The delivery date of the invoiced product or service. | date | dates |
docType | The document type of a given document. | doctype | n/a |
documentDate | The document date. | date | dates |
documentTime | The document time. Note: This extraction is only available if the document is a Receipt. See Document Type Hints for details. | time | times |
documentDomain | The domain of a current document. | documentdomain | n/a |
email | The most probable email address of a sender. | email | emails |
grossAmount | The invoiced amount (tax included). | amount | n/a |
iban | The IBAN of a document sender. | iban | ibans |
invoiceId | The invoice Id of a given document. | invoiceid | invoiceIds |
netAmount | The net amount of an invoice. | amount | n/a |
orderNumber | The purchase order number of a given document. | text | orderNumbers |
paymentDueDate | The payment due date of an invoice. Note: Either the due date present on the document or one calculated as document date + due period. | date | dates |
paymentDuePeriod | The payment due period (e.g. of an invoice). Note: The field is not delivered when an exact due date is present on the document. | timeperiod | periods |
paymentMethod | The payment method of a receipt. Note: This extraction is only available if the document is a Receipt. See Document Type Hints for details. | text | n/a |
paymentPurpose | The extra payment purpose text when the payment reference is not available. Note: Currently only available for clients in Austria. | text | n/a |
paymentRecipient | The payment recipient, i.e. the beneficiary of a money transfer. | companyname | senderNames |
paymentReference | The payment reference. | reference | n/a |
paymentState | Whether a document has yet to be paid or is already paid. | paymentstate | n/a |
phoneNumber | The first found phoneNumber in a given document. | phonenumber | phoneNumbers |
receiptNumber | The number of the receipt. Note that this extraction is only available if the document was uploaded with doctype hint Receipt. See Document Type Hints for details. | invoiceid | receiptNumbers |
recipient | The document’s recipient. (Deprecated: use individual recipient subfields) | recipient | n/a |
recipientName | The document’s recipient name. | text | n/a |
recipientNameAddition | The document’s recipient name addition. | text | n/a |
recipientStreet | The document’s recipient street address. | street | n/a |
recipientCity | The document’s recipient city. | city | n/a |
recipientPostalCode | The document’s recipient postal code. | zipcode | n/a |
recipientPoBox | The document’s recipient postal box. | poboxnumber | n/a |
referenceId | The first found reference id in a given document. | text | referenceIds |
senderCity | The sender city. | city | n/a |
senderName | The sender name. | companyname | senderNames |
senderNameAddition | The sender name addition. | companynameaddition | n/a |
senderPoBox | The sender post-office box. | poboxnumber | n/a |
senderPostalCode | The sender’s postal code. | zipcode | n/a |
senderStreet | The sender’s street with house number. | street | n/a |
taxNumber | The tax number of a document sender. | taxnumber | taxNumbers |
templateId | (Optional) The template id when the layout of document meets a certain template (available to clients who choose the template option) | text | n/a |
transactionId | The transaction id of a receipt when it is paid per card. Note: This extraction is only available if the document is a Receipt. See Document Type Hints for details. | text | n/a |
vatRegNumber | The VAT number of a document sender. | vat | vatRegNumbers |
vehiclePlateNumber | The vehicle license plate number on automobile-related invoices. | text | vehiclePlateNumbers |
website | The most probable web address of a sender. | url | websites |
Compound Extractions
A compound extraction describes a group of extractions.
Available Compound Extractions
compound extractions example
{
"compoundExtractions": {
"lineItems": [
{
"sumNet": {
"entity": "amount",
"value": "172.48:EUR",
"box": {
"top": 355.83,
"left": 525.17,
"width": 38.92000000000007,
"height": 10.0,
"page": 1
}
},
"description": {
"entity": "text",
"value": "Zählerstand : 539 kWh 04.03.2019 - 05.04.2019",
"box": {...
}
},
"taxRate": {
"entity": "text",
"value": "19 %",
"box": {...
}
}
},
{
"artNumber": {
"entity": "text",
"value": "kWh",
"box": {...
}
},
"sumNet": {
"entity": "amount",
"value": "128.64:EUR",
"box": {...
}
},
"description": {
"entity": "text",
"value": "Zählerstand : 402 kWh 04.03.2019 - 05.04.2019",
"box": {...
}
}
}
],
"taxItems": [
{
"taxRate": {
"entity": "text",
"value": "19.0 %"
},
"taxAmount": {
"entity": "amount",
"value": "57.21:EUR",
"box": {...
}
}
}
]
}
}
Name | Description | Children | Children Description | Entity |
---|---|---|---|---|
lineItems | Invoice line items describe the details of purchased items. | artNumber | article number | text |
| | baseGross | gross amount of 1 unit | amount |
| | baseNet | net amount of 1 unit | amount |
| | description | description of the item | text |
| | position | position of the item | text |
| | quantity | quantity in the units of the item | numeric |
| | sumGross | gross amount of all the units of the item | amount |
| | sumNet | net amount of all the units of the item | amount |
| | taxAmount | tax amount of all the units of the item | amount |
| | taxRate | tax rate of the item | text |
| | unit | unit of the item | text |
taxItems | Taxes sum and their corresponding rates. | taxRate | tax rate (in percentage) | text |
| | taxAmount | tax amount of the rate | amount |
bankData | iban and bic | iban | iban of one entity | iban |
| | bic | bic of the same entity | bic |
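As an illustration of working with compound extractions, the Python sketch below adds up one amount field across all line items, grouped by currency. The helper name `sum_line_items` is hypothetical, and it assumes every amount value uses the `<Amount>:<Currency Code>` form shown in the example; line items that lack the field are skipped.

```python
from decimal import Decimal


def sum_line_items(compound, field="sumNet"):
    """Sum `field` over all line items, returning currency -> Decimal total."""
    totals = {}
    for item in compound.get("lineItems", []):
        if field not in item:
            continue  # Children are optional, so a line item may lack the field.
        amount, currency = item[field]["value"].split(":")
        totals[currency] = totals.get(currency, Decimal("0")) + Decimal(amount)
    return totals


compound = {
    "lineItems": [
        {"sumNet": {"entity": "amount", "value": "172.48:EUR"}},
        {
            "artNumber": {"entity": "text", "value": "kWh"},
            "sumNet": {"entity": "amount", "value": "128.64:EUR"},
        },
    ]
}
net_totals = sum_line_items(compound)
```

Using `Decimal` rather than `float` avoids rounding artifacts when adding monetary values.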
Extraction Candidates
Extraction candidates represent a list of suggestions for an appropriate extraction.
Available Extraction Candidates
extraction candidates
{
"dates": [
{"entity": "date","value": "2012-06-20","box": { ... } },
{"entity": "date","value": "2012-05-10","box": { ... } },
...
]
}
Name | Description | Entity |
---|---|---|
amounts | All amounts of a given document. | amount |
bankAccountNumbers | All account numbers of a given document. | bankaccount |
bankNumbers | All bank numbers of a given document. | banknumber |
bics | All BICs of a given document. | bic |
companyRegisterIds | All alphanumeric strings (of a similar structure as a German company register id) of a given document. | companyregisterid |
customerIds | All alphanumeric strings (of a similar structure as an identifier) of a given document. | customerid |
dates | All dates of a given document. | date |
emails | All emails of a given document. | email |
ibans | All IBANs of a given document. | iban |
invoiceIds | All alphanumeric strings (of similar structure as an identifier) of a given document. | invoiceid |
phoneNumbers | All phone numbers of a given document. | phonenumber |
receiptNumbers | All potential receipt numbers of a given document. | invoiceid |
referenceIds | All potential reference id numbers of a given document. | text |
senderNames | All possible sender names of a given document. | companyname |
taxNumbers | All strings of digits (of a similar structure as a German tax number) of a given document. | taxnumber |
times | All times of a given document. | time |
vatRegNumbers | All alphanumeric strings (of a similar structure as an identifier) of a given document. | vat |
websites | All links found in a given document. | url |
Extraction Entities
Available extraction entities list (follow each link for a detailed description):
- amount
- bankaccount
- banknumber
- bic
- city
- companyname
- companynameaddition
- companyregisterid
- currency
- customerid
- date
- doctype
- documentdomain
- iban
- invoiceid
- numeric
- paymentstate
- phonenumber
- poboxnumber
- recipient
- reference
- street
- taxnumber
- text
- time
- timeperiod
- url
- vat
- zipcode
Bounding Box
Bounding Box
{
"box": {
"page": 2,
"left": 483.0,
"top": 450.0,
"width": 51.0,
"height": 9.0
}
}
A bounding box creates a direct relation between an extraction and a document. The box describes the page and the position where the extraction originates from.
Name | Type | Description |
---|---|---|
left | number | The distance from the left edge of the page. |
top | number | The distance from the top edge of the page. |
width | number | The horizontal dimension of a box. |
height | number | The vertical dimension of a box. |
page | number | The page on which the box can be found, starting with 1. |
Coordinate System
The origin of the coordinate system is adjusted to the upper left corner of the page. The coordinate system uses the DTP point as unit: 1 pt = 1 inch / 72 = 25.4 mm / 72 = 0.3528 mm.
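The unit conversion above, and the mapping of a bounding box onto a rendered page image, can be sketched in Python as follows. The helper names are hypothetical, and `box_to_pixels` assumes that the preview image covers the full page, so that a simple scale factor per axis is sufficient.

```python
MM_PER_POINT = 25.4 / 72  # 1 pt = 1 inch / 72


def points_to_mm(points):
    """Convert a length in DTP points to millimetres."""
    return points * MM_PER_POINT


def box_to_pixels(box, page_width_pt, page_height_pt, image_width_px, image_height_px):
    """Scale a bounding box (in points) to pixel coordinates of a page image.

    Assumption: the image shows the whole page with the origin in the upper
    left corner, matching the coordinate system of the bounding box.
    """
    scale_x = image_width_px / page_width_pt
    scale_y = image_height_px / page_height_pt
    return {
        "left": round(box["left"] * scale_x),
        "top": round(box["top"] * scale_y),
        "width": round(box["width"] * scale_x),
        "height": round(box["height"] * scale_y),
    }


pixel_box = box_to_pixels(
    {"left": 10.0, "top": 20.0, "width": 5.0, "height": 5.0},
    page_width_pt=100.0,
    page_height_pt=200.0,
    image_width_px=200,
    image_height_px=400,
)
```

The page size in points is available as `sizeX` and `sizeY` in the layout response.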
Extraction Confidence
confidence example
{
"entity": "recipient",
"value": "Max Mustermann Musterstrasse 1 Musterstadt",
"box": {
"top": 379.0,
"left": 68.0,
"width": 244.0,
"height": 10.0,
"page": 1
},
"confidence": 0.955
}
We believe it is important to indicate how confident our system is in its extractions.
Therefore, we implemented a mechanism that predicts the confidence of document extractions.
The confidence prediction algorithm estimates the chance of delivering the correct extraction based on previous extractions and your feedback.
The result is exposed in an additional JSON field, confidence, which is an optional part of each document extraction.
Due to the nature of the algorithm and the amount of feedback we receive, the system cannot deliver a confidence value for every single extraction.
Keep in mind that a portion of document extractions will not have a confidence field.
However, if the field exists, we estimate its reliability at 98%-100%.
For example, if the system returns netAmount with "confidence": 0.95, there is very little chance that the extraction is incorrect and you can save resources on proofreading.
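One possible way to use the confidence on the client side is to decide which extractions a human should proofread. The Python sketch below is only an illustration: the helper name `needs_review` and the threshold of 0.9 are assumptions, not values recommended by Gini. Extractions without a confidence field are treated as needing review.

```python
def needs_review(extraction, threshold=0.9):
    """Return True when an extraction should be proofread by a user.

    The threshold is a hypothetical client-side setting. A missing
    confidence field is treated conservatively as "needs review".
    """
    confidence = extraction.get("confidence")
    return confidence is None or confidence < threshold


extractions = {
    "netAmount": {"entity": "amount", "value": "21.00:EUR", "confidence": 0.95},
    "amountToPay": {"entity": "amount", "value": "24.99:EUR", "confidence": 0.715},
    "paymentReference": {"entity": "reference", "value": "ReNr 123, KdNr 32"},
}
to_review = sorted(label for label, item in extractions.items() if needs_review(item))
```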
Entity Reference
amount
amount
{
"entity": "amount",
"value": "33.78:EUR",
"box": {
"page": 1,
"left": 535.0,
"top": 395.0,
"width": 25.0,
"height": 10.0
}
}
Describes an amount of money with a specific currency in the format <Amount>:<Currency Code>, where <Amount> is a decimal number with "." as the decimal separator and ":" is the delimiter between <Amount> and <Currency Code>.
The currency code must be given according to the list specified in ISO 4217.
Format
Name | Type | Description |
---|---|---|
entity | string | Must be amount. |
value | string | Amount in the defined format. |
box | bounding-box | Bounding box of the occurrence including the page number. |
Valid Feedback
Form | Example |
---|---|
<Number>:<Currency Code/Symbol> | 12.3:EUR; 12,4:USD; 12.98:USD |
<Number> <Currency Code/Symbol> (1-space-separation) | 12,3 EUR; 12,4 USD; 12 € |
<Currency Code/Symbol> <Number> (1-space-separation) | EUR 12.3; $ 12.4 |
- If there is no <Number> in the string, it will be rejected.
- If there is no <Currency Code/Symbol>, it will be treated as 'EUR' (the default currency code).
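On the reading side, the value of an amount extraction always uses the <Amount>:<Currency Code> form described above. A minimal sketch of how a client might split it into a decimal number and a currency code follows; the helper name parse_amount is hypothetical, and the three-letter check is only a shape check, not a lookup against the ISO 4217 list.

```python
from decimal import Decimal


def parse_amount(value: str) -> tuple[Decimal, str]:
    """Parse an extraction value such as '33.78:EUR'."""
    number, separator, currency = value.partition(":")
    if not separator or not number or len(currency) != 3:
        raise ValueError(f"not in <Amount>:<Currency Code> form: {value!r}")
    # Decimal keeps the amount exact; float would introduce rounding errors.
    return Decimal(number), currency


amount, currency = parse_amount("33.78:EUR")
```

Decimal is used rather than float so that monetary values are not distorted by binary floating-point rounding.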
bankaccount
bankaccount
{
"entity": "bankaccount",
"value": "1597880",
"box": {
"page": 1,
"left": 506.0,
"top": 777.0,
"width": 53.0,
"height": 6.0
}
}
Describes a bank account number.
Format
Name | Type | Description |
---|---|---|
entity | string | Must be bankaccount. |
value | string | Bank account number in the normalized form (without spaces between digits). |
box | bounding-box | Bounding box of the occurrence including the page number. |
Valid Feedback
Form | Example |
---|---|
Digits | 1597880 |
If the string has fewer than 3 digits, it will be rejected.
banknumber
banknumber
{
"entity": "banknumber",
"value": "70250150",
"box": {
"page": 1,
"left": 147.0,
"top": 427.0,
"width": 52.0,
"height": 8.0
}
}
Describes a bank number.
Format
Name | Type | Description |
---|---|---|
entity | string | Must be banknumber. |
value | string | Bank number in the normalized form (without spaces between digits). |
box | bounding-box | Bounding box of the occurrence including the page number. |
Valid Feedback
Form | Example |
---|---|
8 digits | 70250150 |
bic
Describes a BIC number.
bic
{
"entity": "bic",
"value": "GENODEF1HH2",
"box": {
"page": 1,
"left": 506.0,
"top": 777.0,
"width": 53.0,
"height": 6.0
}
}
Format
Name | Type | Description |
---|---|---|
entity | string | Must be bic. |
value | string | BIC number in the normalized form (without spaces between digits and letters). |
box | bounding-box | Bounding box of the occurrence including the page number. |
Valid Feedback
Form | Example |
---|---|
String matching BIC format | GENODEF1HH2 |
city
city
{
"entity": "city",
"value": "München",
"box": {
"page": 1,
"left": 535.0,
"top": 395.0,
"width": 25.0,
"height": 10.0
}
}
Describes a city.
Format
Name | Type | Description |
---|---|---|
entity | string | Must be city. |
value | string | The city name. |
box | bounding-box | Bounding box of the occurrence including the page number. |
companyname
companyname
{
"entity": "companyname",
"value": "Weinquelle Lühmann",
"box": {
"page": 1,
"left": 535.0,
"top": 395.0,
"width": 25.0,
"height": 10.0
}
}
Describes a (sender) company name.
Format
Name | Type | Description |
---|---|---|
entity | string | Must be companyname. |
value | string | The company name. |
box | bounding-box | Bounding box of the occurrence including the page number. |
Valid Feedback
Form | Example |
---|---|
Random string with at least 2 letter/digit characters | O2, BMW, ABC GmbH |
A string with a single letter/digit character will be rejected.
companynameaddition
companynameaddition
{
"entity": "companynameaddition",
"value": "Kundenservice",
"box": {
"page": 1,
"left": 535.0,
"top": 395.0,
"width": 25.0,
"height": 10.0
}
}
Describes a (sender) company name addition (e.g. Kundenservice).
Format
Name | Type | Description |
---|---|---|
entity | string | Must be companynameaddition. |
value | string | The company name addition. |
box | bounding-box | Bounding box of the occurrence including the page number. |
companyregisterid
companyregisterid
{
"entity": "companyregisterid",
"value": "HRB:108514:München",
"box": {
"page": 1,
"left": 525.0,
"top": 805.0,
"width": 34.0,
"height": 6.0
}
}
Describes a German Register number in the format <Area of the Commercial Registry>:<Number>:<Office of the Registry> with ":" as a delimiter between the components.
Format
Name | Type | Description |
---|---|---|
entity | string | Must be companyregisterid. |
value | string | Register number in the defined format. |
box | bounding-box | Bounding box of the occurrence including the page number. |
currency
currency
{
"entity": "currency",
"value": "EUR",
"box": {
"page": 1,
"left": 535.0,
"top": 395.0,
"width": 25.0,
"height": 10.0
}
}
Describes the currency of the document.
The currency code must be given according to the list specified in ISO 4217.
Format
Name | Type | Description |
---|---|---|
entity | string | Must be currency. |
value | string | Currency in the defined format. |
box | bounding-box | Bounding box of the occurrence including the page number. |
Valid Feedback
Form | Example |
---|---|
Euro currency symbol | € |
Euro currency name/code | EUR, EURO |
Since only German documents are supported at the moment, the following strings are allowed: €, EUR, EURO
customerid
customerid
{
"entity": "customerid",
"value": "M500721563",
"box": {
"page": 1,
"left": 317.0,
"top": 123.0,
"width": 158.0,
"height": 8.0
}
}
Describes a customer ID.
Format
Name | Type | Description |
---|---|---|
entity | string | Must be customerid. |
value | string | Customer ID in the normalized form (without spaces between digits and letters). |
box | bounding-box | Bounding box of the occurrence including the page number. |
Valid Feedback
Form | Example |
---|---|
A string with length >= 1 and at least 1 digit | 12345, KD678 |
date
date
{
"entity": "date",
"value": "2012-11-16",
"box": {
"page": 1,
"left": 429.0,
"top": 143.0,
"width": 40.0,
"height": 8.0
}
}
Describes a date in the format <Year>-<Month>-<Day> with "-" as a delimiter between the date components.
Format
Name | Type | Description |
---|---|---|
entity | string | Must be date. |
value | string | Date in the defined format. |
box | bounding-box | Bounding box of the occurrence including the page number. |
Valid Feedback
Form | Example |
---|---|
yyyy-mm-dd | 2015-10-05 |
German style date | 05.10.2015, 05-10-2015, 05 Okt 2015, 05 Oktober 2015 |
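A client that compares user-entered dates with extracted values may want to bring the feedback forms listed above into the yyyy-mm-dd form first. The following is a rough sketch under stated assumptions: the helper name to_iso_date is hypothetical, and matching German month names by their first three letters is an assumption that covers the "Okt" and "Oktober" examples but is not documented API behavior.

```python
import re
from datetime import date

# First three letters of the German month names, January to December.
GERMAN_MONTHS = ["jan", "feb", "mär", "apr", "mai", "jun",
                 "jul", "aug", "sep", "okt", "nov", "dez"]


def to_iso_date(text: str) -> str:
    """Normalize 'yyyy-mm-dd' or a German-style date to 'yyyy-mm-dd'."""
    text = text.strip()
    match = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", text)
    if match:
        year, month, day = (int(part) for part in match.groups())
        return date(year, month, day).isoformat()
    match = re.fullmatch(r"(\d{1,2})[.-](\d{1,2})[.-](\d{4})", text)
    if match:
        day, month, year = (int(part) for part in match.groups())
        return date(year, month, day).isoformat()
    match = re.fullmatch(r"(\d{1,2})\.? ([A-Za-zäÄ]+)\.? (\d{4})", text)
    if match:
        prefix = match.group(2).lower()[:3]
        if prefix in GERMAN_MONTHS:
            month = GERMAN_MONTHS.index(prefix) + 1
            return date(int(match.group(3)), month, int(match.group(1))).isoformat()
    raise ValueError(f"unsupported date form: {text!r}")


normalized = [to_iso_date(s) for s in
              ["2015-10-05", "05.10.2015", "05-10-2015", "05 Okt 2015", "05 Oktober 2015"]]
```

Constructing a datetime.date also rejects impossible dates such as 31.02.2015, which a pure string rewrite would let through.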
Document Types
doctype
{
"entity": "doctype",
"value": "Invoice"
}
Describes a document type. A list of supported document types:
BankStatement
Contract
CostEstimation
CreditNote
DeliveryNote
Invoice
Receipt
Reminder
RemittanceSlip
TravelExpenseReport
Offer
OrderConfirmation
Other
PurchaseOrder
ZUGFeRDInvoice
- When an incoming document is recognized as a ZUGFeRDInvoice (an e-invoice following the ZUGFeRD standard), the information from the embedded XML is delivered.
Format
Name | Type | Description |
---|---|---|
entity | string | Must be doctype. |
value | string | The document type. |
Valid Feedback
Form | Example |
---|---|
One of the above listed values | Invoice, Reminder |
documentdomain
documentdomain
{
"entity": "documentdomain",
"value": "TeleCommunication"
}
Describes a document domain. A list of supported values:
TeleCommunication
Other
HealthInsurance
Energy
Travel
Format
Name | Type | Description |
---|---|---|
entity | string | Must be documentdomain. |
value | string | The document domain. |
Valid Feedback
Form | Example |
---|---|
One of the above listed values | Travel, HealthInsurance |
email
email
{
"entity": "email",
"value": "info@t-online.de",
"box": {
"page": 1,
"left": 189.0,
"top": 820.0,
"width": 73.0,
"height": 7.0
}
}
Describes an email in the format <Name>@<Domain> with "@" as a delimiter between email components.
Format
Name | Type | Description |
---|---|---|
entity | string | Must be email. |
value | string | Email in the defined format. |
box | bounding-box | Bounding box of the occurrence including the page number. |
Valid Feedback
Form | Example |
---|---|
Valid email address | hello@gini.net |
iban
iban
{
"entity": "iban",
"value": "DE74700500000000028273",
"box": {
"page": 1,
"left": 425.0,
"top": 770.0,
"width": 83.0,
"height": 6.0
}
}
Describes an IBAN.
Format
Name | Type | Description |
---|---|---|
entity | string | Must be iban. |
value | string | IBAN in the normalized form (without spaces between digits and letters). |
box | bounding-box | Bounding box of the occurrence including the page number. |
Valid Feedback
Form | Example |
---|---|
Valid IBAN | DE68700202700667302269 |
Invalid IBAN will be rejected.
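The section above states only that an invalid IBAN is rejected; it does not say which checks the API applies. As a hedged illustration, a client could pre-validate feedback with the standard mod-97 checksum defined for IBANs. The helper name is_valid_iban is hypothetical, the sample value is the widely published German example IBAN rather than one taken from this documentation, and country-specific length rules are deliberately left out.

```python
def is_valid_iban(iban: str) -> bool:
    """Check the generic IBAN shape and the mod-97 checksum."""
    iban = iban.replace(" ", "").upper()
    if not (15 <= len(iban) <= 34) or not (iban.isascii() and iban.isalnum()):
        return False
    # Two-letter country code followed by two check digits.
    if not (iban[:2].isalpha() and iban[2:4].isdigit()):
        return False
    # Move the first four characters to the end, map A..Z to 10..35,
    # and interpret the result as one large integer.
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(char, 36)) for char in rearranged)
    return int(digits) % 97 == 1


ok = is_valid_iban("DE89 3704 0044 0532 0130 00")
```

Passing this check does not guarantee that the API accepts the value; it only filters out obvious typos before a request is made.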
invoiceid
invoiceid
{
"entity": "invoiceid",
"value": "201210124056",
"box": {
"page": 1,
"left": 429.0,
"top": 133.0,
"width": 53.0,
"height": 8.0
}
}
Describes an invoice ID as identifier.
Format
Name | Type | Description |
---|---|---|
entity | string | Must be invoiceId. |
value | string | Invoice ID in the normalized form (without spaces between digits and letters). |
box | bounding-box | Bounding box of the occurrence including the page number. |
Valid Feedback
Form | Example |
---|---|
A string with length >= 1 and at least 1 digit | 12345, RE 67890 |
numeric
numeric
{
"entity": "numeric",
"value": "12.5",
"box": {
"page": 1,
"left": 429.0,
"top": 133.0,
"width": 53.0,
"height": 8.0
}
}
Describes a numeric value (integer/float).
Format
Name | Type | Description |
---|---|---|
entity | string | Must be numeric. |
value | string | Numeric value in string. |
box | bounding-box | Bounding box of the occurrence including the page number. |
Valid Feedback
Form | Example |
---|---|
A string of integer or float | 1, 1.5, 1.234 |
paymentstate
paymentstate
{
"entity": "paymentState",
"value": "Paid"
}
Describes a payment state as one of the following values:
Paid
ToBePaid
Format
Name | Type | Description |
---|---|---|
entity | string | Must be paymentState. |
value | string | The payment state. |
Valid Feedback
Form | Example |
---|---|
One of the above listed values | Paid, ToBePaid |
phonenumber
phonenumber
{
"entity": "phonenumber",
"value": "08923508270",
"box": {
"page": 1,
"left": 425.0,
"top": 770.0,
"width": 83.0,
"height": 6.0
}
}
Describes a phone number in one of two formats: <CountryCode> <Number>, with " " as a delimiter, or <Number> without a country code.
All punctuation marks (e.g. "/", "-"), spaces and "(0)" (e.g. +49(0)61957746361) are deleted.
Format
Name | Type | Description |
---|---|---|
entity | string | Must be phonenumber. |
value | string | The phone number in the defined format. |
box | bounding-box | Bounding box of the occurrence including the page number. |
Valid Feedback
Form | Example |
---|---|
Phonenumber-like string | +49 89 1234 567 |
Brackets, spaces, leading '+', and '-' are allowed.
poboxnumber
poboxnumber
{
"entity": "poboxnumber",
"value": "22087",
"box": {
"page": 1,
"left": 223.0,
"top": 125.0,
"width": 16.0,
"height": 6.0
}
}
Describes a post-office box.
Format
Name | Type | Description |
---|---|---|
entity | string | Must be poboxnumber. |
value | string | The post-office box number. |
box | bounding-box | Bounding box of the occurrence including the page number. |
Valid Feedback
Form | Example |
---|---|
4-6 digits | 123456 |
<Keyword> <4-6 digits> | Postfach 123456, PF 123456, Brieffach 123456 |
The keyword can be one of the following: Postfach, PF, Brieffach.
recipient
recipient
{
"entity": "recipient",
"value": "Max Mustermann Musterstrasse 1 Musterstadt",
"box": {
"top": 379.0,
"left": 68.0,
"width": 244.0,
"height": 10.0,
"page": 1
}
}
Represents the recipient of a letter.
Format
Name | Type | Description |
---|---|---|
entity | string | Must be recipient. |
value | string | The recipient. |
box | bounding-box | Bounding box of the occurrence including the page number. |
reference
reference
{
"entity": "reference",
"value": "K19218331",
"box": {
"page": 1,
"left": 535.0,
"top": 395.0,
"width": 25.0,
"height": 10.0
}
}
Describes a payment reference.
Format
Name | Type | Description |
---|---|---|
entity | string | Must be reference. |
value | string | The payment reference with ", " as delimiter between reference parts. |
box | bounding-box | Bounding box of the occurrence including the page number. |
Valid Feedback
Form | Example |
---|---|
A string with length >= 5 | This is a reference. |
A string with fewer than 5 non-space characters will be rejected.
street
street
{
"entity": "street",
"value": "Emmy-Noether-Straße:2a",
"box": {
"page": 1,
"left": 162.0,
"top": 125.0,
"width": 55.0,
"height": 6.0
}
}
Describes a street in the format <Street name>:<House number> with ":" as a delimiter between components. All abbreviations (e.g. "str.") are replaced with the German word "Straße".
Format
Name | Type | Description |
---|---|---|
entity | string | Must be street. |
value | string | Street in the defined format. |
box | bounding-box | Bounding box of the occurrence including the page number. |
Valid Feedback
Form | Example |
---|---|
<Streetname>:<Housenumber> | ABC Str:1a |
<Streetname> <Housenumber> | ABC Straße 1a |
<Streetname> (without house number) | ABC Straße |
taxnumber
taxnumber
{
"entity": "taxnumber",
"value": "143/163/40289",
"box": {
"page": 1,
"left": 501.0,
"top": 812.0,
"width": 58.0,
"height": 6.0
}
}
Describes a German tax number in the format <taxOfficeNumber>/<taxOfficeAreaNumber>/<personalNumber><checkDigit> with "/" as delimiter between the first 3 components.
Format
Name | Type | Description |
---|---|---|
entity | string | Must be taxnumber. |
value | string | Tax number in the defined format. |
box | bounding-box | Bounding box of the occurrence including the page number. |
Valid Feedback
Form | Example |
---|---|
A string containing 11-13 digits | 143/163/4028/9 |
All kinds of delimiters that are common for tax numbers are allowed (e.g. '/', '-'). A string without delimiters is also allowed.
text
text
{
"entity": "text",
"value": "Aktenzeichen: K19218331",
"box": {
"page": 1,
"left": 535.0,
"top": 395.0,
"width": 25.0,
"height": 10.0
}
}
Describes a plain text entity.
Format
Name | Type | Description |
---|---|---|
entity | string | Must be text. |
value | string | Plain text. |
box | bounding-box | Bounding box of the occurrence including the page number. |
Valid Feedback
Generally, the text entity accepts any kind of text as feedback, except for the specific fields below, to which extra rules apply.
Label | Form | Example |
---|---|---|
branchId | digit sequence | 12345, 678901234 |
transactionId | ||
paymentMethod | one of the allowed valid payment methods | Cash, Card, Contactless Card, Girocard, Contactless Girocard, Contactless Visa, Contactless Mastercard |
time
time
{
"entity": "time",
"value": "12:13:14",
"box": {
"page": 1,
"left": 429.0,
"top": 143.0,
"width": 40.0,
"height": 8.0
}
}
Describes a time in the format <hour>:<minute>:<second> with ":" as a delimiter between the time components.
Format
Name | Type | Description |
---|---|---|
entity | string | Must be time. |
value | string | Time in the defined format. |
box | bounding-box | Bounding box of the occurrence including the page number. |
timeperiod
timeperiod
{
"entity": "timeperiod",
"value": "14:day",
"box": {
"page": 1,
"left": 429.0,
"top": 143.0,
"width": 40.0,
"height": 8.0
}
}
Describes a time range in the format <number>:<time unit> with ":" as a delimiter between number and unit.
Currently, the following time units are supported: day, workday, week, month, quarter, year.
Format
Name | Type | Description |
---|---|---|
entity | string | Must be timeperiod. |
value | string | Time in the defined format. |
box | bounding-box | Bounding box of the occurrence including the page number. |
Valid Feedback
Form | Example |
---|---|
A string in form of "n:time_unit" | "3:day", "2:week", "1:quarter","2:year" |
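The <number>:<time unit> form is easy to check on the client before it is submitted as feedback. The sketch below is one possible way to do so; the helper name parse_timeperiod is hypothetical, and the unit list is copied from the description above.

```python
TIME_UNITS = {"day", "workday", "week", "month", "quarter", "year"}


def parse_timeperiod(value: str) -> tuple[int, str]:
    """Parse a value such as '14:day' into (14, 'day')."""
    number, separator, unit = value.partition(":")
    if not separator or not (number.isascii() and number.isdigit()):
        raise ValueError(f"not in <number>:<time unit> form: {value!r}")
    if unit not in TIME_UNITS:
        raise ValueError(f"unsupported time unit: {unit!r}")
    return int(number), unit


period = parse_timeperiod("14:day")
```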
url
url
{
"entity": "url",
"value": "www.m-net.de",
"box": {
"page": 1,
"left": 444.0,
"top": 553.0,
"width": 50.0,
"height": 8.0
}
}
Describes the host part of a URI as defined in RFC 3986. http:// is implicitly assumed as the URI scheme.
Format
Name | Type | Description |
---|---|---|
entity | string | Must be url. |
value | string | The host part of a URI. |
box | bounding-box | Bounding box of the occurrence including the page number. |
Valid Feedback
Form | Example |
---|---|
A valid url string | www.gini.net |
vat
vat
{
"entity": "vat",
"value": "DE188796931",
"box": {
"page": 1,
"left": 453.0,
"top": 812.0,
"width": 43.0,
"height": 6.0
}
}
Describes an EU VAT number.
Format
Name | Type | Description |
---|---|---|
entity | string | Must be vat. |
value | string | European Union VAT number in normalized form (without spaces between the digits and letters). |
box | bounding-box | Bounding box of the occurrence including the page number. |
Valid Feedback
Form | Example |
---|---|
VAT string in valid form | DE188796931 |
Currently, only VAT numbers for 'DE', 'GB', 'FR', and 'AT' are allowed.
zipcode
zipcode
{
"entity": "zipCode",
"value": "18337",
"box": {
"page": 1,
"left": 62.0,
"top": 25.0,
"width": 55.0,
"height": 6.0
}
}
Describes a ZIP code.
Format
Name | Type | Description |
---|---|---|
entity | string | Must be zipCode. |
value | string | The ZIP code. |
box | bounding-box | Bounding box of the occurrence including the page number. |
Valid Feedback
Form | Example |
---|---|
4-5 digits | 80809 |
User Center API
Gini's User Center offers an API to programmatically create new Gini accounts and to make API requests on behalf of the created user.
Client Authentication
All access to the User Center API requires client authentication. A client can authenticate itself with the Client Credentials Grant described in RFC 6749. In short, the client exchanges its client ID and client secret for an access token.
Request
get a client access token
curl -v -H 'Accept: application/json' \
  -u 'client-id:secret' \
  'https://user.gini.net/oauth/token?grant_type=client_credentials'
GET /oauth/token?grant_type=client_credentials HTTP/1.1
Authorization: Basic Y2xpZW50LWlkOnNlY3JldA==
Host: user.gini.net
Accept: application/json
example response
{
"access_token":"74c1e7fe-e464-451f-a6eb-8f0998c46ff6","token_type":"bearer","expires_in":3599
}
In order to get a client access token, send a GET request to /oauth/token?grant_type=client_credentials.
The request must contain a basic HTTP access authorization header with the client ID as a username and the client secret as a password.
The client can now use the returned access token to make requests to the User Center API by sending the token as a bearer token in the Authorization request header:
GET /api/users/c1e60c6b-a0a4-4d80-81eb-c1c6de729a0e HTTP/1.1
Host: user.gini.net
Authorization: BEARER 74c1e7fe-e464-451f-a6eb-8f0998c46ff6
Accept: application/json
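The same client-credentials exchange can be sketched in Python using only the standard library. The function names below are hypothetical, the credentials are the placeholder values from the curl example, and the request is only constructed here, not sent.

```python
import base64
import urllib.request

TOKEN_URL = "https://user.gini.net/oauth/token?grant_type=client_credentials"


def build_client_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the GET request that exchanges client credentials for a token."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        headers={"Authorization": f"Basic {basic}", "Accept": "application/json"},
        method="GET",
    )


def bearer_headers(access_token: str) -> dict[str, str]:
    """Headers for a User Center API call made with the returned token."""
    return {"Authorization": f"BEARER {access_token}", "Accept": "application/json"}


request = build_client_token_request("client-id", "secret")
# Sending is left to the caller, for example:
#   with urllib.request.urlopen(request) as response:
#       token = json.load(response)["access_token"]
```

Since the example response carries an expires_in value, a client would typically cache the token and request a new one once it has expired.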
Authenticating on behalf of a User
authenticating on behalf of a user
curl -v -X POST \
  --data-urlencode 'username=some_user@example.com' \
  --data-urlencode 'password=supersecret' \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -H 'Accept: application/json' \
  -u 'client-id:secret' \
  'https://user.gini.net/oauth/token?grant_type=password'
POST /oauth/token?grant_type=password HTTP/1.1
Authorization: Basic Y2xpZW50LWlkOnNlY3JldA==
Host: user.gini.net
Accept: application/json
Content-Type: application/x-www-form-urlencoded
username=some_user@example.com&password=supersecret
example response
{
"access_token":"6c470ffa-abf1-41aa-b866-cd3be0ee84f4",
"token_type":"bearer",
"expires_in":3599
}
The returned access token can now be used to make requests to the Gini Accounting API on behalf of the user. To do so, send the access token as a bearer token in the Authorization request header:
GET /documents HTTP/1.1
Host: accounting-api.gini.net
Authorization: BEARER 6c470ffa-abf1-41aa-b866-cd3be0ee84f4
Accept: application/vnd.gini.v2+json
Connection: close
The Resource Owner Password Credentials Grant can be used to exchange a user's email address and password for an access token. The access token can then be used to make requests to the Gini API on behalf of the user.
Request
Key | Description |
---|---|
username | The user's email address. |
password | The user's password. |
Note that the client must authenticate itself using basic HTTP access authentication with its ID as a username and its secret as a password.
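As a hedged sketch, the password grant request could be assembled in Python as follows. The function name is hypothetical and the credentials are the placeholder values from the curl example. urllib.parse.urlencode percent-encodes the form fields, which matches what curl's --data-urlencode does.

```python
import base64
import urllib.parse
import urllib.request

USER_TOKEN_URL = "https://user.gini.net/oauth/token?grant_type=password"


def build_user_token_request(
    client_id: str, client_secret: str, username: str, password: str
) -> urllib.request.Request:
    """Build the POST request that exchanges user credentials for a token."""
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("ascii")
    # Form-encode the user's credentials for the request body.
    body = urllib.parse.urlencode({"username": username, "password": password}).encode("ascii")
    return urllib.request.Request(
        USER_TOKEN_URL,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )


user_request = build_user_token_request(
    "client-id", "secret", "some_user@example.com", "supersecret"
)
```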
Creating a New User
creating a new user
curl -v -X POST --data '{"email":"some_user@example.com", "password":"supersecret"}' \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json' \
  -H 'Authorization: BEARER 74c1e7fe-e464-451f-a6eb-8f0998c46ff6' \
  'https://user.gini.net/api/users'
POST /api/users HTTP/1.1
Host: user.gini.net
Authorization: BEARER 74c1e7fe-e464-451f-a6eb-8f0998c46ff6
Content-Type: application/json
{"email":"some_user@example.com","password":"supersecret"}
example response
HTTP/1.1 201 Created
Location: https://user.gini.net/api/users/c1e60c6b-a0a4-4d80-81eb-c1c6de729a0e
Content-Length: 0
In order to create a new user, submit a POST request to /api/users.
Request
Key | Description |
---|---|
email | The new user's email address (will be used as login username). |
password | The new user's password (must be at least 6 characters long). |
If the request entity is invalid (missing field(s), password < 6 characters etc.) or a user with that email address already exists, the API will respond with 400 Bad Request.
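Some of the 400 Bad Request cases described above can be caught before the request is sent. The sketch below builds the JSON body with a local password-length check and reads the new user's ID from the Location header of the 201 response. Both helper names are hypothetical, and treating the last path segment of Location as the user ID is an inference from the example response.

```python
import json

MIN_PASSWORD_LENGTH = 6  # from the request table above


def new_user_payload(email: str, password: str) -> str:
    """Return the JSON body for POST /api/users, validating locally first."""
    if not email:
        raise ValueError("email is required")
    if len(password) < MIN_PASSWORD_LENGTH:
        raise ValueError(f"password must be at least {MIN_PASSWORD_LENGTH} characters long")
    return json.dumps({"email": email, "password": password})


def user_id_from_location(location: str) -> str:
    """Take the last path segment of the Location header as the user ID."""
    return location.rstrip("/").rsplit("/", 1)[-1]


payload = new_user_payload("some_user@example.com", "supersecret")
user_id = user_id_from_location(
    "https://user.gini.net/api/users/c1e60c6b-a0a4-4d80-81eb-c1c6de729a0e"
)
```

The local check cannot detect an already registered email address, so the caller still has to handle a 400 response.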
Retrieving User Information
retrieving user information
GET /api/users/88a28076-18e8-4275-b39c-eaacc240d406 HTTP/1.1
Host: user.gini.net
Authorization: BEARER 74c1e7fe-e464-451f-a6eb-8f0998c46ff6
Accept: application/json
Response
{
"id":"88a28076-18e8-4275-b39c-eaacc240d406",
"email":"some_user@example.com"
}
Information about a user can be retrieved with a GET request to /api/users/{userId}.
Response
Key | Description |
---|---|
id | Unique User ID. |
email | The user's email address. |
Changing a User's Password and/or Email
change a user's password and/or email
PUT /api/users/c1e60c6b-a0a4-4d80-81eb-c1c6de729a0e HTTP/1.1
Host: user.gini.net
Authorization: BEARER 74c1e7fe-e464-451f-a6eb-8f0998c46ff6
Content-Type: application/json
with
{"oldPassword":"supersecret","password":"anothersecret"}
or
{"oldEmail":"old@email.com","email":"my.new@email.com"}
or
{
"oldPassword":"supersecret",
"password":"anothersecret",
"oldEmail":"old@email.com",
"email":"my.new@email.com"}
A user's password and/or email can be changed with a PUT request to /api/users/{userId}.
In order to update a user's password and/or email, the current password/email must be provided.
Request
Key | Description |
---|---|
oldPassword | The user's current password. |
password | The password to which the user's password should be changed. |
oldEmail | The user's current email. |
email | The email to which the user's email should be changed. |
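Since each new value must be accompanied by the current one, a small helper can enforce that pairing when building the PUT body. This is a sketch only; the function name and its keyword arguments are hypothetical, while the JSON keys are the ones from the table above.

```python
import json
from typing import Optional


def change_user_payload(
    old_password: Optional[str] = None,
    new_password: Optional[str] = None,
    old_email: Optional[str] = None,
    new_email: Optional[str] = None,
) -> str:
    """Return the JSON body for PUT /api/users/{userId}."""
    if (old_password is None) != (new_password is None):
        raise ValueError("oldPassword and password must be given together")
    if (old_email is None) != (new_email is None):
        raise ValueError("oldEmail and email must be given together")
    payload = {}
    if new_password is not None:
        payload["oldPassword"] = old_password
        payload["password"] = new_password
    if new_email is not None:
        payload["oldEmail"] = old_email
        payload["email"] = new_email
    if not payload:
        raise ValueError("nothing to change")
    return json.dumps(payload)


body = change_user_payload(old_password="supersecret", new_password="anothersecret")
```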
Deleting a User
delete a user
DELETE /api/users/16aecc72-8032-4df6-9686-eaf4ec9532b8 HTTP/1.1
Host: user.gini.net
Authorization: BEARER 74c1e7fe-e464-451f-a6eb-8f0998c46ff6
Content-Type: application/json
An existing user can be deleted with a DELETE request to /api/users/{userId}.
This also deletes all data associated with that user (e.g. access tokens, documents and extractions).
Troubleshooting
If you have trouble using the Gini Accounting API and need to contact support, there is some information you should always provide so that we can help you quickly and efficiently.
X-Request-Id
The request id is generated for every request against the Gini Accounting API and tracked through the whole system.
It is included in every response you receive from the Gini Accounting API as the HTTP header X-Request-Id.
Please refer to it when you contact our support.
Document Id
The document id is generated for every accepted upload.
It is included in the Location HTTP header, which is part of the response to a successful upload.
Please refer to the document id if you have questions related to a specific document upload.
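One way to make sure both identifiers are at hand when contacting support is to record them from every upload response. The following is a minimal sketch: the helper name support_details is hypothetical, the header values in the example are made up, and treating the last path segment of Location as the document id is an assumption about the URL layout.

```python
from typing import Mapping, Optional


def support_details(response_headers: Mapping[str, str]) -> dict[str, Optional[str]]:
    """Collect the request id and document id from response headers."""
    # HTTP header names are case-insensitive, so normalize them first.
    headers = {name.lower(): value for name, value in response_headers.items()}
    location = headers.get("location")
    document_id = location.rstrip("/").rsplit("/", 1)[-1] if location else None
    return {"request_id": headers.get("x-request-id"), "document_id": document_id}


# Made-up header values, for illustration only.
details = support_details({
    "X-Request-Id": "example-request-id",
    "Location": "https://accounting-api.gini.net/documents/example-document-id",
})
```

Writing these values to the application log for every request keeps them available later, even when the original response has been discarded.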