Adding structured fiscal data

At key budget events like the tabling of the Main Appropriation and Adjustments Appropriation, and release of Annual Report Expenditure data, we upload structured data OpenSpending and make it accessible via CKAN and vulekamali. This data is intended to support further analysis by the public, and is used for summaries and demonstration of the capability of the data on vulekamali.

To allow automated summaries of the data in vulekamali, the Data Manager has very specific expectations of the structure and values in the data, and how the Fiscal Data Package types used when uploading the data to OpenSpending.

The consequence of not adhering to these requirements is that the summaries and demonstrations on vulekamali will not work, and will either be broken, or warn that data for the relevant demonstration is missing.

Updated datasets

It is strongly recommended that updated datasets are uploaded to new unique identifieds, and that the previous version of the dataset is not replaced or deleted. Replacing or deleting datasets can break scripted analysis, or produce different results if someone's instructions or script is repeated. This leads to doubt about the reliability of the data.

Dataset timestamps

It is strongly recommended that upload timestamps of the form 2019-08-12 01:18 (and 2019-08-01t0118 for unique identifiers) are included in the name and unique identified of datasets in OpenSpending. This facilitates better referencing of data in analyses and articles, and helps others know exactly which version of a dataset was used which produced a particular result.

General requirements for data uploaded to OpenSpending

  • At least one "measure" is needed, e.g. the value column - named Value

    • The value column must be in Rands, not Thousands of Rands

    • The decimal separator must be period (.) and not a comma.

    • There must be no thousand separator.

  • The combination of the non-measure columns must be unique on each row

  • Financial year must be an integer - for National and Provincial budget data we use the starting year, as is convention. So for 2018-19, we use 2018.

  • Department names and budget phases must match what is used in the Data Manager precisely. That includes capitalisation and punctuation, including hyphenation and commas. Avoid stray spaces at the beginning and end of values. See dataset specifics below.

  • Text and numeric values must be consistent. Occasional inconsistencies like stray spaces at the start or end of a value result in that category being treated as a different category, just like it would in a pivot table.

After adding the dataset to CKAN, add it to the right group and add the right metadata so that the Data Manager can find each dataset to prepare the summaries for each financial year using the correct dataset.

Automated data checks

Data sets can be checked for adherence to the requirements listed in the Specific dataset requirements section below on the vulekamali automated data checks repository.

Instruction Video

Step-by-step guide

  1. Log in or sign up for github.com

  2. Make sure your data is available on a publicly accessible url in "raw" format, e.g. uploaded to https://s3.console.aws.amazon.com/s3/buckets/manual-uploads.vulekamali.gov.za/?region=eu-west-1&tab=overview

  3. Click on "Create new file"

  1. Type the file name as datapackages/<financial year>/<dataset type>/datapackage.json, for example datapackages/2019-20/epre/datapackage.json

  1. Copy the following text to the new file:

{
 "name": "<INSERT NAME OF YOUR DATA SET>",
 "schema": "tabular-data-package",
 "profile": "tabular-data-package",
 "resources": [
     {
         "path": "<INSERT URL TO YOUR DATA SET>",
         "profile": "tabular-data-resource",
         "name": "<INSERT NAME OF YOUR DATA SET>",
         "format": "csv",
         "mediatype": "text/csv",
         "encoding": "utf-8",
         "schema": "https://raw.githubusercontent.com/vulekamali/data-checks/master/schema/<INSERT TYPE OF YOUR DATA SET>.json"
     }
 ]
}
  1. Update the name value and the path, name and schema values inside the resources value in your new file.

  1. Scroll down to the "Commit new file" heading. Select the "Create a new branch for this commit and start a pull request." option and click on "Propose new file".

  1. On the next screen, give your pull request a descriptive title such as "Add 2019-20 EPRE data" and click on "Create pull request"

  1. You should now see a section with yellow text that reads: "Some checks haven’t completed yet". Wait until the text turns either red or green.

Verify that your new datapackage is being checked

When adding a datapackage, verify that it's being checked. If you placed it in the wrong directory, it will not be checked and you might falsely think that your dataset was checked and passed the checks.

To verify that your package is being checked, click on Details for the checks and look for Checking ...your package path in the checks log. If you don't see your packing being checked, double-check that the path to your datapackage is correct so that it can be picked up for checking automatically. If you don't see your datapackage path being checked in the log, your package is not being checked.

The following shows a successful check for the 2018-19 ENE package:

The following shows an error for the 2019-20 ENE package:

  • If you see a section with green text reading "All checks have passed", your data set has passed all of the checks successfully:

  • If you see a section with red text reading "All checks have failed", your data set did not pass all of the checks:

  • If your data set did not pass all of the checks, you can click on "details" on the first item under "All checks have failed". On the next screen, scroll down to the bottom of the screen. You should see a list of error messages generated for your data set.

Specific dataset requirements

Budgeted and Actual National Expenditure

This dataset consists of all of the following datasets, for each financial year available:

  • Estimates of National Expenditure (only for rows where Budget Phase is equal to Main appropriation

  • Adjusted Estimates of National Expenditure (only for rows where Budget Phase is equal to Adjusted appropriation)

  • Audit Outcomes dataset (only for rows where Budget Phase is equal to Audit Outcome or Final Appropriation)

When each of the above datasets are released, the new year's data is added to this dataset. Only the rows for where Financial Year is equal to the ENE, AENE or AR dataset's year is included in this combined dataset (e.g. for ENE 2015-16 adds rows where Financial Year is equal to 2015 and Budget Phase is equal to Main Appropriation. AENE 2019-20 adds rows where FinancialYear is 2019 and Budget Phase is Adjusted appropriation).

Instead of adding a new dataset to CKAN each year, just update the existing dataset to the new links in OpenSpending.

CKAN Metadata

CKAN Resources

In this order:

  • Excel file of the data, ideally with the following pivot table in the default worksheet:

    • Filters: AmountKind=Total, Government=South Africa

    • Rows: Department

    • Columns: FinancialYear, BudgetPhase

    • Measures: Amount

  • CSV file of the data - we usually use the CSV hosted by OpenSpending

  • OpenSpending dataset model URL

Fields:

For example, as CSV

Budget Phase,Department,Econ1,Econ2,Econ3,Econ4,Financial Year,Function Group 1,Programme,Programme Number,Subprogramme,Subprogramme Number,Value,Vote Number,Amount Kind,Government
Main appropriation,Public Service and Administration,Current payments,Compensation of employees,Salaries and wages,Salaries and wages,2018,General public services,Administration,1,Ministry,1,31989000.0,10,Total,South Africa
Main appropriation,Public Service and Administration,Current payments,Compensation of employees,Social contributions,Social contributions,2018,General public services,Administration,1,Ministry,1,2213000.0,10,Total,South Africa
Main appropriation,Public Service and Administration,Current payments,Goods and services,Administrative fees,Administrative fees,2018,General public services,Administration,1,Ministry,1,393000.0,10,Total,South Africa
Main appropriation,Public Service and Administration,Current payments,Goods and services,Advertising,Advertising,2018,General public services,Administration,1,Ministry,1,27000.0,10,Total,South Africa

Example pivot table, with the following parameters:

  • Filter: Financial Year = 2017

  • Columns: Budget Phase

  • Rows: Amount Kind

  • Values: Amount

Budgeted and Actual Provincial Expenditure

Same as Budgeted and Actual National Expenditure, but the Governments in this case are the provincial governments. The data sources are then the EPRE, AEPRE and Provincial Expenditure Audit Outcomes datasets.

OpenSpending Metadata

  • Human-readable name: Budgeted and Actual Provincial Expenditure uploaded 2019-08-12 01:09

    • with appropriate upload timestamp

  • Unique identifier: budgeted-and-actual-provincial-expenditure-uploaded-2019-08-12t0109

  • You can skip the description, city and period fields.

CKAN Metadata

CKAN Resources

In this order:

  • Excel file of the data, ideally with the following pivot table in the default worksheet:

    • Filters: AmountKind=Total

    • Rows: Government, Department

    • Columns: FinancialYear, BudgetPhase

    • Measures: Amount

  • CSV file of the data - we usually use the CSV hosted by OpenSpending

  • OpenSpending dataset model URL

Fields:

For example, as CSV

Budget Phase,Department,Econ1,Econ2,Econ3,Econ4,Financial Year,Function Group 1,Programme,Programme Number,Subprogramme,Subprogramme Number,Value,Vote Number,Amount Kind,Government
Main appropriation,Public Service and Administration,Current payments,Compensation of employees,Salaries and wages,Salaries and wages,2018,General public services,Administration,1,Ministry,1,31989000.0,10,Total,North West
Main appropriation,Public Service and Administration,Current payments,Compensation of employees,Social contributions,Social contributions,2018,General public services,Administration,1,Ministry,1,2213000.0,10,Total,Eastern Cape
Main appropriation,Public Service and Administration,Current payments,Goods and services,Administrative fees,Administrative fees,2018,General public services,Administration,1,Ministry,1,393000.0,10,Total,Western Cape
Main appropriation,Public Service and Administration,Current payments,Goods a

Estimates of National Expenditure

OpenSpending Metadata

  • Human-readable name: Estimates of National Expenditure of South Africa 2019-20 uploaded 2019-06-05 16:15

    • with appropriate financial year and upload timestamp

  • Unique Identifier: estimates-of-national-expenditure-south-africa-2019-20-uploaded-2019-06-05-1615

  • Description: This dataset includes expenditure data as published in the Estimates of National Expenditure. Departmental expenditure data in previous financial years have been aligned with the budget and programme structures of departments as they are structured in the current Medium Term Expenditure Framework. This means that the structure of historical expenditure data for a department, may not necessarily reflect the same structure of that department when expenditure was incurred during a particular financial year. This may be due to functions shifts which may have occurred within a particular department or a general change in a department’s budget programme structures (BPS) as departments are afforded an opportunity before the start of the Budget to change their BPS.This is to ensure a more economical, efficient and effective delivery of public services and performance by the department as a whole.

  • You can skip the City and Period fields

CKAN Metadata

  • Dataset Name: Estimates of National Expenditure 2019-20

    • or whatver the correct tabled financial year is

  • Financial Years: Exactly one: the year being tabled

  • Sphere: national

  • Dimensions: as per the fields available

CKAN Resources

In this order:

  • Excel file of the data, ideally with the following pivot table in the default worksheet:

    • Rows: Department

    • Columns: FinancialYear, BudgetPhase

    • Measures: Amount

  • CSV file of the data - we usually use the CSV hosted by OpenSpending

  • OpenSpending dataset model URL

Fields:

Examples

CSV example as uploaded to OpenSpending:

BudgetPhase,Department,EconomicClassification1,EconomicClassification2,EconomicClassification3,EconomicClassification4,FinancialYear,Programme,ProgNumber,Subprogramme,SubprogNumber,Value,VoteNumber
Audited Outcome,Public Service and Administration,Capital,Payments for capital assets,Machinery and equipment,Other machinery and equipment,2011,Administration,1,Ministry,1,311000.0,10
Audited Outcome,Public Service and Administration,Capital,Payments for capital assets,Machinery and equipment,Other machinery and equipment,2012,Administration,1,Ministry,1,112000.0,10
Adjusted appropriation,Public Service and Administration,Capital,Payments for capital assets,Machinery and equipment,Other machinery and equipment,2013,Administration,1,Ministry,1,101000.0,10

Estimates of Provincial Expenditure

Note this dataset is called Estimates of Provincial Expenditure, not Estimates of Provincial Revenue and Expenditure because it only contains expenditure data. While it might throw those who know what the EPRE is, it tries to make sense to users of expenditure data and not leave users wondering where the revenue data is if it was called EPRE.

Example to check correctness

Once prepared according to the specifications below, you can produce the following pivot table and quickly check

  • Columns: FinancialYear

  • Rows: BudgetPhase

  • Value: Value

Checklist:

  1. are the correct budget phases available for the correct years?

  2. are the budget phases capitalised correctly?

  3. are the total for each phase for each year similar in size? Of course they will differ a bit

  4. are all the years available?

OpenSpending Metadata

  • Human-readable name: Estimates of Provincial Expenditure of South Africa 2019-20 uploaded 2019-06-05 16:15

    • with appropriate financial year and upload timestamp

  • Unique Identifier: estimates-of-provincial-expenditure-south-africa-2019-20-uploaded-2019-06-05-1615

  • Description: This dataset includes expenditure data as published in the Estimates of Provincial Revenue and Expenditure. Departmental expenditure data in previous financial years have been aligned with the budget and programme structures of departments as they are structured in the current Medium Term Expenditure Framework. This means that the structure of historical expenditure data for a department, may not necessarily reflect the same structure of that department when expenditure was incurred during a particular financial year. This may be due to functions shifts which may have occurred within a particular department or a general change in a department’s budget programme structures (BPS) as departments are afforded an opportunity before the start of the Budget to change their BPS.This is to ensure a more economical, efficient and effective delivery of public services and performance by the department as a whole.

  • You can skip the City and Period fields

CKAN Metadata

  • Group: Estimates of Provincial Expenditure

  • Financial Years: Exactly one: the year being tabled

  • Sphere: provincial

  • Dimensions: as per the fields available

CKAN Resources

In this order:

  • Excel file of the data, ideally with the following pivot table in the default worksheet:

    • Rows: Government, Department

    • Columns: FinancialYear, BudgetPhase

    • Measures: Amount

  • CSV file of the data - we usually use the CSV hosted by OpenSpending

  • OpenSpending dataset model URL

Dataset columns and value requirements

Adjusted Estimates of National Expenditure

CKAN Metadata

  • Group: Adjusted Estimates of National Expenditure

  • Sphere: National

  • Dimensions: As per fields below

CKAN Resources

In this order:

  • Excel file of the data

  • CSV file of the data - we usually use the CSV URL hosted by OpenSpending

  • Openspending dataset model URL

Fields

Examples

Example pivot table with the following parameters:

  • Filter: none

  • Columns: Budget Phase; Amount Kind

  • Rows: Department

  • Values: Amount (sum)

National Expenditure Audit Outcomes data

OpenSpending Metadata

  • Human-readable name: National Department Expenditure Audit Outcomes of South Africa 2019-20 uploaded 2019-06-05 16:15

    • with appropriate financial year and upload timestamp

  • Unique Identifier: national-expenditure-audit-outcomes-south-africa-2019-20-uploaded-2019-06-05-1615

  • You can skip the Description, City and Period fields

CKAN Metadata

  • Title: National Department Expenditure Audit Outcomes 2019-20

    • if 2019-20 is the audit year being released

  • Group: Expenditure Audit Outcomes Data

  • Financial Years: Exactly one: the audit year being released

  • Sphere: national

Dataset columns and value requirements

Provincial Expenditure Audit Outcomes

OpenSpending Metadata

  • Human-readable name: Provincial Expenditure Audit Outcomes of South Africa 2019-20 uploaded 2019-06-05 16:15

    • with appropriate financial year and upload timestamp

  • Unique Identifier: provincial-expenditure-audit-outcomes-south-africa-2019-20-uploaded-2019-06-05-1615

  • You can skip the Description, City and Period fields

CKAN Metadata

  • Title: Provincial Expenditure Audit Outcomes 2019-20

    • if 2019-20 is the audit year being released

  • Group: Expenditure Audit Outcomes Data

  • Financial Years: Exactly one: the audit year being released

  • Sphere: provincial

Dataset columns and value requirements

Consolidated budget

Objective

OpenSpending Metadata

  • Human-readable name: Consolidated Expenditure Budget of South Africa 2019-20 uploaded 2019-06-05 16:15

    • with appropriate financial year and upload timestamp

  • Unique Identifier: consolidated-expenditure-south-africa-2019-20-uploaded-2019-06-05-1615

  • You can skip the Description, City and Period fields

CKAN Metadata

  • Title: Consolidated Expenditure 2019-20

    • if 2019-20 is the year being tabled

  • Group: Consolidated Expenditure

  • Financial Years: Exactly one: the financial year being tabled

  • Sphere: national and provincial

Dataset columns and value requirements

Adding a dataset to OpenSpending

Before adding a dataset to OpenSpending, it has to be cleaned and structured correctly and must meet the general and specific requirements above.

Any data transformation tools can be used to ensure the data meets the above requirements. We have been using Datapackage Pipelines

Uploading data to OpenSpending using OS Packager

Login to the vulekamali account on OS Packager and follow the upload wizard.

Assign types to columns

This can be done per-column based on the table of column types shown above, or by uploading the data-checks schema file for the dataset (if available), which includes column type mappings.

Enter the dataset metadata as per the specifications for the dataset.

Keep the upload tab open to be able to monitor progress if possible. This takes 20 minutes to an hour.

When the dataset is successfully loaded, it's ready to be added to CKAN:

Adding OpenSpending datasets to CKAN

Create a dataset with the following resources, and metadata as per the specification for the dataset.

Resource: OpenSpending API for programmatic access

Add the OpenSpending API Model URL as a resource of the dataset.

The model URL can be constructed by entering the dataset ID in OpenSpending in the following template:

https://openspending-dedicated.vulekamali.gov.za/api/3/cubes/...Dataset ID.../model/

The Dataset ID is the combination of the OpenSpending account ID (owner in the Fiscal Data Package JSON file), and the Fiscal Data Package name, with a colon in between.

If the Viewer URL is

https://openspending-dedicated.vulekamali.gov.za/viewer/b9d2af843f3a7ca223eea07fb608e62a:estimates-of-provincial-expenditure-of-south-africa-2021-22-uploaded-2021-05-28-1228?lang=en

the dataset ID is

b9d2af843f3a7ca223eea07fb608e62a:estimates-of-provincial-expenditure-of-south-africa-2021-22-uploaded-2021-05-28-1228

and thus the Model URL is

https://openspending-dedicated.vulekamali.gov.za/api/3/cubes/b9d2af843f3a7ca223eea07fb608e62a:estimates-of-provincial-expenditure-of-south-africa-2021-22-uploaded-2021-05-28-1228/model/

OpenSpending metadata:

  • Resource name: Same as dataset title

  • Format: OpenSpending API

It's important that the capitalisation, spelling and spacing of this is correct: OpenSpending API

CSV file for manual and programmatic analysis

Add the CSV link to download the full dataset as Comma Separated Variable data which can be opened in Excel and other common data analysis tools.

Find the link at the bottom of the dataset viewer page in OpenSpending - in this case aene-2016-17

Copy this link and add it as a CSV type resource of the dataset

These will then show up in vulekamali as a Dataset and in data summaries and demonstrations.

Troubleshooting

OpenSpending

Integrity Error: duplicate key value violates unique constraint

This is caused by multiple rows in the uploaded dataset having the exact same values across all dimensions (also known as classifying columns).

Remember each row must be uniquely classified by the dimension columns.

The fix for this is usually to sum together the rows with the same classification (if you're going from more granular data to this set). This can for example be done with a pivot table that sums on the Value column.

KeyError: 'primaryKey'

We've seen this when including a column as "Unknown - String" type. Removing the column from the dataset and rolling up the data to the remaining columns for uniqueness solved it.

Last updated