Python package for working with the National Energy Efficiency Data-Framework (NEED) dataset
The National Energy Efficiency Data-Framework (NEED) dataset is published by the UK Government here: https://www.gov.uk/government/collections/national-energy-efficiency-data-need-framework
This Python package contains a series of Python functions which can be used to:
- Download the CSV files which make up the NEED dataset.
- Import the data into a SQLite database.
- Access the data in the database, for data analysis and visualisation.
A description of the CSV files in the NEED dataset, along with instructions for downloading and importing in SQLite, has been created using the format of a CSVW metadata Table Group object (saved as a JSON file), which is available here: https://github.com/building-energy/need_functions/blob/main/need_tables-metadata.json
pip install git+https://github.com/building-energy/need_functions
The python package csvw_functions_extra
will also need to be installed.
import need_functions
# download all NEED data and import into database.
need_functions.download_and_import_all_data()
# get the distribution of property types in the data.
result=need_functions.get_distribution(
'PROP_TYPE',
'need_2021_anon_dataset_4million'
)
print(result)
[
{'PROP_TYPE': 'Bungalow', 'COUNT': 359717},
{'PROP_TYPE': 'Detached', 'COUNT': 622875},
{'PROP_TYPE': 'End terrace', 'COUNT': 358170},
{'PROP_TYPE': 'Flat', 'COUNT': 862623},
{'PROP_TYPE': 'Mid terrace', 'COUNT': 773569},
{'PROP_TYPE': 'Semi detached', 'COUNT': 1023046}
]
Description: Returns the CSV file names of all tables in the CSVW metadata file.
need_functions.get_available_csv_file_names()
Returns: A list of the https://purl.org/berg/csvw_functions_extra/vocab/csv_file_name
value in each table.
Description: Downloads all the NEED data and imports all data into a SQLite database.
need_functions.download_and_import_all_data(
csv_file_names = None,
data_folder='_data',
database_name='need_data.sqlite',
verbose=False,
)
The data to be downloaded is described in the CSVW metadata file here: https://raw.githubusercontent.com/building-energy/snecs_functions/main/snecs_tables-metadata.json
Running this function will:
- create the
data_folder
if it does not already exist. - download the CSV files to the data folder.
- download the CSVW metadata file to the data folder. This is named 'need_tables-metadata.json'.
- create a SQLite database named
database_name
in the data folder if it does not already exist. - import the CSV data into the SQLite database.
Arguments:
- csv_file_names (str, list or None): The CSV file name(s) to download and import (see
get_available_csv_file_names
).None
will download the entire dataset. - data_folder (str): The filepath of a local folder where the downloaded CSV data is saved to and the SQLite database is stored.
- database_name (str): The name of the SQLite database, relative to the data_folder.
- verbose (bool): If True, then this function prints intermediate variables and other useful information.
Returns: None
Description: Returns the table names of all NEED tables in the SQLite database.
get_need_table_names_in_database(
data_folder = '_data',
database_name = 'need_data.sqlite',
)
- data_folder (str): The filepath of a local folder where the SQLite database is stored.
- database_name (str): The name of the SQLite database, relative to the data_folder.
Returns: A list of table names.
Description: Returns lookup dictionaries for the NEED lookup codes for one or more columns.
need_functions.get_metadata_columns_codes(
column_names,
sql_table_name,
data_folder = '_data',
)
Arguments:
- column_name (str): The
name
value of a column in a CSVW TableSchema object. - sql_table_name (str): The
https://purl.org/berg/csvw_functions_extra/vocab/sql_table_name
value of the table. - data_folder (str): The filepath of a local folder where the normalized CSVW metadata file is saved.
Returns (dict of dicts): A dictionary with:
- keys: the names of the column(s)
- values: a dictionary with keys as lookup codes and values as code descriptions.
Description: Returns the distribution of values in a column.
need_functions.get_distribution(
field,
sql_table_name,
data_folder = '_data',
database_name = 'need_data.sqlite',
verbose=False
)
Arguments:
- field (str): A field in the database table.
- sql_table_name (str): The name of the database table.
- data_folder (str): The filepath of a local folder where the SQLite database is stored.
- database_name (str): The name of the SQLite database, relative to the data_folder.
Returns (list): A list of dictionaries, where each dictionary is a set of results with a unique value and a count of those unique values.