A Python application to download GDELT datasets (Events, Mentions, and Global Knowledge Graph) and store them in a PostgreSQL database.
- Downloads GDELT data files in parallel
- Stores data in a local PostgreSQL database
- Processes large files in chunks for memory efficiency
- Supports Events, Mentions, and GKG datasets
- Handles duplicate records with ON CONFLICT DO NOTHING
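Parallel downloading is typically built on a thread pool. A minimal sketch of the pattern (the `fetch` callable here is a stand-in; the real downloader presumably wraps `requests.get` and writes each response to disk):

```python
from concurrent.futures import ThreadPoolExecutor

def download_all(urls, fetch, max_workers=8):
    """Fetch many URLs concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))

# A trivial stand-in for requests.get keeps the sketch runnable offline.
results = download_all(["a.csv", "b.csv"], fetch=lambda url: f"downloaded {url}")
```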
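Duplicate handling relies on appending `ON CONFLICT DO NOTHING` to each `INSERT`, so rows that already exist are silently skipped on re-import. A sketch of how such a statement can be assembled (table, column, and key names are illustrative, not necessarily the project's actual schema):

```python
def build_insert_sql(table, columns, conflict_target):
    """Build an INSERT statement that skips rows violating the unique key."""
    cols = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return (
        f"INSERT INTO {table} ({cols}) VALUES ({placeholders}) "
        f"ON CONFLICT ({conflict_target}) DO NOTHING"
    )

sql = build_insert_sql("events", ["globaleventid", "sqldate"], "globaleventid")
# Pass `sql` to cursor.execute() / executemany() with the row values.
```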
- Clone this repository:

  ```bash
  git clone https://github.com/yourusername/gdelt-downloader.git
  cd gdelt-downloader
  ```
- Create and activate a virtual environment:

  ```bash
  python -m venv .venv
  source .venv/bin/activate
  ```
- Install dependencies:

  ```bash
  pip install psycopg2-binary pandas requests tqdm
  ```
- Ensure you have PostgreSQL running with:
- Database: gdelt_raw
- User: postgres
- Password: postgres
- Host: localhost
- Port: 5432
Run the downloader:

```bash
python gdelt_downloader.py
```
The application will:
- Create a `data/` directory if it doesn't exist
- Download the master file list from GDELT
- Download and process all available files
- Store the data in the `gdelt_raw` PostgreSQL database
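Chunked processing is what keeps memory bounded: each file is read and inserted a slice at a time instead of being loaded whole. The project uses pandas for this, but the idea can be sketched with the standard library alone:

```python
import csv
import io
import itertools

def iter_chunks(fileobj, chunk_size):
    """Yield lists of CSV rows, at most chunk_size rows each,
    so a large file never has to fit in memory at once."""
    reader = csv.reader(fileobj, delimiter="\t")  # GDELT files are tab-separated
    while True:
        chunk = list(itertools.islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk

# Three rows with a chunk size of two yields two chunks.
sample = io.StringIO("a\t1\nb\t2\nc\t3\n")
chunks = list(iter_chunks(sample, 2))
```

With pandas, the equivalent is the `chunksize` argument to `read_csv`, which returns an iterator of DataFrames.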
Edit `config.py` to adjust:
- Database connection settings
- Download directory
- Chunk size for processing
- File types to download
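The exact contents of `config.py` are not shown here, but a plausible shape covering the four settings above might look like this (all names are hypothetical and may differ from the actual file):

```python
# Hypothetical config.py layout -- actual variable names may differ.
DB_CONFIG = {
    "dbname": "gdelt_raw",
    "user": "postgres",
    "password": "postgres",
    "host": "localhost",
    "port": 5432,
}
DOWNLOAD_DIR = "data/"                       # where CSV files are stored
CHUNK_SIZE = 100_000                         # rows processed per batch
FILE_TYPES = ["events", "mentions", "gkg"]   # datasets to download
```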
The database contains three tables:
- `events`: GDELT event data
- `mentions`: media mentions of events
- `gkg`: Global Knowledge Graph data
You can query the database using psycopg2:
```python
import psycopg2

# Connection settings match the defaults listed above; adjust as needed.
conn = psycopg2.connect(dbname='gdelt_raw', user='postgres',
                        password='postgres', host='localhost', port=5432)
cur = conn.cursor()
cur.execute("SELECT * FROM events LIMIT 10")
events = cur.fetchall()

# Release the connection when done.
cur.close()
conn.close()
```
- The initial download may take several hours depending on your internet connection
- The database can grow quite large (100+ GB) for full datasets
- Downloaded CSV files are kept in the `data/` directory for future reference
- Requires PostgreSQL 12 or higher