A Python application to download GDELT datasets (Events, Mentions, and Global Knowledge Graph) and store them in a PostgreSQL database.
- Downloads GDELT data files in parallel
- Stores data in a local PostgreSQL database
- Processes large files in chunks for memory efficiency
- Supports Events, Mentions, and GKG datasets
- Handles duplicate records with ON CONFLICT DO NOTHING
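Parallel downloading is typically built on a thread pool. A minimal sketch of the pattern (the `fetch` callable here is a stand-in; the real downloader presumably wraps `requests.get` and writes each response to disk):

```python
from concurrent.futures import ThreadPoolExecutor

def download_all(urls, fetch, max_workers=8):
    """Fetch many URLs concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))

# A trivial stand-in for requests.get keeps the sketch runnable offline.
results = download_all(["a.csv", "b.csv"], fetch=lambda url: f"downloaded {url}")
```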
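Duplicate handling relies on appending `ON CONFLICT DO NOTHING` to each `INSERT`, so rows that already exist are silently skipped on re-import. A sketch of how such a statement can be assembled (table, column, and key names are illustrative, not necessarily the project's actual schema):

```python
def build_insert_sql(table, columns, conflict_target):
    """Build an INSERT statement that skips rows violating the unique key."""
    cols = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return (
        f"INSERT INTO {table} ({cols}) VALUES ({placeholders}) "
        f"ON CONFLICT ({conflict_target}) DO NOTHING"
    )

sql = build_insert_sql("events", ["globaleventid", "sqldate"], "globaleventid")
# Pass `sql` to cursor.execute() / executemany() with the row values.
```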
- Clone this repository:

  ```bash
  git clone https://github.com/yourusername/gdelt-downloader.git
  cd gdelt-downloader
  ```
- Create and activate a virtual environment:

  ```bash
  python -m venv .venv
  source .venv/bin/activate
  ```
- Install dependencies:

  ```bash
  pip install psycopg2-binary pandas requests tqdm
  ```
- Ensure you have PostgreSQL running with:
- Database: gdelt_raw
- User: postgres
- Password: postgres
- Host: localhost
- Port: 5432
Run the downloader:

```bash
python gdelt_downloader.py
```
The application will:
- Create a `data/` directory if it doesn't exist
- Download the master file list from GDELT
- Download and process all available files
- Store the data in the `gdelt_raw` PostgreSQL database
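Chunked processing is what keeps memory bounded: each file is read and inserted a slice at a time instead of being loaded whole. The project uses pandas for this, but the idea can be sketched with the standard library alone:

```python
import csv
import io
import itertools

def iter_chunks(fileobj, chunk_size):
    """Yield lists of CSV rows, at most chunk_size rows each,
    so a large file never has to fit in memory at once."""
    reader = csv.reader(fileobj, delimiter="\t")  # GDELT files are tab-separated
    while True:
        chunk = list(itertools.islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk

# Three rows with a chunk size of two yields two chunks.
sample = io.StringIO("a\t1\nb\t2\nc\t3\n")
chunks = list(iter_chunks(sample, 2))
```

With pandas, the equivalent is the `chunksize` argument to `read_csv`, which returns an iterator of DataFrames.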
Edit `config.py` to adjust:
- Database connection settings
- Download directory
- Chunk size for processing
- File types to download
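The exact contents of `config.py` are not shown here, but a plausible shape covering the four settings above might look like this (all names are hypothetical and may differ from the actual file):

```python
# Hypothetical config.py layout -- actual variable names may differ.
DB_CONFIG = {
    "dbname": "gdelt_raw",
    "user": "postgres",
    "password": "postgres",
    "host": "localhost",
    "port": 5432,
}
DOWNLOAD_DIR = "data/"                       # where CSV files are stored
CHUNK_SIZE = 100_000                         # rows processed per batch
FILE_TYPES = ["events", "mentions", "gkg"]   # datasets to download
```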
The database contains three tables:
- `events`: GDELT event data
- `mentions`: media mentions of events
- `gkg`: Global Knowledge Graph data
You can query the database using psycopg2:
```python
import psycopg2

# Connection settings match the defaults listed above; adjust as needed.
conn = psycopg2.connect(dbname='gdelt_raw', user='postgres',
                        password='postgres', host='localhost', port=5432)
cur = conn.cursor()
cur.execute("SELECT * FROM events LIMIT 10")
events = cur.fetchall()

# Release the connection when done.
cur.close()
conn.close()
```
- The initial download may take several hours depending on your internet connection
- The database can grow quite large (100+ GB) for full datasets
- Downloaded CSV files are kept in the `data/` directory for future reference
- Requires PostgreSQL 12 or higher