Best tweet post date prediction

Individual interdisciplinary project 2017/2018, Supervisors: Tobias Scheffer & Paul Prasse, Submission Date: 02.03.2018 @ University of Potsdam, Germany

Install

This project runs on Anaconda with:

Python 3.6.3

Run in console:

pip install -r requirements.txt

Set your Twitter API keys in config.py.

# Twitter API credentials
consumer_key = '#####'
consumer_secret = '#####'
access_token = '#####'
access_secret = '#####'
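
Before crawling, it can help to confirm that no credential still holds the '#####' placeholder. The following is a minimal sketch; the helper missing_credentials and the sample values are hypothetical and not part of the project.

```python
# Hypothetical helper, not part of the project: reports credentials that are
# absent, empty, or still set to the '#####' placeholder from the template.
PLACEHOLDER = '#####'
REQUIRED = ('consumer_key', 'consumer_secret', 'access_token', 'access_secret')


def missing_credentials(settings):
    """Return the required names that are unset in the given mapping."""
    return [
        name for name in REQUIRED
        if not settings.get(name) or settings[name] == PLACEHOLDER
    ]


# Example with a partially filled-in configuration (values are made up).
example = {
    'consumer_key': 'abc123',
    'consumer_secret': '#####',
    'access_token': 'def456',
    'access_secret': '',
}
print(missing_credentials(example))  # ['consumer_secret', 'access_secret']
```

Assuming config.py can be imported from the project root, the same check could be run on the real settings with missing_credentials(vars(config)).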

Tweet crawling

Create a folder ./data/datasets/ in the project root. Run in console:

python crawltwitter.py -a [twitter_user] -c1 [max_tweets] -c2 [max_accounts]

Example:

python crawltwitter.py -a DataScienceCtrl -c1 10000 -c2 100

The data for each retrieved account is stored in data/datasets.
The account names retrieved by the crawl are stored in data/TwitterCrawlXXXX-XX-XXXXX.json

Gather data

Create a folder ./data/gathered/ in the project root. Run in console:

python gather.py -max 10000

A JSON file containing data ready for training will be created in data/gathered.
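
Since the name of the gathered file varies from run to run, a small helper can pick the newest one to pass to the training step. This is only a sketch: the function latest_gathering is hypothetical, and the gathering_*.json pattern is an assumption based on the file name used in the training command.

```python
from pathlib import Path


def latest_gathering(folder='data/gathered'):
    """Return the most recently modified gathering_*.json in folder, or None."""
    folder = Path(folder)
    if not folder.is_dir():
        return None
    # Assumed naming pattern; adjust it if gather.py names its files differently.
    candidates = sorted(folder.glob('gathering_*.json'),
                        key=lambda path: path.stat().st_mtime)
    return candidates[-1] if candidates else None
```

Calling latest_gathering() from the project root returns a Path that can be passed to the -d option of training.py, or None if nothing has been gathered yet.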

Train models

Create folders ./data/cache/ and ./data/publish/ in the project root. Run in console:

python training.py -d data\gathered\gathering_xxx_xxx.json -save -i -i -i -i

The four -i options correspond to the highest level of dataset optimisation; they are optional and can be removed.
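
How training.py reads its flags is not shown in this README. One common way to support a repeatable flag such as -i is the count action of argparse, sketched below. The parser, the dest names dataset and level, and the file name gathering.json are illustrative assumptions, not the project's actual code.

```python
import argparse


def build_parser():
    """Illustrative stand-in for the options used in the command above."""
    parser = argparse.ArgumentParser(description='Sketch of a training CLI')
    parser.add_argument('-d', dest='dataset', required=True,
                        help='path to the gathered JSON file')
    parser.add_argument('-save', action='store_true',
                        help='save the trained model')
    # action='count' increments the value each time -i appears.
    parser.add_argument('-i', dest='level', action='count', default=0,
                        help='repeat to raise the optimisation level')
    return parser


args = build_parser().parse_args(
    ['-d', 'gathering.json', '-save', '-i', '-i', '-i', '-i'])
print(args.level, args.save)  # 4 True
```

With this approach, omitting -i leaves the level at 0, and each additional -i raises it by one.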

Feature importances and baseline test results will be displayed.
Because of the -save option, the model will be saved in ./data/publish/.
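
The four folders this README asks for (./data/datasets/, ./data/gathered/, ./data/cache/ and ./data/publish/) can also be created in one step. A minimal sketch, where the helper create_data_folders is hypothetical and not shipped with the project:

```python
from pathlib import Path

# Folder names taken from the steps in this README.
DATA_FOLDERS = ('data/datasets', 'data/gathered', 'data/cache', 'data/publish')


def create_data_folders(root='.'):
    """Create every data folder under root; existing folders are left alone."""
    created = []
    for name in DATA_FOLDERS:
        folder = Path(root) / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


if __name__ == '__main__':
    # Run from the project root.
    for folder in create_data_folders():
        print(folder)
```

Running the script again is harmless, because exist_ok=True skips folders that already exist.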

Running web server

Run in console:

python server.py -f data\publish\xxxxxx.model.json

Then open http://127.0.0.1:5000 in a browser and try out the predictions.
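
To confirm from a script that the server is answering, a plain HTTP request to the address above is enough. The sketch below uses only the standard library. The helper server_is_up is hypothetical, and it checks reachability only, since the routes exposed by server.py are not documented here.

```python
import urllib.error
import urllib.request


def server_is_up(url='http://127.0.0.1:5000', timeout=2.0):
    """Return True if anything answers HTTP requests at url."""
    # An empty ProxyHandler bypasses any configured proxy for this local check.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied, only with an error status code.
        return True
    except OSError:
        # Covers URLError, refused connections, and timeouts.
        return False
```

Calling server_is_up() after starting server.py should return True once the server has finished loading the model.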