Installation & Instantiation

alkihis edited this page Feb 19, 2020 · 9 revisions

Getting started

Install the package using npm or Yarn.

npm i twitter-archive-reader

or

yarn add twitter-archive-reader

This package uses JSZip internally to read ZIP archives, so you can load archives in this module the same way you load them in JSZip.

// ESModules
import TwitterArchive from 'twitter-archive-reader';

// CommonJS
const { TwitterArchive } = require('twitter-archive-reader');

Usage

You can create an instance from several types of objects, all of which must reference an archive. Once the instance is created, you must wait for it to be ready by awaiting the .ready() promise.

Supported loading methods:

  • File from browser
  • Buffer or ArrayBuffer
  • string (filename) for loading files in Node.js
  • number[] or Uint8Array, bytes arrays
  • JSZip instances
  • Archive instances (see StreamZip.ts/Archive class)

// You can create a TwitterArchive from any format supported
// by JSZip's loadAsync() method
// (see https://stuk.github.io/jszip/documentation/api_jszip/load_async.html).
// You can also use a filename or a Node.js Buffer.

// By a filename
const archive = new TwitterArchive('filename.zip');

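// Or, in a browser, by a file input (File object).
// A separate name is used here because `archive` is already declared above.
const browserArchive = new TwitterArchive(document.querySelector('input[type="file"]').files[0]);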

// Initialization can take a while (unzipping, reading tweets and DMs...),
// so the archive emits events that let you follow the initialization steps
archive.events.on('zipready', () => {
  // ZIP is unzipped
});
archive.events.on('tweetsread', () => {
  // Tweet files have been read
});
// See all available events in the Events section.

console.log("Reading archive...");
// You must wait for the ZIP to be read and the archive object to be built
await archive.ready();

// Archive is ready!

Options for TwitterArchive

You can set options when you create the TwitterArchive instance.

Available options are:

new TwitterArchive(
  /** 
   * Archive to load.
   * Can be a string (filename), number[], Uint8Array,
   * JSZip, Archive, ArrayBuffer and File objects.
   * 
   * If you want to build an archive instance **without** a file, you can pass `null` here.
   * You must then load parts of the archive with `.loadArchivePart()` or `.loadClassicArchivePart()`.
   */
  file: AcceptedZipSources,
  options: TwitterArchiveLoadOptions = {
    /**
     * Specify if you want to ignore a specific part of archive, for performance or memory reasons.
     * 
     * Available parts are in `ArchiveReadPart` type.
     * 
     * By default, all parts are imported from archive.
     * If you want to ignore every part, you can specify `"*"` in the part array.
     * 
     * **Profile and account data is always parsed.**
     * 
     * ```ts
     * type ArchiveReadPart = "tweet" | "dm" | "follower" | "following" | "mute" | "block" | "favorite" | "list" | "moment" | "ad";
     * ```
     *
     * To manually load a part after the archive has been loaded, use the `.initArchivePart()` method.
     * Do not initialize a part twice: it could lead to subtle bugs.
     */
    ignore?: (ArchiveReadPart | "*")[],
  }
)

Read an archive step by step

Since 6.0.0, you can choose which parts of the archive are loaded during the initial load, then decide later which of the remaining parts to read.

Ignore parts during initialization

By default, every part of the archive is fully loaded and built into data structures.

You can ignore specific parts with the options.ignore parameter of the TwitterArchive constructor.

Available parts are defined in the ArchiveReadPart type.

type ArchiveReadPart = "tweet" | "dm" | "follower" | "following" | "mute" | "block" | "favorite" | "list" | "moment" | "ad";

Note: User information is always fully loaded and parsed if available in the archive. This kind of information is lightweight and should not cause any problems.

// Instantiate without direct messages and ads
const archive = new TwitterArchive('filename.zip', {
  ignore: ['ad', 'dm']
});
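
When only one or two parts are needed, writing the ignore list by hand makes it easy to forget a part. The sketch below computes the list from the parts you want to keep. ALL_PARTS copies the values of the ArchiveReadPart type shown above, and ignoreAllExcept is a hypothetical helper, not something provided by twitter-archive-reader.

```javascript
// Values copied from the ArchiveReadPart type shown above.
const ALL_PARTS = [
  'tweet', 'dm', 'follower', 'following', 'mute',
  'block', 'favorite', 'list', 'moment', 'ad',
];

// Hypothetical helper (not provided by the package):
// returns every part that is NOT in `wanted`, ready to pass as `options.ignore`.
function ignoreAllExcept(...wanted) {
  const unknown = wanted.filter((part) => !ALL_PARTS.includes(part));
  if (unknown.length > 0) {
    throw new TypeError(`Unknown archive part(s): ${unknown.join(', ')}`);
  }
  return ALL_PARTS.filter((part) => !wanted.includes(part));
}

// Usage, assuming TwitterArchive has been imported as shown earlier:
// const archive = new TwitterArchive('filename.zip', {
//   ignore: ignoreAllExcept('tweet'),
// });
```

Rejecting unknown names early means a typo such as 'tweets' fails loudly instead of silently loading a part you meant to keep out.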

Ignore all parts

You can initialize an instance without loading any data except user information. Just specify '*' in the options.ignore array.

const archive = new TwitterArchive('filename.zip', {
  ignore: ['*']
});

Manually initialize a part

If you skip parts during archive initialization, you can parse them manually with the .initArchivePart() method. Each parameter of this method is an ArchiveReadPart.

Take care not to load a part twice! It could lead to unexpected side effects.

// Skip everything except the tweets
const archive = new TwitterArchive('filename.zip', {
  ignore: ['dm', 'ad', 'follower', 'following', 'mute', 'block', 'favorite', 'list', 'moment']
});

await archive.ready();

// ...
// Later

// We want access to direct messages and ad data now
await archive.initArchivePart("dm", "ad");

// Ready to access them!
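
Since a part should not be initialized twice, it can help to keep track of what has already been loaded. The following is a minimal sketch of one way to do that. createPartTracker and its claim method are hypothetical names, not part of twitter-archive-reader; the only library call assumed is .initArchivePart(), used as in the example above.

```javascript
// Hypothetical helper (not part of twitter-archive-reader): remembers which
// parts were already loaded so that each one is initialized at most once.
function createPartTracker(alreadyLoaded = []) {
  const loaded = new Set(alreadyLoaded);
  return {
    // Returns the requested parts that are not loaded yet and marks them as loaded.
    claim(...requested) {
      const fresh = [];
      for (const part of requested) {
        if (!loaded.has(part)) {
          loaded.add(part);
          fresh.push(part);
        }
      }
      return fresh;
    },
    // Tells whether a part has already been claimed or was loaded initially.
    has: (part) => loaded.has(part),
  };
}

// Usage, assuming `archive` was created with an `ignore` list as above
// and that the tweets were loaded during initialization:
// const tracker = createPartTracker(['tweet']);
// const fresh = tracker.claim('dm', 'ad');
// if (fresh.length > 0) await archive.initArchivePart(...fresh);
```

Note that this sketch marks a part as loaded before initialization finishes, so a failed load would need to be handled separately.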

Events

Reading an archive takes time: it has to be unzipped, then tweets, user information, direct messages, and other data have to be read. You might therefore want to display the current loading step to the end user.

TwitterArchive provides an event system driven by the events package (native Node.js events). The event emitter is available through the .events property of the TwitterArchive object.

You can listen to events with the .events.on() method, and remove listeners with .events.off().

Events are listed in the order in which they fire.

All of the events described here, except error, contain elements (in the detail attribute).

zipready

Fires when the archive is unzipped (its content has not been read yet).

userinfosready

Fires when basic user information (archive creation date, user details) has been read.

indexready

Fires when the tweet index (months, tweet number) has been read.

tweetsread

Fires when tweet files have been read.

willreaddm

Fires when direct message files are about to be read. This event does not fire when a classic archive is given.

willreadextended

Fires when miscellaneous information (favorites, moments...) is about to be read. This event does not fire when a classic archive is given.

read

Fires every time one of the events listed above fires, with the current step in the step property.

archive.events.on('read', ({ step }) => {
  console.log("Archive is at read step", step);
});
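
To show a loading indicator, one option is to record each step reported by the read event. The sketch below assumes that the listener receives an object with a step property, as in the example above. trackSteps is a hypothetical helper, a plain Node.js EventEmitter stands in for archive.events, and the step values emitted in the demo are illustrative only.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: records every step reported by the 'read' event
// and forwards it to an optional callback (for example, to update a UI).
function trackSteps(events, onStep = () => {}) {
  const steps = [];
  const listener = ({ step }) => {
    steps.push(step);
    onStep(step, steps.length);
  };
  events.on('read', listener);
  return {
    steps,
    // Removes the listener so later 'read' events are no longer recorded.
    stop: () => events.off('read', listener),
  };
}

// Demo with a stand-in emitter; with a real archive, pass `archive.events` instead.
const fakeEvents = new EventEmitter();
const progress = trackSteps(fakeEvents, (step, count) => {
  console.log(`Step ${count}: ${step}`);
});
fakeEvents.emit('read', { step: 'zipready' });
fakeEvents.emit('read', { step: 'tweetsread' });
progress.stop();
```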

ready

Fires when the reading process is over.

Linked to .ready() promise (fulfilled).

error

Fires when reading fails. The detail attribute contains the thrown error.

Linked to .ready() promise (rejected).
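
Because ready and error both mark the end of the reading process, they are a natural point to remove listeners registered with .events.on(). The following sketch shows one possible pattern. listenUntilDone is a hypothetical helper, not part of the package, and a plain Node.js EventEmitter stands in for archive.events.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: registers one listener per event name and removes
// all of them as soon as 'ready' or 'error' fires.
function listenUntilDone(events, handlers) {
  const entries = Object.entries(handlers);
  const cleanup = () => {
    for (const [name, handler] of entries) events.off(name, handler);
    events.off('ready', cleanup);
    events.off('error', cleanup);
  };
  for (const [name, handler] of entries) events.on(name, handler);
  events.on('ready', cleanup);
  events.on('error', cleanup);
  // Returned so the caller can also stop listening manually.
  return cleanup;
}

// Demo with a stand-in emitter; with a real archive, pass `archive.events` instead.
const standIn = new EventEmitter();
const seen = [];
listenUntilDone(standIn, {
  zipready: () => seen.push('zipready'),
  tweetsread: () => seen.push('tweetsread'),
});
standIn.emit('zipready');
standIn.emit('ready');      // listeners are removed here
standIn.emit('tweetsread'); // no longer handled
```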

Continue

The next part is Archive Properties.