Binarcular

A library that allows you to read/write the contents of a binary file into/from a JSON Object, taking care of all data type conversion using JavaScript-friendly values. You can search for data by value, slice blocks into separate, meaningful structures or generate a binary downloadable file, all inside your browser.

Practical use cases are:

  • Validating whether a file header contains the appropriate description, by comparing its parsed data to the respective data types
  • Scanning a file for specific metadata to determine the location of other meaningful data
  • Creating a binary file in your browser, without having to repeatedly write meaningless byte values in sequence

See API and Example below.

Compatibility

binarcular should work fine on Internet Explorer 10 and up. You can verify support at runtime by querying the result of the isSupported()-method:

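import { isSupported } from 'binarcular';

// pass true to additionally require support for 64-bit conversion
const optRequire64bitConversion = false;

if ( isSupported( optRequire64bitConversion ) ) {
    // ...do stuff!
} else {
    // ...do other, less cool stuff! Actually, not cool at all!! >=/
}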

NOTE: if you require support for 64-bit types there are additional requirements. Pass boolean true for optional argument optRequire64bitConversion to determine whether the environment supports 64-bit conversion.

Installation

You can get it via NPM:

npm install binarcular

Project integration

The library is compatible with CommonJS / ES6 modules or can be included in a document using AMD/RequireJS. See the contents of the /dist/ folder and include as your project sees fit.

API

The module exports the following:

import {

    isSupported:     fn( optRequire64bitConversion = false ),
    types:           Object<String>,
    parse:           async fn( dataSource, structureDefinition, optReadOffset = 0 ),
    seek:            async fn( uint8Array, searchStringOrByteArray, optReadOffset = 0 ),
    write:           async fn( uint8Array, structureDefinition, dataToWrite, optWriteOffset = 0 ),
    fileToByteArray: async fn( file, optSliceOffset = 0, optSliceSize = file.size ),
    byteArrayToFile: fn( uint8Array, filename, optMimeType = 'application/octet-stream' )

} from 'binarcular';

We'll look into each of these below:

Reading a chunk of data into an Object of a specific structure type

Handled by the parse method:

async parse( dataSource, structureDefinition, optReadOffset = 0 )

Where:

  • dataSource is the file to parse (can be either File, Blob, Uint8Array or (base64 encoded) String)
  • structureDefinition is an Object defining a data structure (as described here)
  • optReadOffset is a numerical index describing where in the file's ByteArray reading should start. This defaults to 0, which starts reading at the beginning of the file.

When the Promise resolves, the result is the following structure:

{
    data: Object,
    end: Number,
    error: Boolean,
    byteArray: Uint8Array
}

If all has been read successfully, data is an Object that follows the structure of the given structureDefinition (wavHeader in the example below) and is populated with the actual file data.

end describes at what offset in given file the structure's definition has ended. This can be used for subsequent read operations where different data types are extracted from the binary data.

If error is true, something went wrong during parsing. The data Object will be populated with all data that could be harvested up until the error occurred. This allows you to salvage what you can from corrupted files.
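To make the result shape concrete, here is a minimal, synchronous sketch of a structure-driven read built on the standard DataView API. It is not binarcular's implementation: readStructure is a hypothetical helper, it only understands CHAR, INT16 and INT32, and it assumes Little Endian when no modifier is given. It does show how end advances per field and how data keeps whatever was read before an error occurred.

```javascript
// Hypothetical helper, NOT part of binarcular: a simplified, synchronous
// illustration of the { data, end, error, byteArray } result shape.
const SIZES = { CHAR: 1, INT16: 2, INT32: 4 };

function readStructure( bytes, definition, offset = 0 ) {
    const view = new DataView( bytes.buffer, bytes.byteOffset, bytes.byteLength );
    const data = {};
    let end    = offset;
    let error  = false;

    for ( const [ key, typeDefinition ] of Object.entries( definition )) {
        // matches e.g. 'CHAR[4]' or 'INT32|LE'
        const match = /^([A-Z0-9]+)(?:\[(\d+)\])?(?:\|(BE|LE))?$/.exec( typeDefinition );
        const size  = match ? SIZES[ match[ 1 ]] : undefined;
        const count = match && match[ 2 ] ? parseInt( match[ 2 ], 10 ) : 1;

        // unknown type or not enough bytes left: stop, but keep what was read
        if ( !size || end + size * count > bytes.length ) {
            error = true;
            break;
        }
        const littleEndian = match[ 3 ] !== 'BE'; // assumption: default to LE
        const values = [];
        for ( let i = 0; i < count; ++i, end += size ) {
            if ( match[ 1 ] === 'CHAR' ) {
                values.push( String.fromCharCode( view.getUint8( end )));
            } else if ( match[ 1 ] === 'INT16' ) {
                values.push( view.getInt16( end, littleEndian ));
            } else {
                values.push( view.getInt32( end, littleEndian ));
            }
        }
        if ( match[ 1 ] === 'CHAR' ) {
            data[ key ] = values.join( '' ); // CHAR sequences become a String
        } else {
            data[ key ] = match[ 2 ] ? values : values[ 0 ];
        }
    }
    return { data, end, error, byteArray: bytes };
}

// a RIFF chunk header: the characters "RIFF" followed by the value 36
const chunkHeader = { id: 'CHAR[4]', size: 'INT32|LE' };
const bytes = new Uint8Array([ 0x52, 0x49, 0x46, 0x46, 0x24, 0x00, 0x00, 0x00 ]);

const result = readStructure( bytes, chunkHeader );
// result.data → { id: 'RIFF', size: 36 }, result.end → 8, result.error → false

const truncated = readStructure( bytes.subarray( 0, 6 ), chunkHeader );
// truncated.data → { id: 'RIFF' }, truncated.end → 4, truncated.error → true
```

In the truncated case, the id field is still available even though the size field could not be read, which mirrors the partial-data behaviour described above.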

A note on using Uint8Array as dataSource

You can see that another property is defined in the result, namely byteArray. If the dataSource provided to the parse method was a Uint8Array which you intend to reuse inside your project, be sure to reassign your byteArray reference to the returned instance.

The rationale here is that for minimal overhead, the ownership of the ByteArray's binary content is transferred during the read operations. You will not be able to perform any actions on your ByteArray without updating the reference.
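The effect of such an ownership transfer can be reproduced with the standard structuredClone() function and its transfer option (available in modern browsers and Node.js 17+). This is only an illustration of the general mechanism, under the assumption that the library hands the underlying ArrayBuffer over as a transferable. It does not show binarcular's internals.

```javascript
// Illustration only: simulates handing a buffer over to another context.
const original = new Uint8Array([ 1, 2, 3, 4 ]);
let byteArray  = original;

// transfer the underlying ArrayBuffer instead of copying it
const received = structuredClone( byteArray, { transfer: [ byteArray.buffer ] });

// the original reference is now detached and no longer holds any data
console.log( original.length ); // 0
console.log( received.length ); // 4

// which is why the reference should be replaced by the returned instance
byteArray = received;
```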

Looking for a specific entry in a file

If you are working with a file where the content of interest is preceded by some metadata at an arbitrary point, it makes sense to first look for this metadata declaration so you know from where you can retrieve the actual data of interest.

For this purpose you can use seek:

async seek( uint8Array, searchStringOrByteArray, optReadOffset = 0 )

where:

  • uint8Array is the ByteArray containing the binary data.
  • searchStringOrByteArray can be either a String (in case the metadata is a character sequence) or a Uint8Array holding a byte sequence that functions as the "search query".
  • optReadOffset determines the offset within the data from where to start searching. This defaults to 0 to search from the start.

The method returns the numerical index at which the data was found, or Infinity if no match was found.
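Conceptually, such a search walks the ByteArray until the query sequence matches. The sketch below is a naive linear search and not binarcular's implementation: findSequence is a hypothetical name, and a String query is assumed to consist of single-byte characters.

```javascript
// Hypothetical helper, NOT binarcular's implementation: a naive linear search.
function findSequence( bytes, query, offset = 0 ) {
    // treat a String query as a sequence of single-byte character codes
    const needle = typeof query === 'string'
        ? Uint8Array.from( query, char => char.charCodeAt( 0 ))
        : query;

    if ( needle.length === 0 ) {
        return Infinity;
    }
    const last = bytes.length - needle.length;
    for ( let i = offset; i <= last; ++i ) {
        let j = 0;
        while ( j < needle.length && bytes[ i + j ] === needle[ j ]) {
            ++j;
        }
        if ( j === needle.length ) {
            return i; // index of the first byte of the match
        }
    }
    return Infinity; // same "not found" value as described above
}

const bytes = Uint8Array.from( 'RIFF....WAVEfmt ....data', char => char.charCodeAt( 0 ));

const dataIndex = findSequence( bytes, 'data' ); // 20
const missing   = findSequence( bytes, 'LIST' ); // Infinity
```

Because the "not found" value is Infinity rather than -1, a check such as Number.isFinite( index ) is a convenient way to test for a match.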

Writing JSON as binary content

If you have a JSON structure that you wish to write into a binary file, you can do so using write:

async write( uint8Array, structureDefinition, dataToWrite, optWriteOffset = 0 )

where:

  • uint8Array is the ByteArray containing the binary data.
  • structureDefinition is an Object defining a data structure (as described here)
  • dataToWrite is an Object following the data structure, except the value here is the data you wish to write in the binary file.
  • optWriteOffset is the index at which data will be written. This defaults to 0 to start writing at the beginning of the file. Data will be written for the length of the given structureDefinition. All existing data beyond this point will remain unchanged.

The result of this operation is the following:

{
    data: Object,
    end: Number,
    error: Boolean,
    byteArray: Uint8Array
}

Where byteArray should replace the reference of the byteArray you passed into the method. This ByteArray contains the original data except that the data block starting at requested optWriteOffset has been replaced with the binary equivalent of the dataToWrite-Object. All data beyond the size of the written data block remains unchanged.
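The "replace one block, leave the rest untouched" behaviour can be pictured with the standard DataView API. This sketch does not call binarcular. It assumes a structure of two 16-bit integers (one Little Endian, one Big Endian) written at offset 2 of an 8-byte array.

```javascript
// Illustration only, using the standard DataView API rather than binarcular.
const original = new Uint8Array([ 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA, 0xAA ]);

// work on a copy, comparable to the byteArray returned in the result
const byteArray = original.slice();
const view      = new DataView( byteArray.buffer );

// write an 'INT16|LE' value followed by an 'INT16|BE' value
const writeOffset = 2;
view.setInt16( writeOffset,     0x0102, true );  // Little Endian: 02 01
view.setInt16( writeOffset + 2, 0x0102, false ); // Big Endian:    01 02
const end = writeOffset + 4;                     // offset directly after the block

// byteArray → AA AA 02 01 01 02 AA AA (bytes outside the block are unchanged)
```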

Converting a File reference to a ByteArray

async fileToByteArray( fileReference, optSliceOffset = 0, optSliceSize = fileReference.size )

where:

  • fileReference is the File (or Blob) of which the contents should be read into a Uint8Array.
  • optSliceOffset is the optional offset from where to read the data, defaults to 0 to start from the beginning.
  • optSliceSize is the optional size of the resulting ByteArray. This defaults to the size of the file to read the file in its entirety. When using a custom optSliceOffset, overflow checking is performed to prevent reading beyond the file's boundaries.
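The overflow check mentioned above amounts to clamping the requested range to the file size. A minimal sketch, assuming a hypothetical helper named clampSlice (the library's actual check may differ):

```javascript
// Hypothetical helper, NOT part of binarcular: keeps a slice within the file.
function clampSlice( fileSize, sliceOffset = 0, sliceSize = fileSize ) {
    const start = Math.min( Math.max( 0, sliceOffset ), fileSize );
    const end   = Math.min( start + Math.max( 0, sliceSize ), fileSize );
    return { start, end };
}

clampSlice( 100 );          // { start: 0, end: 100 }
clampSlice( 100, 80, 50 );  // { start: 80, end: 100 }
clampSlice( 100, 120, 10 ); // { start: 100, end: 100 }
```

The resulting start and end values are in the form expected by the standard Blob.slice( start, end ) method.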

Converting a ByteArray to a File

byteArrayToFile( byteArray, filename, optMimeType = 'application/octet-stream' )

This will generate a download with the given filename, containing the data of byteArray as its content and using the given optMimeType.

To prevent blocking the download, this should be called directly from a click handler.

Example

Let's say we want to read the binary data of a well-known proprietary format. First up we will get to...

Define a structure

Defining a structure is nothing more than declaring an Object where the keys define names meaningful to your purpose and the values consist of Strings describing:

  • one of the available type enumerations (the names of the imported types are equal to their value).
  • an optional Array declaration: adding a numerical value between brackets ([n]) makes the value an Array of the given length n.
  • an optional modifier defining the endianness of the file's byte order, separated by a pipe character (either |BE for Big Endian or |LE for Little Endian). When unspecified, the endianness of the client's system is used (assuming the file has been encoded on/by a similar system, which usually means Little Endian these days).
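The three parts of such a definition String can be picked apart with a single regular expression. The sketch below is not binarcular's own parser: parseTypeDefinition is a hypothetical helper, and it assumes that the Array declaration precedes the endianness modifier, in the order listed above.

```javascript
// Hypothetical helper, NOT binarcular's parser: splits a definition String
// into its type, optional Array length and optional endianness.
function parseTypeDefinition( typeDefinition ) {
    const match = /^([A-Z0-9]+)(?:\[(\d+)\])?(?:\|(BE|LE))?$/.exec( typeDefinition );
    if ( !match ) {
        throw new Error( `Invalid type definition "${typeDefinition}"` );
    }
    const [ , type, length, endianness ] = match;
    return {
        type,
        isArray: length !== undefined,
        length: length !== undefined ? parseInt( length, 10 ) : 1,
        endianness: endianness || null // null: fall back to the platform default
    };
}

parseTypeDefinition( 'CHAR[4]' );
// { type: 'CHAR', isArray: true, length: 4, endianness: null }
parseTypeDefinition( 'INT32|LE' );
// { type: 'INT32', isArray: false, length: 1, endianness: 'LE' }
parseTypeDefinition( 'INT16[2]|BE' );
// { type: 'INT16', isArray: true, length: 2, endianness: 'BE' }
```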

An example structure that defines the header of a .WAV file would look like:

const wavHeader = {
    type:           'CHAR[4]',
    size:           'INT32|LE',
    format:         'CHAR[4]',
    formatName:     'CHAR[4]',
    formatLength:   'INT32|LE',
    audioFormat:    'INT16|LE',
    channelAmount:  'INT16|LE',
    sampleRate:     'INT32|LE',
    bytesPerSecond: 'INT32|LE',
    blockAlign:     'INT16|LE',
    bitsPerSample:  'INT16|LE',
    dataChunkId:    'CHAR[4]',
    dataChunkSize:  'INT32|LE'
};

Note that the order of the keys (and more importantly: their type definitions) should match the order of the values as described by the specification of the particular file type!
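A quick sanity check on a structure is to add up the byte sizes of its fields and compare the total with the format's specification. The sketch below does this for the header definition, assuming 1 byte for CHAR, 2 for INT16 and 4 for INT32. The structureSize helper is hypothetical and not part of the library.

```javascript
// Assumed byte sizes for the types used in wavHeader (illustration only).
const BYTE_SIZES = { CHAR: 1, INT16: 2, INT32: 4 };

// same definition as the wavHeader structure shown above
const wavHeader = {
    type:           'CHAR[4]',
    size:           'INT32|LE',
    format:         'CHAR[4]',
    formatName:     'CHAR[4]',
    formatLength:   'INT32|LE',
    audioFormat:    'INT16|LE',
    channelAmount:  'INT16|LE',
    sampleRate:     'INT32|LE',
    bytesPerSecond: 'INT32|LE',
    blockAlign:     'INT16|LE',
    bitsPerSample:  'INT16|LE',
    dataChunkId:    'CHAR[4]',
    dataChunkSize:  'INT32|LE'
};

// Hypothetical helper: total size in bytes of all fields in a definition.
function structureSize( definition ) {
    return Object.values( definition ).reduce(( total, typeDefinition ) => {
        const [ typeAndLength ]      = typeDefinition.split( '|' ); // drop endianness
        const [ type, length = '1' ] = typeAndLength.replace( ']', '' ).split( '[' );
        return total + BYTE_SIZES[ type ] * parseInt( length, 10 );
    }, 0 );
}

const headerSize = structureSize( wavHeader ); // 44
```

The total of 44 bytes matches the size of the canonical WAV header, and it is also the value you would expect for end after parsing this structure from offset 0.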

A teeny tiny note on Endianness

Note that specifying endianness can be omitted if you're certain that the file's encoding is equal to that of the platform you will be parsing the file on (most likely only Big Endianness will require an explicit definition). And I hope you will never be in the unfortunate situation where you work with a file that uses different endianness for different blocks!
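To see why the modifier matters, the same four bytes can be read in both byte orders using the standard DataView API. The snippet also shows a common way to detect the platform's own endianness. None of this calls binarcular, it only illustrates the concept.

```javascript
const bytes = new Uint8Array([ 0x00, 0x00, 0xAC, 0x44 ]);
const view  = new DataView( bytes.buffer );

// the second argument of getInt32 selects Little Endian when true
const bigEndianValue    = view.getInt32( 0, false ); // 0x0000AC44 → 44100
const littleEndianValue = view.getInt32( 0, true );  // 0x44AC0000 → 1152122880

// typed arrays use the platform's byte order: on a Little Endian system
// the least significant byte of the 16-bit value 1 is stored first
const platformIsLittleEndian = new Uint8Array( new Uint16Array([ 1 ]).buffer )[ 0 ] === 1;
```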

Back to talking types

All available data types are listed in the { types } export. Note that definitions for CHAR will return as a String. If you want an 8-bit integer/byte value, use BYTE or INT8 instead.

We can now proceed to read the file:

import { parse } from 'binarcular';

async function readWaveHeader( fileReference ) {
    const { data, end, error, byteArray } = await parse( fileReference, wavHeader, 0 );

    console.log( data );  // will contain the properties of a WAV file header
    console.log( end );   // will describe the end offset of the header
    console.log( error ); // when true, a file reading error occurred
}

You can also view the demo provided in this repository's example.html file, which parses .WAV files and provides advanced examples using seeking, slicing and error correction before finally providing you with the instruction on how to extract the meaningful data from the file.

Performance

Depending on the size of the files you're working with, memory allocation can become a problem.

The parser will only read the block that is requested (e.g. starting from the requested offset and only for the size of the requested structureDefinition) and should thus be light on resources. Additionally, all read operations happen in a dedicated Web Worker which keeps your main application responsive (you can safely parse several hundred megabytes of data without blocking your UI).

Depending on your use case, it helps to take the following guidelines into consideration:

  • Use base64 only when you have no choice, as a base64 String describes the file in its entirety. Also, unlike an ArrayBuffer, a String cannot be transferred to a Web Worker by reference: its entire value is copied whenever it is passed along.
  • If you intend to do multiple reads on the same file (for instance: first reading its header to determine where in the file the meaningful content begins) it is recommended to use the fileToByteArray()-method to create a single reusable Uint8Array. This also makes sense if you need to read the file in its entirety.
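A further cost of base64 is its size: the encoding uses four characters for every three bytes of binary data (rounded up to a full group because of padding). The base64Length helper below is hypothetical and only illustrates that arithmetic.

```javascript
// Hypothetical helper: the String length of padded base64 for a byte length.
function base64Length( byteLength ) {
    return 4 * Math.ceil( byteLength / 3 );
}

const headerLength = base64Length( 44 );              // 60
const threeMbLength = base64Length( 3 * 1024 * 1024 ); // 4194304
```

In other words, roughly 4 MB worth of characters are needed to describe 3 MB of binary data, before any decoding has taken place.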

Build instructions

In case you want to aid in development of the library:

The project dependencies are maintained by NPM, you can resolve them using:

npm install

You can develop (and test against the example app by navigating to http://localhost:8080) by running:

npm run dev

To create a production build:

npm run build

After which a folder dist/ is created which contains the prebuilt AMD/RequireJS and CommonJS/ES module libraries (as well as the example application).

The source code is transpiled to ES5 for maximum browser compatibility.

Unit testing

Unit tests are run via Jest. You can run them with:

npm run test

Unit tests go in the ./test-folder. The file name for a unit test must be equal to the file it is testing, but contain the suffix ".spec", e.g. functions.js should have a test file functions.spec.js.