
Scrunch Reference

Raul Chacon edited this page Nov 13, 2024 · 13 revisions

scrunch

Functions

scrunch.connect(api_key=None, site_url="https://.crunch.io/api/")

Log in to Crunch with an API key and return the top-level Site payload. Calling this function stores a reference to the created session in pycrunch.session for future use.

Your API key will only work with your organization's subdomain.

scrunch.get_dataset(dataset, site=None, editor=False)

Imported from scrunch.datasets.get_dataset for ease of access. dataset can be either a dataset name or an ID.

If the site parameter isn't provided, the library will automatically try the following authentication methods:

  • An existing session, if the user has done a previous scrunch.connect in the current execution environment.
  • CRUNCH_API_KEY environment variable:
export CRUNCH_API_KEY=apikeysecret;
  • CRUNCH_USERNAME and CRUNCH_PASSWORD environment variables (soon to be deprecated):
export CRUNCH_USERNAME=youremail@example.com;
export CRUNCH_PASSWORD=yourpassword;
  • A crunch.ini file with CRUNCH_API_KEY keyword wrapped in a section named DEFAULT:
[DEFAULT]
CRUNCH_API_KEY=apikeysecret
  • A crunch.ini file with CRUNCH_USERNAME and CRUNCH_PASSWORD keywords wrapped in a section named DEFAULT (soon to be deprecated):
[DEFAULT]
CRUNCH_USERNAME=youremail@example.com
CRUNCH_PASSWORD=yourpassword
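
The lookup order listed above can be sketched in plain Python. This is only an illustration, assuming the methods are tried from top to bottom; `resolve_credentials` is a hypothetical helper that is not part of scrunch, and the existing-session step is left out.

```python
import configparser


def resolve_credentials(env, ini_text=""):
    """Hypothetical helper: pick credentials from env vars, then crunch.ini text."""
    # Environment variables come first, API key before username/password.
    if env.get("CRUNCH_API_KEY"):
        return ("api_key", env["CRUNCH_API_KEY"])
    if env.get("CRUNCH_USERNAME") and env.get("CRUNCH_PASSWORD"):
        return ("password", env["CRUNCH_USERNAME"])

    # Fall back to the [DEFAULT] section of a crunch.ini file.
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep keys upper-case, as written in the file
    parser.read_string(ini_text)
    defaults = parser.defaults()
    if "CRUNCH_API_KEY" in defaults:
        return ("api_key", defaults["CRUNCH_API_KEY"])
    if "CRUNCH_USERNAME" in defaults and "CRUNCH_PASSWORD" in defaults:
        return ("password", defaults["CRUNCH_USERNAME"])
    return None


ini_text = "[DEFAULT]\nCRUNCH_API_KEY=apikeysecret\n"
from_env = resolve_credentials({"CRUNCH_API_KEY": "fromenv"}, ini_text)
from_ini = resolve_credentials({}, ini_text)
```

In real use you would pass `os.environ` and the contents of your own crunch.ini; the sketch takes both as arguments so it can run without touching the environment.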

Passing editor=True also makes the logged-in user the current editor of the dataset.

Returns a Dataset Entity record if the dataset exists. Raises a KeyError if the Crunch API returns a 404.
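
Because a missing dataset surfaces as a KeyError, callers may want to handle that case explicitly. The sketch below is one possible pattern, not part of scrunch: `open_dataset_or_none` is a hypothetical wrapper, and `fake_get_dataset` is a stand-in for scrunch.get_dataset so the example runs without a Crunch account.

```python
def open_dataset_or_none(get_dataset, name_or_id, **kwargs):
    """Hypothetical wrapper: return None instead of raising for a missing dataset."""
    try:
        return get_dataset(name_or_id, **kwargs)
    except KeyError:
        # get_dataset is documented to raise KeyError on a 404 from the Crunch API.
        return None


def fake_get_dataset(dataset, site=None, editor=False):
    """Stand-in with the documented signature; looks names up in a local dict."""
    datasets = {"My Survey": {"name": "My Survey", "editor": editor}}
    return datasets[dataset]


found = open_dataset_or_none(fake_get_dataset, "My Survey", editor=True)
missing = open_dataset_or_none(fake_get_dataset, "No Such Dataset")
```

With the real library, the call would presumably look like `open_dataset_or_none(scrunch.get_dataset, "My Survey", editor=True)`, where "My Survey" is a made-up dataset name.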

scrunch.datasets

Functions

scrunch.datasets.aliases_to_urls(ds, variable_url, response_map)

Maps subvariable aliases to urls

  • :param ds: a dataset object
  • :param variable_url: url of the variable we want to inspect
  • :param response_map: mapping of new subvariables
  • :return:
scrunch.datasets.change_project(project, site=None)
  • :param project: name or ID of the project
  • :param site: scrunch session, defaults to global session
  • :return: the project session
scrunch.datasets.create_dataset(name, variables, site=None)
scrunch.datasets.download_file(url, filename)
scrunch.datasets.validate_category_map(map)
  • :param map: categories keyed by new category id mapped to existing ones
  • :return: a list of dictionary objects that the Crunch API expects
scrunch.datasets.validate_category_rules(categories, rules)
scrunch.datasets.validate_response_map(map)
  • :param map: responses keyed by new alias mapped to existing aliases
  • :return: a list of dictionaries describing the new responses to create for the variable
scrunch.datasets.var_id_to_url(ds, id)
  • :param ds: The dataset to look for the id of variable
  • :param id: The id string of a variable
  • :return: the url of the given variable as crunch url
scrunch.datasets.var_name_to_url(ds, alias)
  • :param ds: The dataset to inspect
  • :param alias: the alias of the variable name we want to check
  • :return: the id of the given varname or None
scrunch.datasets.variable_to_url(ds, variable)

Receive a valid variable reference and return the variable url.

  • :param ds: The crunch dataset
  • :param variable: A valid variable reference in the form of a shoji Entity of the variable or a string containing the variable url or alias.
  • :return: The variable url

Classes

scrunch.datasets.AbstractContainer

Ancestors (in MRO)

scrunch.datasets.AbstractContainer

builtins.object

Descendents

scrunch.datasets.Hierarchy

scrunch.datasets.VariableList

scrunch.datasets.Group

Class variables

scrunch.datasets.indent_size

scrunch.datasets.Dataset

A pycrunch.shoji.Entity wrapper that provides dataset-specific methods.

Ancestors (in MRO)

scrunch.datasets.Dataset

builtins.object

Class variables

scrunch.datasets.ENTITY_ATTRIBUTES

Static methods

scrunch.datasets.__init__(self, resource)
  • :param resource: Points to a pycrunch Shoji Entity for a dataset.
scrunch.datasets.change_editor(self, user)

Change the current editor of the Crunch dataset.

Parameters

  • :param user: The email address or the crunch url of the user who should be set as the new current editor of the given dataset.

  • :returns: None

scrunch.datasets.combine_categories(self, variable, category_map, name, alias, description='')

Create a new variable in the given dataset that is a recode of an existing variable

category_map = {
    1: {
        "name": "Favorable",
        "missing": True,
        "combined_ids": [1, 2]
    },
}

  • :param variable: alias of the variable to recode
  • :param name: name for the new variable
  • :param alias: alias for the new variable
  • :param description: description for the new variable
  • :param category_map: map to combine categories
  • :return: the newly created variable
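
Before sending a category_map to combine_categories, it can help to sanity-check its shape locally. The sketch below is an assumption-laden illustration: `combined_source_ids` is a hypothetical helper, and the rules it enforces (each entry needs "name" and "combined_ids", and no existing id is combined twice) are caller-side checks, not documented library behavior.

```python
def combined_source_ids(category_map):
    """Hypothetical pre-flight check; returns the sorted existing ids being combined."""
    seen = set()
    for new_id, spec in category_map.items():
        absent = {"name", "combined_ids"} - set(spec)
        if absent:
            raise ValueError(f"category {new_id} lacks {sorted(absent)}")
        repeated = seen & set(spec["combined_ids"])
        if repeated:
            raise ValueError(f"ids {sorted(repeated)} are combined more than once")
        seen |= set(spec["combined_ids"])
    return sorted(seen)


# Made-up map: fold four existing categories into two new ones.
category_map = {
    1: {"name": "Favorable", "missing": False, "combined_ids": [1, 2]},
    2: {"name": "Unfavorable", "missing": False, "combined_ids": [3, 4]},
}
source_ids = combined_source_ids(category_map)
# A call would then look like (aliases are hypothetical):
# ds.combine_categories("q1", category_map, name="Q1 (combined)", alias="q1_combined")
```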
scrunch.datasets.combine_responses(self, variable, response_map, name, alias, description='')

Creates a new variable in the given dataset that combines existing responses into new categorized ones

response_map = {
    new_subvar_name1:[old_subvar_alias1, old_subvar_alias2],
    new_subvar_name2: [old_subvar_alias3, old_subvar_alias4]
}
  • :return: newly created variable
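
A response_map maps each new subvariable to the existing subvariable aliases it combines. Inverting it shows where each old alias ends up, which can be a useful check before calling combine_responses. `source_to_new_subvariable` and all aliases below are hypothetical.

```python
def source_to_new_subvariable(response_map):
    """Hypothetical helper: invert a response_map (old alias -> new subvariables)."""
    inverted = {}
    for new_name, old_aliases in response_map.items():
        for old_alias in old_aliases:
            inverted.setdefault(old_alias, []).append(new_name)
    return inverted


# Made-up aliases for a multiple response variable about news sources.
response_map = {
    "online": ["news_web", "news_social"],
    "offline": ["news_tv", "news_print"],
}
lookup = source_to_new_subvariable(response_map)
```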
scrunch.datasets.copy_variable(self, variable, name, alias)
scrunch.datasets.create_categorical(self, categories, rules, name, alias, description='', missing=True)

Creates a categorical variable derived from other variables.

scrunch.datasets.create_multiple_response(self, responses, rules, name, alias, description='')

Creates a multiple response (array) variable using a set of rules for each of the responses (subvariables).

scrunch.datasets.create_savepoint(self, description)

Creates a savepoint on the dataset.

  • :param description: The description that should be given to the new savepoint. This function will not let you create a new savepoint with the same description as any other savepoint.

  • :returns: None

scrunch.datasets.delete_forks(self)

Deletes all the forks on the dataset. CANNOT BE UNDONE!

scrunch.datasets.download(self, path, filter=None, variables=None, hidden=True)

Downloads a dataset as CSV to the given path. The export includes hidden variables, and categories are written as ids.

scrunch.datasets.exclude(self, expr=None)

Given a dataset object, apply an exclusion filter to it (defined as an expression string).

If the expr parameter is None, an empty expression object is sent as part of the PATCH request, which effectively removes the exclusion filter (if any).

scrunch.datasets.fork(self, description=None, name=None, is_published=False, preserve_owner=True, **kwargs)

Create a fork of ds and add a virgin savepoint to it.

  • :param description: str, default=None. If given, the description to be applied to the fork. If not given, the description will be copied from ds.

  • :param name: str, default=None. If given, the name to be applied to the fork. If not given, a default name will be created which numbers the fork based on how many other forks there are on ds.

  • :param is_published: bool, default=False. If True, the fork will be visible to viewers of ds. If False, it will only be viewable to editors of ds.

  • :param preserve_owner: bool, default=True. If True, the owner of the fork will be the same as the parent dataset. If the owner of the parent dataset is a Crunch project, then it will be preserved regardless of this parameter.

  • :param kwargs: Additional keyword arguments accepted by the forks API endpoint. You must provide project if preserve_owner is False. Starting with version 0.18.5, project should be set to the name of the desired project. To target a sub-project, provide the path separated by |. For example: parent|child

  • :returns _fork: scrunch.datasets.Dataset. The forked dataset.
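
The interaction between preserve_owner and project can be sketched as a small argument builder. Both functions below are hypothetical helpers, not part of scrunch; they only encode the rule stated above (project is required when preserve_owner is False) and the | separator for sub-project paths.

```python
def build_fork_kwargs(preserve_owner=True, project=None):
    """Hypothetical helper: assemble keyword arguments for Dataset.fork."""
    if not preserve_owner and not project:
        raise ValueError("project is required when preserve_owner is False")
    kwargs = {"preserve_owner": preserve_owner}
    if project:
        kwargs["project"] = project
    return kwargs


def project_path_parts(project):
    """Split a 'parent|child' project path into its components."""
    return project.split("|")


kwargs = build_fork_kwargs(preserve_owner=False, project="parent|child")
parts = project_path_parts(kwargs["project"])
# A call would then look like: ds.fork(description="Working copy", **kwargs)
```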

scrunch.datasets.forks_dataframe(self)

Return a dataframe summarizing the forks on the dataset.

  • :returns _forks : pandas.DataFrame A DataFrame representation of all attributes from all forks on the given dataset.
scrunch.datasets.join(self, left_var, right_ds, right_var, columns=None, filter=None, wait=True)

Joins another dataset to this one on a given variable. In Crunch, joins are left joins: left_var is the variable in this dataset and right_var is the variable in the other dataset (right_ds). For more information see: http://docs.crunch.io/?http#merging-and-joining-datasets

  • :param columns: A list of variables from the right dataset to bring in with the merge: http://docs.crunch.io/?http#joining-a-subset-of-variables

  • :param wait: If True, wait for the join to finish by polling its progress; otherwise return a url to the progress resource.

  • :param filter: Filters out rows based on the given expression, or on a given url for an existing filter. TODO: for the moment only expressions are allowed.
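
To make the left-join semantics concrete, here is a plain-Python sketch over lists of dicts. It only illustrates what "left join with a subset of columns" means; it is not how Crunch performs the join, and `left_join` and the sample rows are made up.

```python
def left_join(left_rows, right_rows, left_var, right_var, columns):
    """Plain-Python left join: every left row is kept, matched or not."""
    right_index = {row[right_var]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = right_index.get(row[left_var], {})
        merged = dict(row)
        for column in columns:
            merged[column] = match.get(column)  # None when there is no match
        joined.append(merged)
    return joined


left = [{"id": 1, "age": 30}, {"id": 2, "age": 41}]
right = [{"resp_id": 1, "region": "North", "income": 50}]
result = left_join(left, right, "id", "resp_id", columns=["region"])
```

The second left row has no match on the right, so it is kept with `region` set to None, and `income` is not brought over because it is not in `columns`.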

scrunch.datasets.load_savepoint(self, description=None)

Load a savepoint on the dataset.

  • :param description: default=None. The description that identifies which savepoint to load. Loading a savepoint permanently destroys all savepoints that were saved after it.

  • :returns: None
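
The destructive effect of loading a savepoint can be modeled with an ordered list of descriptions. This is a toy model only: `savepoints_after_load` is hypothetical, the list is assumed to be ordered oldest first, and the loaded savepoint itself is assumed to survive.

```python
def savepoints_after_load(descriptions, target):
    """Toy model: which savepoints remain after loading `target` (oldest first)."""
    if target not in descriptions:
        raise KeyError(target)
    # Everything saved after the target is dropped.
    return descriptions[: descriptions.index(target) + 1]


history = ["initial import", "after recodes", "after weighting"]
remaining = savepoints_after_load(history, "after recodes")
```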

scrunch.datasets.push_rows(self, count)

Batches in the rows that have been currently streamed.

scrunch.datasets.rename(self, new_name)
scrunch.datasets.savepoint_attributes(self, attrib)

Return list of attributes from the given dataset's savepoints.

  • :param attrib: The attribute to be returned for each savepoint in the given dataset. Available attributes are: 'creation_time', 'description', 'last_update', 'revert', 'user_name', 'version'.
scrunch.datasets.stream_rows(self, columns)

Receives a dict with columns of values to add and streams them into the dataset. Client must call .push_rows(n) later.

Returns the total number of rows streamed.
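
The columns argument is a dict of column values, and the returned row count is what you would later hand to push_rows. The sketch below shows a local check of that shape; `streamed_row_count` is a hypothetical helper, the variable aliases are made up, and the requirement that all columns have equal length is an assumption.

```python
def streamed_row_count(columns):
    """Hypothetical check: all columns must be equally long; returns the row count."""
    lengths = {alias: len(values) for alias, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns differ in length: {lengths}")
    return next(iter(lengths.values()), 0)


columns = {"age": [30, 41, 25], "region": ["North", "South", "North"]}
count = streamed_row_count(columns)
# With a real dataset the two-step flow would look like:
# total = ds.stream_rows(columns)
# ds.push_rows(total)
```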

Instance variables

scrunch.datasets.order
scrunch.datasets.resource
scrunch.datasets.session

scrunch.datasets.Group

Ancestors (in MRO)

scrunch.datasets.Group

scrunch.datasets.AbstractContainer

builtins.object

Class variables

scrunch.datasets.indent_size

Static methods

scrunch.datasets.__init__(self, obj, order, parent=None)
scrunch.datasets.create(self, name, elements=None)
scrunch.datasets.delete(self)
scrunch.datasets.find(self, name)
scrunch.datasets.find_group(self, name)
scrunch.datasets.move(self, elements, position=-1)
scrunch.datasets.move_after(self, reference, elements)
scrunch.datasets.move_before(self, reference, elements)
scrunch.datasets.move_bottom(self, element)
scrunch.datasets.move_down(self, element)
scrunch.datasets.move_top(self, element)
scrunch.datasets.move_up(self, element)
scrunch.datasets.remove(self, elements)
scrunch.datasets.rename(self, name)
scrunch.datasets.set(self, elements)

Instance variables

scrunch.datasets.elements
scrunch.datasets.hierarchy
scrunch.datasets.name
scrunch.datasets.order
scrunch.datasets.parent
scrunch.datasets.variables

scrunch.datasets.Hierarchy

Ancestors (in MRO)

scrunch.datasets.Hierarchy

scrunch.datasets.AbstractContainer

builtins.object

Class variables

scrunch.datasets.indent_size

Static methods

scrunch.datasets.__init__(self, group)

Instance variables

scrunch.datasets.elements
scrunch.datasets.group
scrunch.datasets.order

scrunch.datasets.Order

Ancestors (in MRO)

scrunch.datasets.Order

builtins.object

Static methods

scrunch.datasets.__init__(self, ds)
scrunch.datasets.create(self, *args, **kwargs)
scrunch.datasets.delete(self, *args, **kwargs)
scrunch.datasets.find(self, *args, **kwargs)
scrunch.datasets.find_group(self, *args, **kwargs)
scrunch.datasets.move(self, *args, **kwargs)
scrunch.datasets.move_after(self, *args, **kwargs)
scrunch.datasets.move_before(self, *args, **kwargs)
scrunch.datasets.move_bottom(self, *args, **kwargs)
scrunch.datasets.move_down(self, *args, **kwargs)
scrunch.datasets.move_top(self, *args, **kwargs)
scrunch.datasets.move_up(self, *args, **kwargs)
scrunch.datasets.remove(self, *args, **kwargs)
scrunch.datasets.rename(self, *args, **kwargs)
scrunch.datasets.set(self, *args, **kwargs)
scrunch.datasets.update(self)

Instance variables

scrunch.datasets.ds
scrunch.datasets.graph
scrunch.datasets.hier
scrunch.datasets.hierarchy
scrunch.datasets.variables
scrunch.datasets.vars

scrunch.datasets.OrderUpdateError

Ancestors (in MRO)

scrunch.datasets.OrderUpdateError

builtins.Exception

builtins.BaseException

builtins.object

Class variables

scrunch.datasets.args

scrunch.datasets.Variable

A pycrunch.shoji.Entity wrapper that provides variable-specific methods.

Ancestors (in MRO)

scrunch.datasets.Variable

builtins.object

Class variables

scrunch.datasets.ENTITY_ATTRIBUTES

Static methods

scrunch.datasets.__init__(self, resource)
scrunch.datasets.edit(self, **kwargs)
scrunch.datasets.edit_categorical(self, categories, rules)
scrunch.datasets.edit_derived(self, variable, mapper)
scrunch.datasets.hide(self)
scrunch.datasets.recode(self, alias=None, map=None, names=None, default='missing', name=None, description=None)

Implements SPSS-like recode functionality for Crunch variables.

scrunch.datasets.unhide(self)

Instance variables

scrunch.datasets.resource

scrunch.datasets.VariableList

Ancestors (in MRO)

scrunch.datasets.VariableList

scrunch.datasets.AbstractContainer

builtins.object

Class variables

scrunch.datasets.indent_size

Static methods

scrunch.datasets.__init__(self, group)

Instance variables

scrunch.datasets.elements
scrunch.datasets.group
scrunch.datasets.order