Scrunch Reference
Log in to Crunch with an API key; return the top-level Site payload. Using this stores a reference to the session created in pycrunch.session for future use.
Log in to Crunch with an API key; return the top-level Site payload. Your API key will only work with your organization's subdomain.
Imported from scrunch.datasets.get_dataset for ease of access. dataset can be either a dataset name or an ID.
If the site parameter isn't provided, the library will automatically try the following authentication methods:
- An existing session, if the user has previously called scrunch.connect in the current execution environment.
- The CRUNCH_API_KEY environment variable:
  export CRUNCH_API_KEY=apikeysecret;
- The CRUNCH_USERNAME and CRUNCH_PASSWORD environment variables (soon to be deprecated):
  export [email protected]; export CRUNCH_PASSWORD=yourpassword;
- A crunch.ini file with the CRUNCH_API_KEY keyword in a section named DEFAULT:
  [DEFAULT]
  CRUNCH_API_KEY=apikeysecret
- A crunch.ini file with the CRUNCH_USERNAME and CRUNCH_PASSWORD keywords in a section named DEFAULT (soon to be deprecated):
  [DEFAULT]
  [email protected]
  CRUNCH_PASSWORD=yourpassword
Passing editor=True to get_dataset also makes the logged-in user the current editor of the dataset.
Returns a Dataset Entity record if the dataset exists. Raises a KeyError if a 404 is returned from Crunch API.
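The authentication lookup order described above can be sketched with the standard library alone. This is a minimal illustration, not scrunch's own code: resolve_credentials and its return values are hypothetical names, the "existing session" step is left out, and the crunch.ini content is passed in as text rather than read from disk.

```python
import configparser
import os


def resolve_credentials(environ=None, ini_text=None):
    """Return (method, credentials) for the first source that matches.

    Hypothetical helper: it mirrors the documented lookup order,
    minus the "existing session" step, and is not part of scrunch.
    """
    environ = os.environ if environ is None else environ

    # API key from the environment.
    if environ.get("CRUNCH_API_KEY"):
        return "env_api_key", {"api_key": environ["CRUNCH_API_KEY"]}

    # Username and password from the environment.
    if environ.get("CRUNCH_USERNAME") and environ.get("CRUNCH_PASSWORD"):
        return "env_password", {
            "username": environ["CRUNCH_USERNAME"],
            "password": environ["CRUNCH_PASSWORD"],
        }

    # The same keywords from the DEFAULT section of a crunch.ini file.
    if ini_text is not None:
        # interpolation=None keeps "%" characters in secrets literal.
        config = configparser.ConfigParser(interpolation=None)
        config.read_string(ini_text)
        api_key = config.get("DEFAULT", "CRUNCH_API_KEY", fallback=None)
        if api_key:
            return "ini_api_key", {"api_key": api_key}
        username = config.get("DEFAULT", "CRUNCH_USERNAME", fallback=None)
        password = config.get("DEFAULT", "CRUNCH_PASSWORD", fallback=None)
        if username and password:
            return "ini_password", {"username": username, "password": password}

    # Nothing matched.
    return None, {}


method, creds = resolve_credentials(
    environ={}, ini_text="[DEFAULT]\nCRUNCH_API_KEY=apikeysecret\n"
)
print(method, creds)
```

Passing environ and ini_text explicitly keeps the sketch easy to try without touching real environment variables or files.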
Maps subvariable aliases to urls
- :param ds: a dataset object
- :param variable_url: url of the variable we want to inspect
- :param response_map: mapping of new subvariables
- :return:
- :param project: name or ID of the project
- :param site: scrunch session, defaults to global session
- :return: the project session
- :param map: categories keyed by new category id mapped to existing ones
- :return: a list of dictionary objects that the Crunch API expects
- :param map: responses keyed by new alias mapped to existing aliases
- :return: a list of dictionaries describing the new responses to create for the variable
- :param ds: The dataset to look for the id of variable
- :param id: The id string of a variable
- :return: the url of the given variable as crunch url
- :param ds: The dataset to inspect
- :param alias: the alias of the variable name we want to check
- :return: the id of the given varname or None
Receive a valid variable reference and return the variable url.
- :param ds: The crunch dataset
- :param variable: A valid variable reference in the form of a shoji Entity of the variable or a string containing the variable url or alias.
- :return: The variable url
scrunch.datasets.AbstractContainer
builtins.object
scrunch.datasets.Hierarchy
scrunch.datasets.VariableList
scrunch.datasets.Group
A pycrunch.shoji.Entity wrapper that provides dataset-specific methods.
scrunch.datasets.Dataset
builtins.object
- :param resource: Points to a pycrunch Shoji Entity for a dataset.
Change the current editor of the Crunch dataset.
- :param user: The email address or the Crunch URL of the user who should be set as the new current editor of the given dataset.
- :returns: None
Create a new variable in the given dataset that is a recode of an existing variable
category_map = {
1: {
"name": "Favorable",
"missing": True,
"combined_ids": [1, 2]
},
}
- :param variable: alias of the variable to recode
- :param name: name for the new variable
- :param alias: alias for the new variable
- :param description: description for the new variable
- :param category_map: map to combine categories
- :return: the new created variable
Creates a new variable in the given dataset that combines existing responses into new categorized ones
response_map = {
new_subvar_name1:[old_subvar_alias1, old_subvar_alias2],
new_subvar_name2: [old_subvar_alias3, old_subvar_alias4]
}
- :return: newly created variable
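Before a response_map can be sent anywhere, each old subvariable alias has to be resolved to something the server recognizes, such as a URL. The sketch below shows that resolution step against a plain dictionary. resolve_response_map, alias_to_url and the example URLs are all hypothetical; in scrunch the lookup comes from the dataset itself.

```python
def resolve_response_map(response_map, alias_to_url):
    """Replace each old subvariable alias in response_map with its URL.

    Hypothetical helper: alias_to_url stands in for a lookup that the
    library would perform against the dataset.
    """
    resolved = {}
    for new_name, old_aliases in response_map.items():
        unknown = [alias for alias in old_aliases if alias not in alias_to_url]
        if unknown:
            raise KeyError("Unknown subvariable aliases: %s" % ", ".join(unknown))
        resolved[new_name] = [alias_to_url[alias] for alias in old_aliases]
    return resolved


# Made-up aliases and URLs, for illustration only.
alias_to_url = {
    "old_subvar_alias1": "https://example.invalid/subvariables/001/",
    "old_subvar_alias2": "https://example.invalid/subvariables/002/",
    "old_subvar_alias3": "https://example.invalid/subvariables/003/",
}
response_map = {
    "new_subvar_name1": ["old_subvar_alias1", "old_subvar_alias2"],
    "new_subvar_name2": ["old_subvar_alias3"],
}
resolved = resolve_response_map(response_map, alias_to_url)
print(resolved)
```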
scrunch.datasets.create_categorical
(self, categories, rules, name, alias, description='', missing=True)
Creates a categorical variable derived from other variables.
Creates a Multiple response (array) using a set of rules for each of the responses(subvariables).
Creates a savepoint on the dataset.
- :param description: The description that should be given to the new savepoint. This function will not let you create a new savepoint with the same description as any other savepoint.
- :returns: None
Deletes all the forks on the dataset. CANNOT BE UNDONE!
Downloads a dataset as CSV to the given path. This includes hidden variables, and categories are exported as IDs.
Given a dataset object, apply an exclusion filter to it (defined as an expression string).
If the expr parameter is None, an empty expression object is sent as part of the PATCH request, which effectively removes the exclusion filter (if any).
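The None case can be pictured as building a request body with an empty expression. The sketch below is only an illustration of that rule: the helper name, the "expression" key, and the shape of the sample expression object are assumptions, and the parsing of the expression string (which the library does) is not shown.

```python
def build_exclusion_payload(expression_obj=None):
    """Build a body for the exclusion PATCH request.

    Hypothetical helper. expression_obj is an already-parsed expression;
    None means "remove the filter", so an empty object is sent instead.
    """
    return {"expression": {} if expression_obj is None else expression_obj}


# Removing the exclusion filter: an empty expression object.
clear_payload = build_exclusion_payload(None)

# Setting a filter: this expression shape is a placeholder, not a
# confirmed Crunch expression format.
sample_expression = {"function": "<", "args": [{"variable": "age"}, {"value": 18}]}
set_payload = build_exclusion_payload(sample_expression)

print(clear_payload)
print(set_payload)
```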
scrunch.datasets.fork
(self, description=None, name=None, is_published=False, preserve_owner=True, **kwargs)
Create a fork of ds and add a virgin savepoint.
- :param description: str, default=None. If given, the description to be applied to the fork. If not given, the description will be copied from ds.
- :param name: str, default=None. If given, the name to be applied to the fork. If not given, a default name will be created which numbers the fork based on how many other forks there are on ds.
- :param is_published: bool, default=False. If True, the fork will be visible to viewers of ds. If False, it will only be viewable to editors of ds.
- :param preserve_owner: bool, default=True. If True, the owner of the fork will be the same as the parent dataset. If the owner of the parent dataset is a Crunch project, then it will be preserved regardless of this parameter.
- :param kwargs: Additional keyword arguments accepted by the forks API endpoint. You must provide project if preserve_owner is False. Starting with version 0.18.5, project should be set to the name of the desired project. If setting to a sub project, you can provide the path separated by |. For example: parent|child
- :returns _fork: scrunch.datasets.Dataset. The forked dataset.
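The rule tying preserve_owner to project, and the | separator for sub projects, can be checked before calling fork. The helper below is a hypothetical pre-flight check written for illustration; it is not part of scrunch and does not contact the API.

```python
def check_fork_project(preserve_owner=True, project=None):
    """Validate the project argument and split a sub project path.

    Hypothetical helper: returns the path as a list of project names,
    or None when no project was given.
    """
    if not preserve_owner and project is None:
        raise ValueError("project is required when preserve_owner is False")
    if project is None:
        return None
    # "parent|child" names the sub project "child" inside "parent".
    return project.split("|")


path = check_fork_project(preserve_owner=False, project="parent|child")
print(path)
```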
Return a dataframe summarizing the forks on the dataset.
- :returns _forks : pandas.DataFrame A DataFrame representation of all attributes from all forks on the given dataset.
Joins a given variable. In Crunch, joins are left joins, where the left side is this dataset's variable and the right side is the other dataset's variable. For more information see: http://docs.crunch.io/?http#merging-and-joining-datasets
- :param columns: A list of variables from the right dataset to bring in with the merge: http://docs.crunch.io/?http#joining-a-subset-of-variables
- :param wait: Wait for the join progress to finish by polling, or simply return a URL to the progress resource.
- :param filter: Filters out rows based on the given expression, or on a given URL for an existing filter. TODO: for the moment we only allow expressions.
Load a savepoint on the dataset.
- :param description: default=None. The description that identifies which savepoint to be loaded. When loading a savepoint, all savepoints that were saved after the loaded savepoint will be destroyed permanently.
- :returns: None
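The two savepoint rules stated in this reference (descriptions must be unique, and loading a savepoint destroys every later one) can be modeled with a small in-memory list. SavepointLog is a toy model written for illustration under those two rules only; it is not how scrunch or the Crunch server stores savepoints.

```python
class SavepointLog:
    """Toy in-memory model of savepoint descriptions, oldest first."""

    def __init__(self):
        self._descriptions = []

    def create(self, description):
        # A description may be used by only one savepoint.
        if description in self._descriptions:
            raise ValueError("savepoint %r already exists" % description)
        self._descriptions.append(description)

    def load(self, description):
        # list.index raises ValueError when the description is unknown.
        index = self._descriptions.index(description)
        # Everything saved after the loaded savepoint is discarded.
        del self._descriptions[index + 1:]

    def descriptions(self):
        return list(self._descriptions)


log = SavepointLog()
for label in ("initial import", "after recode", "after weighting"):
    log.create(label)
log.load("initial import")
print(log.descriptions())
```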
Batches in the rows that have been currently streamed.
Return list of attributes from the given dataset's savepoints.
- :param attrib: The attribute to be returned for each savepoint in the given dataset. Available attributes are: 'creation_time', 'description', 'last_update', 'revert', 'user_name', 'version'.
Receives a dict with columns of values to add and streams them into the dataset. Client must call .push_rows(n) later.
Returns the total number of rows streamed.
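A dict of columns has to be turned into individual rows before it can be streamed one row at a time. The sketch below shows that reshaping with plain Python. columns_to_rows is a hypothetical helper, the column aliases are sample data, and the equal-length check is an assumption about what a well-formed input looks like.

```python
def columns_to_rows(columns):
    """Turn {alias: [values...]} into a list of {alias: value} rows.

    Hypothetical helper; it only reshapes data and streams nothing.
    """
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of values")
    count = lengths.pop() if lengths else 0
    return [
        {alias: values[i] for alias, values in columns.items()}
        for i in range(count)
    ]


columns = {"age": [34, 51], "gender": [1, 2]}  # sample data
rows = columns_to_rows(columns)
print(rows)
print(len(rows))  # the row count a streaming call could report back
```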
scrunch.datasets.Group
scrunch.datasets.AbstractContainer
builtins.object
scrunch.datasets.Hierarchy
scrunch.datasets.AbstractContainer
builtins.object
scrunch.datasets.Order
builtins.object
scrunch.datasets.OrderUpdateError
builtins.Exception
builtins.BaseException
builtins.object
A pycrunch.shoji.Entity wrapper that provides variable-specific methods.
scrunch.datasets.Variable
builtins.object
scrunch.datasets.recode
(self, alias=None, map=None, names=None, default='missing', name=None, description=None)
Implements SPSS-like recode functionality for Crunch variables.
scrunch.datasets.VariableList
scrunch.datasets.AbstractContainer
builtins.object