Note
This is meant to be a high-level survey of existing Jupyter documentation with a focus on security-related instructions. This survey was originally compiled in August 2021 by Kay Avila with input and review from Terry Fleury and Jeannette Dopheide as part of a Trusted CI engagement.
- Jupyter-owned domains with documentation:
- Styled to look like jupyter.org -
- Not re-styled
- GitHub repos with documentation
- https://github.com/jupyterhub/jupyterhub
- https://github.com/jupyterhub/jupyterhub-tutorial
- https://github.com/jupyterhub/jupyterhub-deploy-docker - JupyterHub in a single Docker container
- https://github.com/jupyter/notebook
- Encourages people to move to JupyterLab for more support
- "Our approach moving forward will be:
- To maintain the security of the Jupyter Notebook. That means security-related issues and pull requests are our highest priority.
- To address JupyterLab feature parity issues. As part of this effort, we are also working on a better notebook-only experience in JupyterLab for users who prefer the UI of the classic Jupyter Notebook.
- ... We cannot support or maintain new features at this time, but we welcome security and other sustainability fixes."
- https://jupyter.readthedocs.io/en/latest/use/use-cases/content-user.html
- Lists notebook narratives; Jupyter for data science, scientific computing, education, and enterprise - but these are mostly just placeholder stubs
- https://jupyter.readthedocs.io/en/latest/install/notebook-classic.html
- Installation instructions (for Jupyter Notebook)
- Guide recommends Anaconda, or failing that, pip3 (pip)
- Note: Doesn't mention whether to install with sudo or not
- Without sudo, places the executables in /home/<user>/.local/bin
- Running jupyter notebook command on CLI
- By default, runs on localhost:8888
- Note: Can change it with --ip= and --port= args
- Provides a URL with a token
- Note: Accessing a URL with an invalid token prompts for a password or token, and also allows for setting a new password if provided a token
- Weird issue - https://jupyter.readthedocs.io/en/latest/install.html left navbar repeats "Narratives and Use Cases" and "Advanced Use Cases"
- https://jupyter-notebook.readthedocs.io/en/latest/notebook.html#introduction
- Browser compatibility: "Using Safari with HTTPS and an untrusted
certificate is known to not work (websockets will fail)."
- Doesn't explain what the issue is (self-signed cert)
- https://jupyter-notebook.readthedocs.io/en/latest/notebook.html#notebooks-and-privacy
- If you followed standard install (linked above), then just running on own computer
- Can also run it remotely: "You can also use Jupyter remotely: your company or university might run the server for you, for instance. If you want to work with sensitive data in those cases, talk to your IT or data protection staff about it."
- https://jupyter-notebook.readthedocs.io/en/latest/notebook.html#trusting-notebooks
- Signatures of trusted notebooks (those fully executed by the user) are stored, and trusted notebooks display HTML and Javascript output
- "jupyter trust <notebook>.ipynb" to trust one
- See Security section for more info
- https://jupyter-notebook.readthedocs.io/en/latest/examples/Notebook/examples_index.html
- Note: No actual notebooks on this page, despite the text
- Links to nbviewer example notebooks
- https://jupyter-notebook.readthedocs.io/en/latest/security.html#security-in-notebook-documents
- Problem of arbitrary code execution
- Security model -
- Untrusted HTML is always sanitized
- Untrusted Javascript is never executed
- HTML and Javascript in Markdown cells are never trusted
- Outputs generated by the user are trusted
- Any other HTML or Javascript (in Markdown cells, output generated by others) is never trusted
- The central question of trust is "Did the current user do this?"
- Checks the signature when the notebook is opened to see which parts were created by the current user
- Trust is updated when the notebook is saved
- Notebooks can be explicitly trusted with a CLI command or in the web interface
- Information on vulnerability reporting - report to [email protected] and can use PGP public key to encrypt
- Changes in Jupyter 2.0:
- Javascript and CSS are sanitized and stripped out
- Cannot see collaborator's outputs in a shared notebook because
they are untrusted
- Can rerun notebooks, explicitly trust, or share a notebook signature database
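The "Did the current user do this?" check described above can be pictured as a keyed signature over the notebook contents. The following is only a minimal sketch under stated assumptions, not the actual Jupyter implementation: the function names, the per-user keys, and the in-memory signature sets are hypothetical, and HMAC-SHA256 over canonical JSON is used here purely for illustration.

```python
import hashlib
import hmac
import json


def sign_notebook(notebook: dict, secret: bytes) -> str:
    """Return an HMAC-SHA256 digest over a canonical JSON form of the notebook."""
    payload = json.dumps(notebook, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def is_trusted(notebook: dict, secret: bytes, known_signatures: set) -> bool:
    """Trusted means: this user's key produced a signature that was stored earlier."""
    return sign_notebook(notebook, secret) in known_signatures


# Hypothetical per-user secrets and signature stores (illustration only).
alice_key = b"alice-secret-key"
bob_key = b"bob-secret-key"
nb = {"cells": [{"cell_type": "code", "source": "1 + 1", "outputs": ["2"]}]}

alice_db = {sign_notebook(nb, alice_key)}  # Alice executed and saved the notebook
bob_db = set()                             # Bob only received the file

print(is_trusted(nb, alice_key, alice_db))  # True
print(is_trusted(nb, bob_key, bob_db))      # False
```

In this model, a collaborator sees untrusted outputs until they rerun the notebook, trust it explicitly, or share the signature store, which matches the three options listed above.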
- https://jupyter-notebook.readthedocs.io/en/latest/security.html#server-security
(very short section)
- Token-based auth on by default, or can set a password
- https://jupyter-notebook.readthedocs.io/en/stable/public_server.html#securing-a-notebook-server
(Running a Notebook Server)
- Warning that the notebook server is not meant to be multi-user
- Setting a password on the notebook server - automatically prompted in notebook 5.3+
- Using SSL for encrypted communication
- Using Let's Encrypt
- https://jupyterlab.readthedocs.io/en/latest/getting_started/starting.html
- Says it runs on top of Jupyter Server, so see the Jupyter Server security section
- https://jupyter-server.readthedocs.io/
- Separated into Users, Operators, Developers, Contributors, Other
- Users -
https://jupyter-server.readthedocs.io/en/latest/users/index.html
- Nothing specifically about security
- Operators -
https://jupyter-server.readthedocs.io/en/latest/operators/index.html
- Installing a Jupyter extension automatically enables it [not ideal from a security standpoint]
- Running a public Jupyter Server (intended only for single user)
- Uses ZeroMQ
- Can use a simple password with an automatic setup in the user interface or running "jupyter server password", or by manually creating a hashed password and adding it to the configuration file
- Recommends using SSL
- Brief description of self-signed versus LetsEncrypt
- Links to ArsTechnica article about obtaining paid certificate
- Also links to LetsEncrypt further down in the page, under Running a public notebook server [this is confusing!]
- Later on the same page, more information about how to use SSL certs and info on how to use LetsEncrypt
- Firewall setup - allow public connections and localhost connections
- Overriding Content-Security-Policy to allow embedding into another web page
- Can specify an external gateway server to do kernel management
- Mozilla and others recommend enabling Content Security Policy
headers to protect against cross-site scripting
- Disables inline JavaScript - which causes issues for Jupyter
- Restricts communication to https, which disables ws/wss, which Jupyter uses for interacting with kernels
- Need to add the following to the CSP headers -
- 'unsafe-inline' and connect-src https: wss:
- Note: not much about how this leaves Jupyter vulnerable, and nothing about how cross-site scripting protections can be enabled in another way
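To make the CSP adjustment above concrete, here is a small sketch that assembles a header value containing the two additions the documentation calls for. The `build_csp` helper and the `'self'` entries are hypothetical, and placing `'unsafe-inline'` under `script-src` is an assumption, since the surveyed page does not say which directive it belongs to.

```python
def build_csp(directives: dict) -> str:
    """Join directive names and their source lists into one header value."""
    return "; ".join(
        f"{name} {' '.join(values)}" for name, values in directives.items()
    )


directives = {
    "default-src": ["'self'"],
    # Assumption: 'unsafe-inline' is attached to script-src so inline JavaScript still runs.
    "script-src": ["'self'", "'unsafe-inline'"],
    # wss: is allowed so the browser can open websocket connections to kernels.
    "connect-src": ["'self'", "https:", "wss:"],
}

csp_header = build_csp(directives)
print(csp_header)
# default-src 'self'; script-src 'self' 'unsafe-inline'; connect-src 'self' https: wss:
```

Note that `'unsafe-inline'` re-opens much of what CSP is meant to close, which is the gap the note above points out.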
- Security in the Jupyter Server
- Token-based auth - on by default
- Can be provided to the server in an authorization header, URL parameter, or password field of login form
- If Jupyter server will launch the browser, an additional token is generated and then used to set a cookie
- Can set a password instead (jupyter server password)
- Possible to disable authentication, but not recommended
- Security in notebook documents
- [Duplicated information from Jupyter Notebook - arbitrary code execution, trust model, etc.]
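As a rough illustration of two of the token options listed under "Security in the Jupyter Server" (authorization header and URL parameter), the sketch below builds both forms without contacting a server. The token value, the helper names, and the `/api/contents` path are placeholders chosen for the example.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8888"  # default address noted earlier in this survey
TOKEN = "0123abcd"                  # hypothetical token, normally printed at startup


def request_with_header(path: str, token: str) -> Request:
    """Pass the token in an Authorization header."""
    return Request(BASE_URL + path, headers={"Authorization": f"token {token}"})


def url_with_query_token(path: str, token: str) -> str:
    """Pass the token as a URL parameter."""
    query = urlencode({"token": token})
    return f"{BASE_URL}{path}?{query}"


req = request_with_header("/api/contents", TOKEN)
print(req.get_header("Authorization"))               # token 0123abcd
print(url_with_query_token("/api/contents", TOKEN))  # http://localhost:8888/api/contents?token=0123abcd
```

The header form keeps the token out of browser history and server access logs, which is one reason to prefer it for scripted access.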
- Developers -
https://jupyter-server.readthedocs.io/en/latest/developers/index.html
- Depending on Jupyter Server [does not mention how to watch for security issues]
- Note: nothing about how to contribute info about security issues here
- Contributors -
https://jupyter-server.readthedocs.io/en/latest/contributors/index.html
- General Jupyter contributor guidelines -
- "jupyter_server has adopted automatic code formatting so you shouldn't need to worry too much about your code style"
- Links to https://jupyter.readthedocs.io/en/latest/contributing/content-contributor.html
- Other -
https://jupyter-server.readthedocs.io/en/latest/other/index.html
- FAQ is very short - just one ("Can I configure multiple extensions at once?")
- Config file and command line options - a few mentions of impact on security from various settings
- Changelog is buried here (?)
- https://jupyterhub.readthedocs.io/en/latest/getting-started/security-basics.html
- Note: the list at the top of subjects covered is different from the order they're actually covered in
- Enable SSL (note at top about not running w/out SSL on public
network)
- Adding SSL key and cert to JupyterHub
- Using LetsEncrypt
- Mention of SSL termination happening outside of the Hub, e.g. SSL termination provided by Nginx
- Proxy authentication token
- Manual secret token between Hub and Proxy
- Options: set in config file, or use environment variable
- If not set manually, will be negotiated between Hub and Proxy (and Proxy must be restarted anytime the Hub is restarted)
- "Cookie secret" encryption key to encrypt browser cookies used for
auth
- Options: set file location in config file, environment variable, or store in the config file
- List of cookies used
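Both the proxy authentication token and the cookie secret above are random secrets that can be handed to JupyterHub through environment variables. A minimal sketch, assuming 32 random bytes rendered as hex and the `CONFIGPROXY_AUTH_TOKEN` and `JPY_COOKIE_SECRET` variable names from the JupyterHub documentation; the `new_hex_secret` helper is hypothetical.

```python
import os
import secrets


def new_hex_secret(num_bytes: int = 32) -> str:
    """Return num_bytes of cryptographically strong randomness as a hex string."""
    return secrets.token_hex(num_bytes)


proxy_token = new_hex_secret()
cookie_secret = new_hex_secret()

# Build an environment for a child process rather than writing secrets to disk.
hub_env = dict(
    os.environ,
    CONFIGPROXY_AUTH_TOKEN=proxy_token,
    JPY_COOKIE_SECRET=cookie_secret,
)

print(len(proxy_token))  # 64 hex characters for 32 bytes
```

Setting the proxy token explicitly this way avoids the restart coupling between Hub and Proxy described above, since both sides can be started with the same value.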
- https://jupyterhub.readthedocs.io/en/stable/reference/websecurity.html
- Designed by default for semi-trusted users, takes extra work to
secure for untrusted users
- Note: Confusing/unclear sentence - "If the Hub is serving
untrusted users, many of the web's cross-site protections are
not applied between single-user servers and the Hub, or between
single-user servers and each other, since browsers see the
whole thing (proxy, Hub, and single user servers) as a single
website (i.e. single domain)."
- Makes it sound like protections are not applied for untrusted users, as opposed to making it clear admins need to be aware of this
- Protecting users from each other
- Admins must ensure users cannot modify their single-user notebook servers or the configuration of their notebook server
- Mitigation options
- Run single-user servers on subdomains (requires wildcard ssl
cert)
- Highly encouraged because resolves cross-site issues
- Disable user-owned config files from being loaded
- Note: Typo - "After implementing this option, PATHs and package installation and PATHs are the other things that the admin must enforce."
- Prevent spawner from evaluating shell config files
- Run single-user servers in virtualenvs with disabled
system-site-packages, and do not let user install packages
- This impacts only the server, not the environment(s) where their kernel(s) run
- Encryption
- Communication among proxy, hub, and single-user notebooks is unencrypted by default
- Use the IPC transport instead of TCP for the ZeroMQ sockets, since the TCP sockets are unencrypted
- Mentions that "internal_ssl option will eventually extend to securing the tcp sockets as well."
- Use security audits
- Information on vulnerability reporting - report to [email protected] and can use PGP public key to encrypt
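The sentence flagged as confusing above, and the subdomain mitigation, both come down to how browsers define an origin (scheme, host, port). The sketch below is an illustrative model only; the `hub.example.org` hostnames and helper names are made up for the example.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return the (scheme, host, port) triple a browser uses to separate sites."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)


# Default layout: every single-user server is a path on one domain.
print(same_origin("https://hub.example.org/user/alice/",
                  "https://hub.example.org/user/bob/"))   # True

# Subdomain layout: each user gets a distinct host, hence a distinct origin.
print(same_origin("https://alice.hub.example.org/",
                  "https://bob.hub.example.org/"))        # False
```

With path-based routing the browser sees one site, so its cross-site protections do not separate users; with per-user subdomains they do, which is why a wildcard certificate is needed.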
- https://jupyterhub.readthedocs.io/en/latest/getting-started/institutional-faq.html#for-it
- Section - "How would I set up JupyterHub on institutional
hardware?"
- Zero to JupyterHub for Kubernetes
- Littlest JupyterHub (runs in a VM)
- Section - "Is JupyterHub secure?"
- Links to page Security Overview that I hadn't found before and JupyterHub on Kubernetes Security
- Mentions reaching out to community in the forum
- Section - "Can JupyterHub be used with my high-performance
computing resources?"
- Yes - e.g. Dask
- Section - "How much resources do user sessions take?"
- Note: says it's configurable, but doesn't link to documentation on how to do this
- https://jupyterhub.readthedocs.io/en/latest/getting-started/authenticators-users-basics.html
- Authentication and User Basics
- Admin accounts and whether they have access to user notebooks
- https://jupyterhub.readthedocs.io/en/stable/reference/spawners.html -
under the Encryption section
- Encryption among Proxy, Hub, and Notebook
- https://jupyterhub.readthedocs.io/en/stable/reference/config-sudo.html
- Running the Hub process without root privileges
- https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/security.html
- Advice is mostly for cloud-based deployments
- Information on vulnerability reporting - report to [email protected] and can use PGP public key to encrypt
- HTTPS
- Add LetsEncrypt to proxy by editing config.yaml file
- Recommends using a static IP address as the load balancer IP if a LoadBalancer proxy is being used
- Or, manual https certificate ("considered an advanced option")
- By configuring in config.yaml file or
- Use kubectl to add a secret resource
- Off-load SSL to a load balancer
- Secure access to helm - see the relevant Kubernetes docs
- Delete Kubernetes dashboard
- Keep RBAC enabled, otherwise all pods are given root equivalent
permissions
- However, though strongly discouraged, also gives instructions on disabling RBAC
- Instructions on how to give users access to the Kubernetes API
- Recommends also setting up RBAC (no example given, links to Kubernetes RBAC docs)
- Block access to the cloud provider's metadata
- With a NetworkPolicy enforced by NetworkPolicy controller
- Typo: We recommend relying on this approach if you had a NetworkPolicy controller
- Default configuration uses singleuser.cloudMetadata.blockWithIptables
- Kubernetes Network Policies
- Note that any unsupported options will be silently ignored
- Enabled by default in JupyterHub helm charts in version 0.10+
- Network policies by default do not allow user pods to talk to
JupyterHub component pods
- Gives instructions on how to add additional access
- Default policy allows all egress traffic
- Gives information and example on how to override this with more restrictive controls
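To show what a more restrictive egress rule means in practice, the sketch below models the "allow a CIDR, except these ranges" shape used by Kubernetes NetworkPolicy `ipBlock` entries. It only mimics the rule semantics and is not how a NetworkPolicy controller enforces them; the `egress_allowed` helper is hypothetical, and 169.254.169.254 is assumed here as the link-local address that major cloud providers use for instance metadata.

```python
import ipaddress


def egress_allowed(dest: str, allow_cidr: str, except_cidrs: list) -> bool:
    """Allow traffic to dest if it falls in allow_cidr and in none of except_cidrs."""
    ip = ipaddress.ip_address(dest)
    if ip not in ipaddress.ip_network(allow_cidr):
        return False
    return not any(ip in ipaddress.ip_network(cidr) for cidr in except_cidrs)


ALLOW = "0.0.0.0/0"               # everything, mirroring the permissive default
EXCEPT = ["169.254.169.254/32"]   # assumed cloud metadata endpoint

print(egress_allowed("93.184.216.34", ALLOW, EXCEPT))     # True
print(egress_allowed("169.254.169.254", ALLOW, EXCEPT))   # False
```

Narrowing `ALLOW` to an internal range, for example `10.0.0.0/8`, would model a policy in which user pods can reach only cluster-internal services.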
- Restricting load balancer access
- By default, any IP is allowed to access the load balancer