feat(relay): add view hierarchy scrubbing #4452

Open: wants to merge 20 commits into master. Showing changes from 15 commits.

4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,10 @@

## 25.1.0

**Features**:

- Add support for view hierarchy attachment scrubbing. ([#4452](https://github.com/getsentry/relay/pull/4452))

**Internal**:

- Updates performance score calculation on spans and events to also store cdf values as measurements. ([#4438](https://github.com/getsentry/relay/pull/4438))
2 changes: 2 additions & 0 deletions Cargo.lock

3 changes: 3 additions & 0 deletions relay-dynamic-config/src/feature.rs
@@ -115,6 +115,9 @@ pub enum Feature {
#[doc(hidden)]
#[serde(rename = "organizations:performance-queries-mongodb-extraction")]
ScrubMongoDbDescriptions,
#[doc(hidden)]
#[serde(rename = "organizations:view-hierarchy-scrubbing")]
ViewHierarchyScrubbing,
/// Forward compatibility.
#[doc(hidden)]
#[serde(other)]
2 changes: 2 additions & 0 deletions relay-pii/Cargo.toml
@@ -25,6 +25,8 @@ relay-event-schema = { workspace = true }
relay-log = { workspace = true }
relay-protocol = { workspace = true }
serde = { workspace = true }
serde_json = { workspace = true }
serde-transcode = { workspace = true }
sha1 = { workspace = true }
smallvec = { workspace = true }
thiserror = { workspace = true }
29 changes: 25 additions & 4 deletions relay-pii/src/attachments.rs
@@ -1,15 +1,15 @@
-use std::borrow::Cow;
-use std::iter::FusedIterator;
-
use regex::bytes::RegexBuilder as BytesRegexBuilder;
use regex::{Match, Regex};
use relay_event_schema::processor::{FieldAttrs, Pii, ProcessingState, ValueType};
+use serde_json::Deserializer;
use smallvec::SmallVec;
+use std::borrow::Cow;
+use std::iter::FusedIterator;
use utf16string::{LittleEndian, WStr};

use crate::compiledconfig::RuleRef;
use crate::regexes::{get_regex_for_rule_type, ReplaceBehavior};
-use crate::{utils, CompiledPiiConfig, Redaction};
+use crate::{utils, CompiledPiiConfig, JsonScrubVisitor, Redaction, ScrubViewHierarchyError};

/// The minimum length a string needs to be in a binary blob.
///
@@ -513,6 +513,27 @@ impl<'a> PiiAttachmentsProcessor<'a> {
false
}
}

/// Applies PII rules to the given JSON.
///
/// This function performs PII scrubbing using `serde_transcode`, which means that it
/// does not have to load the entire document into memory; instead, it processes the
/// document item by item using a streaming approach.
///
/// Returns a scrubbed copy of the JSON document.
pub fn scrub_json(&self, payload: &[u8]) -> Result<Vec<u8>, ScrubViewHierarchyError> {
let output = Vec::new();

let visitor = JsonScrubVisitor::new(self.compiled_config);

let mut deserializer_inner = Deserializer::from_slice(payload);
let deserializer = crate::transform::Deserializer::new(&mut deserializer_inner, visitor);
Review comment on lines +529 to +530 (Member):

nit, to make it super explicit which deserializer is which:

Suggested change:
-let mut deserializer_inner = Deserializer::from_slice(payload);
-let deserializer = crate::transform::Deserializer::new(&mut deserializer_inner, visitor);
+let mut deserializer_inner = serde_json::Deserializer::from_slice(payload);
+let deserializer = transform::Deserializer::new(&mut deserializer_inner, visitor);


let mut serializer = serde_json::Serializer::new(output);
serde_transcode::transcode(deserializer, &mut serializer)
.map_err(|_| ScrubViewHierarchyError::TranscodeFailed)?;
Ok(serializer.into_inner())
}
}

#[cfg(test)]
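
To illustrate the streaming approach that `scrub_json` relies on, here is a std-only model. It is a sketch under stated assumptions, not relay's implementation: `Event`, `PathVisitor`, `ExactPathScrubber`, and `transcode` are hypothetical stand-ins for the serde event stream, the `Transform` trait, `JsonScrubVisitor`, and `serde_transcode::transcode`, and arrays and non-string scalars are left out for brevity. Each event is handled as soon as it is read, and the only state carried along is the current key path.

```rust
// Hypothetical, std-only model of streaming scrubbing. None of these names
// are relay or serde APIs; they only mirror the shape of the real pieces.

/// One token of a JSON-like document, as a streaming parser would emit it.
/// Arrays and non-string scalars are omitted to keep the sketch short.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    StartMap,
    Key(String),
    Str(String),
    EndMap,
}

/// Mirrors the shape of the `push_path` / `pop_path` / `transform_string` hooks.
trait PathVisitor {
    fn push_path(&mut self, key: &str);
    fn pop_path(&mut self);
    fn transform_string(&mut self, value: String) -> String;
}

/// Redacts any string whose full key path equals `target`.
struct ExactPathScrubber {
    target: Vec<String>,
    path: Vec<String>,
}

impl PathVisitor for ExactPathScrubber {
    fn push_path(&mut self, key: &str) {
        self.path.push(key.to_owned());
    }

    fn pop_path(&mut self) {
        self.path.pop();
    }

    fn transform_string(&mut self, value: String) -> String {
        if self.path == self.target {
            "[redacted]".to_owned()
        } else {
            value
        }
    }
}

/// Handles each event as soon as it is read; the only state kept is the current
/// path. Output is collected into a Vec here purely so it can be inspected; a
/// real transcoder would write each event straight to a serializer.
fn transcode<I, V>(input: I, visitor: &mut V) -> Vec<Event>
where
    I: IntoIterator<Item = Event>,
    V: PathVisitor,
{
    let mut out = Vec::new();
    // True between a `Key` and the start of its value.
    let mut awaiting_value = false;
    // For each open map: was it the value of a key (and so needs a pop at its end)?
    let mut map_is_value: Vec<bool> = Vec::new();

    for event in input {
        match event {
            Event::Key(key) => {
                visitor.push_path(&key);
                awaiting_value = true;
                out.push(Event::Key(key));
            }
            Event::Str(value) => {
                out.push(Event::Str(visitor.transform_string(value)));
                if awaiting_value {
                    visitor.pop_path();
                    awaiting_value = false;
                }
            }
            Event::StartMap => {
                map_is_value.push(awaiting_value);
                awaiting_value = false;
                out.push(Event::StartMap);
            }
            Event::EndMap => {
                if map_is_value.pop() == Some(true) {
                    visitor.pop_path();
                }
                out.push(Event::EndMap);
            }
        }
    }
    out
}
```

In this model, memory use grows with nesting depth rather than document size, which is the property the doc comment on `scrub_json` points to.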
180 changes: 180 additions & 0 deletions relay-pii/src/json.rs
@@ -0,0 +1,180 @@
use crate::transform::Transform;
use crate::{CompiledPiiConfig, PiiProcessor};
use relay_event_schema::processor::{FieldAttrs, Pii, ProcessingState, Processor, ValueType};
use relay_protocol::Meta;
use std::borrow::Cow;

const FIELD_ATTRS_PII_TRUE: FieldAttrs = FieldAttrs::new().pii(Pii::True);

/// Describes the error cases that can happen during ViewHierarchy scrubbing.
#[derive(Debug, thiserror::Error)]
pub enum ScrubViewHierarchyError {
/// Transcoding the view hierarchy JSON failed. This most likely happens when the
/// document is invalid.
#[error("transcoding view hierarchy json failed")]
TranscodeFailed,
}

/// Visitor for scrubbing JSON documents. It walks the document structure and scrubs
/// PII according to the config held by the processor.
pub struct JsonScrubVisitor<'a> {
processor: PiiProcessor<'a>,
/// The state encoding the current path, which is fed by `push_path` and `pop_path`.
state: ProcessingState<'a>,
/// The current path. This is redundant with `state`, which also contains the full path,
/// but easier to match on.
path: Vec<String>,
}

impl<'a> JsonScrubVisitor<'a> {
/// Creates a new [`JsonScrubVisitor`] using the supplied config.
pub fn new(config: &'a CompiledPiiConfig) -> Self {
let processor = PiiProcessor::new(config);
Self {
processor,
state: ProcessingState::new_root(None, None),
path: Vec::new(),
}
}
}

impl<'de> Transform<'de> for JsonScrubVisitor<'de> {
fn push_path(&mut self, key: &'de str) {
self.path.push(key.to_owned());

self.state = std::mem::take(&mut self.state).enter_owned(
key.to_owned(),
Some(Cow::Borrowed(&FIELD_ATTRS_PII_TRUE)),
Some(ValueType::String), // Pretend everything is a string.
);
}

fn pop_path(&mut self) {
if let Ok(Some(parent)) = std::mem::take(&mut self.state).try_into_parent() {
self.state = parent;
}
let popped = self.path.pop();
debug_assert!(popped.is_some()); // pop_path should never be called on an empty state.
}

fn transform_str<'a>(&mut self, v: &'a str) -> Cow<'a, str> {
self.transform_string(v.to_owned())
}

fn transform_string(&mut self, mut v: String) -> Cow<'static, str> {
let mut meta = Meta::default();
if self
.processor
.process_string(&mut v, &mut meta, &self.state)
.is_err()
{
return Cow::Borrowed("");
}
Cow::Owned(v)
}
}

#[cfg(test)]
mod test {
use crate::{PiiAttachmentsProcessor, PiiConfig};
use serde_json::Value;

#[test]
pub fn test_view_hierarchy() {
let payload = r#"
{
"rendering_system": "UIKIT",
"identifier": "192.45.128.54",
"windows": [
{
"type": "UIWindow",
"identifier": "123.123.123.123",
"width": 414,
"height": 896,
"x": 0,
"y": 0,
"alpha": 1,
"visible": true,
"children": []
}
]
}
"#
.as_bytes();
let config = serde_json::from_str::<PiiConfig>(
r#"
{
"applications": {
"$string": ["@ip"]
}
}
"#,
)
.unwrap();
let processor = PiiAttachmentsProcessor::new(config.compiled());
let result = processor.scrub_json(payload).unwrap();
let parsed: Value = serde_json::from_slice(&result).unwrap();
assert_eq!("[ip]", parsed["identifier"].as_str().unwrap());
}

#[test]
pub fn test_view_hierarchy_nested_path_rule() {
let payload = r#"
{
"nested": {
"stuff": {
"ident": "10.0.0.1"
}
}
}
"#
.as_bytes();
let config = serde_json::from_str::<PiiConfig>(
r#"
{
"applications": {
"nested.stuff.ident": ["@ip"]
}
}
"#,
)
.unwrap();

let processor = PiiAttachmentsProcessor::new(config.compiled());
let result = processor.scrub_json(payload).unwrap();
let parsed: Value = serde_json::from_slice(&result).unwrap();
assert_eq!("[ip]", parsed["nested"]["stuff"]["ident"].as_str().unwrap());
}

#[test]
pub fn test_view_hierarchy_not_existing_path() {
let payload = r#"
{
"nested": {
"stuff": {
"ident": "10.0.0.1"
}
}
}
"#
.as_bytes();
let config = serde_json::from_str::<PiiConfig>(
r#"
{
"applications": {
"non.existent.path": ["@ip"]
}
}
"#,
)
.unwrap();

let processor = PiiAttachmentsProcessor::new(config.compiled());
let result = processor.scrub_json(payload).unwrap();
let parsed: Value = serde_json::from_slice(&result).unwrap();
assert_eq!(
"10.0.0.1",
parsed["nested"]["stuff"]["ident"].as_str().unwrap()
);
}
}
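
The three tests above exercise selector semantics: `$string` applies `@ip` to every string, a dotted selector such as `nested.stuff.ident` applies only at that exact path, and a selector for a path that does not exist leaves the document unchanged. The std-only sketch below models that behavior. `selector_matches`, `apply_ip_rule`, and `scrub` are hypothetical helpers, not relay functions; relay's real selector language and its regex-based `@ip` rule are considerably richer, and whole-string IPv4 parsing via `std::net::Ipv4Addr` is a deliberate simplification.

```rust
use std::net::Ipv4Addr;

/// Hypothetical: does `selector` apply to a string value at `path`?
/// `$string` matches every string; any other selector is treated as an exact
/// dotted path (a simplification of relay's selector language).
fn selector_matches(selector: &str, path: &[&str]) -> bool {
    if selector == "$string" {
        return true;
    }
    // Compare segment by segment, so "nested.stuff.ident" matches only that path.
    selector.split('.').eq(path.iter().copied())
}

/// Hypothetical stand-in for the `@ip` rule: replaces the value with "[ip]"
/// if the whole string parses as an IPv4 address.
fn apply_ip_rule(value: &str) -> String {
    if value.parse::<Ipv4Addr>().is_ok() {
        "[ip]".to_owned()
    } else {
        value.to_owned()
    }
}

/// Applies the simplified `@ip` rule to `value` only when `selector` matches its path.
fn scrub(selector: &str, path: &[&str], value: &str) -> String {
    if selector_matches(selector, path) {
        apply_ip_rule(value)
    } else {
        value.to_owned()
    }
}
```

Each call mirrors one of the tests: `$string` with the top-level `identifier`, the nested path rule, and the non-existent path that leaves `10.0.0.1` untouched.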
4 changes: 4 additions & 0 deletions relay-pii/src/lib.rs
@@ -12,6 +12,7 @@ mod compiledconfig;
mod config;
mod convert;
mod generate_selectors;
mod json;
mod legacy;
mod minidumps;
mod processor;
@@ -20,10 +21,13 @@ mod regexes;
mod selector;
mod utils;

pub mod transform;

pub use self::attachments::*;
pub use self::compiledconfig::*;
pub use self::config::*;
pub use self::generate_selectors::selector_suggestions_from_value;
pub use self::json::*;
pub use self::legacy::*;
pub use self::minidumps::*;
pub use self::processor::*;
24 changes: 23 additions & 1 deletion relay-replays/src/transform.rs → relay-pii/src/transform.rs
@@ -49,6 +49,7 @@ use serde::de;
/// }
/// }
/// ```
#[allow(missing_docs)]
Review comment (Member):

Should we add some docs instead?

pub trait Transform<'de> {
fn push_path(&mut self, _key: &'de str) {}

@@ -839,7 +840,10 @@ where
where
S: de::DeserializeSeed<'de>,
{
-self.0.next_element_seed(DeserializeValueSeed(seed, self.1))
+// We want to use a special ValueSeed for sequences that does not call pop_path
+// because we don't push paths when entering sequences right now.
+self.0
+.next_element_seed(DeserializeSeqValueSeed(seed, self.1))
}
}

@@ -889,6 +893,24 @@ where
}
}

struct DeserializeSeqValueSeed<'a, D, T>(D, &'a mut T);

impl<'de, D, T> de::DeserializeSeed<'de> for DeserializeSeqValueSeed<'_, D, T>
where
D: de::DeserializeSeed<'de>,
T: Transform<'de>,
{
type Value = D::Value;

fn deserialize<X>(self, deserializer: X) -> Result<Self::Value, X::Error>
where
X: serde::Deserializer<'de>,
{
self.0
.deserialize(Deserializer::borrowed(deserializer, self.1))
}
}

struct DeserializeKeySeed<'a, D, T>(D, &'a mut T);

impl<'de, D, T> de::DeserializeSeed<'de> for DeserializeKeySeed<'_, D, T>
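
The comment in `next_element_seed` explains why sequence elements get their own seed: map keys call `push_path` and their values call `pop_path`, but sequence elements push nothing, so popping after each element would unbalance the path stack. The std-only sketch below models this with a hypothetical `Value` tree and `walk` function (not relay code); the `pop_after_seq_element` flag simulates reusing the popping value seed for sequence elements.

```rust
/// Minimal JSON-like tree (hypothetical, for illustration only).
enum Value {
    Str(String),
    Seq(Vec<Value>),
    Map(Vec<(String, Value)>),
}

/// Walks `value` and records the dotted path seen at every string.
/// `pop_after_seq_element` models popping after each sequence element even
/// though nothing was pushed for it.
fn walk(value: &Value, path: &mut Vec<String>, pop_after_seq_element: bool, seen: &mut Vec<String>) {
    match value {
        Value::Str(_) => seen.push(path.join(".")),
        Value::Map(entries) => {
            for (key, child) in entries {
                path.push(key.clone()); // push_path on every map key
                walk(child, path, pop_after_seq_element, seen);
                path.pop(); // pop_path once the key's value is done
            }
        }
        Value::Seq(items) => {
            for item in items {
                // No push here: sequences do not add a path segment.
                walk(item, path, pop_after_seq_element, seen);
                if pop_after_seq_element {
                    path.pop(); // unbalanced: removes the parent key instead
                }
            }
        }
    }
}
```

With a document shaped like the view hierarchy test (`{"windows": [{"identifier": …}, {"identifier": …}]}`), the balanced walk reports `windows.identifier` for both elements, while the unbalanced walk reports the second one as just `identifier`, so a path-based selector would silently miss it. In this model the stack can also be popped while empty, which is the situation the `debug_assert!` in `JsonScrubVisitor::pop_path` guards against.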
1 change: 0 additions & 1 deletion relay-replays/src/lib.rs
@@ -14,4 +14,3 @@
#![warn(missing_docs)]

pub mod recording;
-mod transform;
4 changes: 2 additions & 2 deletions relay-replays/src/recording.rs
@@ -27,7 +27,7 @@ use relay_protocol::Meta;
use serde::{de, ser, Deserializer};
use serde_json::value::RawValue;

-use crate::transform::Transform;
+use relay_pii::transform::Transform;

/// Paths to fields on which datascrubbing rules should be applied.
///
@@ -180,7 +180,7 @@ impl serde::Serialize for ScrubbedValue<'_, '_> {
{
let mut transform = self.1.borrow_mut();
let mut deserializer = serde_json::Deserializer::from_str(self.0.get());
-let scrubber = crate::transform::Deserializer::new(&mut deserializer, &mut *transform);
+let scrubber = relay_pii::transform::Deserializer::new(&mut deserializer, &mut *transform);
serde_transcode::transcode(scrubber, serializer)
}
}
1 change: 1 addition & 0 deletions relay-server/Cargo.toml
@@ -97,6 +97,7 @@ rmp-serde = { workspace = true }
serde = { workspace = true }
serde_bytes = { workspace = true }
serde_json = { workspace = true }
serde-transcode = { workspace = true }
smallvec = { workspace = true, features = ["drain_filter"] }
socket2 = { workspace = true }
sqlx = { workspace = true, features = [