Use HashSet for error deduplication (#6268)
## Description
Optimization Explanation
- Using a `HashSet` to track seen elements simplifies the code logic.
- Using the `retain` method filters out duplicate elements in place,
avoiding the complexity of manually managing indices and swapping
elements.
- The code is more concise and readable while preserving the original
order: the first occurrence of each element is kept.
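- Time complexity comparison: Original code: the index bookkeeping and
swaps are O(1) per element, so the expected cost is O(n). The O(n^2)
worst case arises when many distinct elements share the same 64-bit
hash, because each bucket of indices is scanned linearly. Optimized
code: using `HashSet` and the `retain` method, the expected time
complexity is O(n), plus one `clone` per element.
- Space complexity comparison: Original code: requires an additional
`HashMap` from hash values to `SmallVec`s of indices. Optimized code:
requires only a `HashSet` of seen elements, although it stores a clone
of every unique element, so its footprint depends on the size of `T`.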

## Checklist

- [ ] I have linked to any relevant issues.
- [ ] I have commented my code, particularly in hard-to-understand
areas.
- [ ] I have updated the documentation where relevant (API docs, the
reference, and the Sway book).
- [ ] If my change requires substantial documentation changes, I have
[requested support from the DevRel
team](https://github.com/FuelLabs/devrel-requests/issues/new/choose)
- [ ] I have added tests that prove my fix is effective or that my
feature works.
- [ ] I have added (or requested a maintainer to add) the necessary
`Breaking*` or `New Feature` labels where relevant.
- [x] I have done my best to ensure that my PR adheres to [the Fuel Labs
Code Review
Standards](https://github.com/FuelLabs/rfcs/blob/master/text/code-standards/external-contributors.md).
- [x] I have requested a review from the relevant team or maintainers.

---------

Co-authored-by: IGI-111 <[email protected]>
ylmin and IGI-111 authored Jul 16, 2024
1 parent fe89d16 commit 807d7f4
Showing 1 changed file with 5 additions and 33 deletions.
`sway-error/src/handler.rs`

```diff
@@ -1,5 +1,4 @@
 use crate::{error::CompileError, warning::CompileWarning};
-use std::collections::HashMap;
 
 use core::cell::RefCell;
 
@@ -138,37 +137,10 @@ pub struct ErrorEmitted {
 /// Stdlib dedup in Rust assumes sorted data for efficiency, but we don't want that.
 /// A hash set would also mess up the order, so this is just a brute force way of doing it
 /// with a vector.
-fn dedup_unsorted<T: PartialEq + std::hash::Hash>(mut data: Vec<T>) -> Vec<T> {
-    // TODO(Centril): Consider using `IndexSet` instead for readability.
-    use smallvec::SmallVec;
-    use std::collections::hash_map::{DefaultHasher, Entry};
-    use std::hash::Hasher;
-
-    let mut write_index = 0;
-    let mut indexes: HashMap<u64, SmallVec<[usize; 1]>> = HashMap::with_capacity(data.len());
-    for read_index in 0..data.len() {
-        let hash = {
-            let mut hasher = DefaultHasher::new();
-            data[read_index].hash(&mut hasher);
-            hasher.finish()
-        };
-        let index_vec = match indexes.entry(hash) {
-            Entry::Occupied(oe) => {
-                if oe
-                    .get()
-                    .iter()
-                    .any(|index| data[*index] == data[read_index])
-                {
-                    continue;
-                }
-                oe.into_mut()
-            }
-            Entry::Vacant(ve) => ve.insert(SmallVec::new()),
-        };
-        data.swap(write_index, read_index);
-        index_vec.push(write_index);
-        write_index += 1;
-    }
-    data.truncate(write_index);
+fn dedup_unsorted<T: PartialEq + std::hash::Hash + Clone + Eq>(mut data: Vec<T>) -> Vec<T> {
+    use std::collections::HashSet;
+
+    let mut seen = HashSet::new();
+    data.retain(|item| seen.insert(item.clone()));
     data
 }
```
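For comparison, here is a sketch of the removed index-bucket algorithm ported to std-only Rust. It assumes `Vec<usize>` can stand in for `SmallVec<[usize; 1]>` (the original used the `smallvec` crate); the name `dedup_unsorted_old` and the sample input are hypothetical. Like the `retain`-based replacement, it keeps the first occurrence of each element in its original order.

```rust
use std::collections::hash_map::{DefaultHasher, Entry};
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Port of the removed algorithm. Assumption: `Vec<usize>` replaces
/// `SmallVec<[usize; 1]>` so that it builds with std only.
fn dedup_unsorted_old<T: PartialEq + Hash>(mut data: Vec<T>) -> Vec<T> {
    let mut write_index = 0;
    // Maps a 64-bit hash to the positions of kept elements with that hash.
    let mut indexes: HashMap<u64, Vec<usize>> = HashMap::with_capacity(data.len());
    for read_index in 0..data.len() {
        let hash = {
            let mut hasher = DefaultHasher::new();
            data[read_index].hash(&mut hasher);
            hasher.finish()
        };
        let index_vec = match indexes.entry(hash) {
            Entry::Occupied(oe) => {
                // Same hash: compare with every kept element in this bucket.
                if oe.get().iter().any(|index| data[*index] == data[read_index]) {
                    continue; // duplicate, skip it
                }
                oe.into_mut()
            }
            Entry::Vacant(ve) => ve.insert(Vec::new()),
        };
        // Move the first occurrence into the next kept slot.
        data.swap(write_index, read_index);
        index_vec.push(write_index);
        write_index += 1;
    }
    // Everything past `write_index` is a duplicate.
    data.truncate(write_index);
    data
}

fn main() {
    // Hypothetical input with repeats.
    let deduped = dedup_unsorted_old(vec![3, 1, 3, 2, 1, 4]);
    assert_eq!(deduped, vec![3, 1, 2, 4]);
    println!("{:?}", deduped); // prints [3, 1, 2, 4]
}
```

Only elements whose hashes collide are compared with `==`, so the expected cost is linear. A long collision bucket is what makes the worst case quadratic.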
