Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce IterRef trait for deserialization #73

Merged
merged 3 commits into from
Oct 10, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions .github/workflows/test.yml
Original file line number Diff line number Diff line change
@@ -15,7 +15,7 @@ jobs:
- uses: actions/checkout@v2
- uses: actions-rs/toolchain@v1
with:
toolchain: stable
toolchain: beta # FIXME: revert to stable when 1.65 lands
components: clippy
- name: "clippy --all"
run: cargo clippy --all --tests -- -D warnings
@@ -27,7 +27,7 @@ jobs:
- uses: actions/checkout@v2
- uses: actions-rs/toolchain@v1
with:
toolchain: stable
toolchain: beta # FIXME: revert to stable when 1.65 lands
components: rustfmt
- name: Run
run: cargo fmt --all -- --check
@@ -39,7 +39,7 @@ jobs:
- uses: actions/checkout@v2
- uses: actions-rs/toolchain@v1
with:
toolchain: stable
toolchain: beta # FIXME: revert to stable when 1.65 lands
- uses: Swatinem/rust-cache@v1
- uses: taiki-e/install-action@nextest
- name: Run
@@ -53,7 +53,7 @@ jobs:
- uses: actions/checkout@v3
- uses: actions-rs/toolchain@v1
with:
toolchain: stable
toolchain: beta # FIXME: revert to stable when 1.65 lands
components: llvm-tools-preview
- name: Install cargo-llvm-cov
uses: taiki-e/install-action@cargo-llvm-cov
34 changes: 34 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
## Changelog

All notable changes to `arrow2-convert` project will be documented in this file.

## [Unreleased]
## [0.4.0]
### Changed
- MSRV is bumped to 1.65.0 due to dependency on GATs (#73, @aldanor).
- `ArrowDeserialize` can now be used as a standalone trait bound (#73, @aldanor).

## [0.3.1] - 2022-09-29
### Changed
- Update arrow2 version to 0.14 (#66, @ncpenke).

## [0.3.0] - 2022-08-25
### Added
- Add support for converting to `Chunk` (#44, @ncpenke).
- Add support for `i128` (#48, @ncpenke).
- Add support for enums (#37, @ncpenke).
- Add support for flattening chunks (#56, @nielsmeima).

### Changed
- Serialize escaped Rust identifiers unescaped (#59, @teymour-aldridge).
- Update arrow2 version to 0.13 (#61, @teymour-aldridge).

## [0.2.0] - 2022-06-13
### Added
- Add support for `FixedSizeBinary` and `FixedSizeList` (#30, @ncpenke).

### Changed
- Update arrow2 version to 0.12 (#38, @joshuataylor).

## [0.1.0] - 2022-03-03
Initial crate release.
74 changes: 38 additions & 36 deletions arrow2_convert/src/deserialize.rs
Original file line number Diff line number Diff line change
@@ -5,18 +5,38 @@ use chrono::{NaiveDate, NaiveDateTime};

use crate::field::*;

/// Implemented by [`ArrowField`] that can be deserialized from arrow
pub trait ArrowDeserialize: ArrowField + Sized
#[doc(hidden)]
/// Type whose reference can be used to create an iterator.
pub trait IterRef {
/// Iterator type.
type Iter<'a>: Iterator
where
Self: 'a;

/// Converts `&self` into an iterator.
fn iter_ref(&self) -> Self::Iter<'_>;
}

impl<T> IterRef for T
where
Self::ArrayType: ArrowArray,
for<'a> &'a Self::ArrayType: IntoIterator,
for<'a> &'a T: IntoIterator,
{
type Iter<'a> = <&'a T as IntoIterator>::IntoIter where Self: 'a;

#[inline]
fn iter_ref(&self) -> Self::Iter<'_> {
self.into_iter()
}
}

/// Implemented by [`ArrowField`] that can be deserialized from arrow
pub trait ArrowDeserialize: ArrowField + Sized {
/// The `arrow2::Array` type corresponding to this field
type ArrayType;
type ArrayType: ArrowArray;

/// Deserialize this field from arrow
fn arrow_deserialize(
v: <&Self::ArrayType as IntoIterator>::Item,
v: <<Self::ArrayType as IterRef>::Iter<'_> as Iterator>::Item,
) -> Option<<Self as ArrowField>::Type>;

#[inline]
@@ -28,26 +48,23 @@ where
/// something like for<'a> &'a T::ArrayType: IntoIterator<Item=Option<E>>,
/// However, the E parameter seems to confuse the borrow checker if it's a reference.
fn arrow_deserialize_internal(
v: <&Self::ArrayType as IntoIterator>::Item,
v: <<Self::ArrayType as IterRef>::Iter<'_> as Iterator>::Item,
) -> <Self as ArrowField>::Type {
Self::arrow_deserialize(v).unwrap()
}
}

/// Internal trait used to support deserialization and iteration of structs, and nested struct lists
///
/// Trivial pass-thru implementations are provided for arrow2 arrays that implement IntoIterator.
/// Trivial pass-thru implementations are provided for arrow2 arrays that auto-implement IterRef.
///
/// The derive macro generates implementations for typed struct arrays.
#[doc(hidden)]
pub trait ArrowArray
where
for<'a> &'a Self: IntoIterator,
{
pub trait ArrowArray: IterRef {
type BaseArrayType: Array;

// Returns a typed iterator to the underlying elements of the array from an untyped Array reference.
fn iter_from_array_ref(b: &dyn Array) -> <&Self as IntoIterator>::IntoIter;
fn iter_from_array_ref(b: &dyn Array) -> <Self as IterRef>::Iter<'_>;
}

// Macro to facilitate implementation for numeric types and numeric arrays.
@@ -71,11 +88,11 @@ macro_rules! impl_arrow_array {
impl ArrowArray for $array {
type BaseArrayType = Self;

fn iter_from_array_ref(b: &dyn Array) -> <&Self as IntoIterator>::IntoIter {
fn iter_from_array_ref(b: &dyn Array) -> <Self as IterRef>::Iter<'_> {
b.as_any()
.downcast_ref::<Self::BaseArrayType>()
.unwrap()
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ncpenke I was wondering about this unwrap - can it panic?... This doesn't look too nice

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aldanor In theory it can, though we do a top-level schema check. But if the underlying arrow structure is different than the schema it advertises, then we can panic. I'll fix it in my upcoming change. I'll add you as a reviewer so that you can include any other feedback.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aldanor Just a heads up another thing we need to do is introduce our own error type (#54). We're currently piggybacking-off arrow2's type.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I can take on that (the error type).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'm almost done with my changes. I just changed the unwrap to an Option, and will leave the error type change to you.

.into_iter()
.iter_ref()
}
}
};
@@ -85,21 +102,19 @@ macro_rules! impl_arrow_array {
impl<T> ArrowDeserialize for Option<T>
where
T: ArrowDeserialize,
T::ArrayType: 'static + ArrowArray,
for<'a> &'a T::ArrayType: IntoIterator,
{
type ArrayType = <T as ArrowDeserialize>::ArrayType;

#[inline]
fn arrow_deserialize(
v: <&Self::ArrayType as IntoIterator>::Item,
v: <<Self::ArrayType as IterRef>::Iter<'_> as Iterator>::Item,
) -> Option<<Self as ArrowField>::Type> {
Self::arrow_deserialize_internal(v).map(Some)
}

#[inline]
fn arrow_deserialize_internal(
v: <&Self::ArrayType as IntoIterator>::Item,
v: <<Self::ArrayType as IterRef>::Iter<'_> as Iterator>::Item,
) -> <Self as ArrowField>::Type {
<T as ArrowDeserialize>::arrow_deserialize(v)
}
@@ -204,7 +219,6 @@ fn arrow_deserialize_vec_helper<T>(
) -> Option<<Vec<T> as ArrowField>::Type>
where
T: ArrowDeserialize + ArrowEnableVecForType + 'static,
for<'a> &'a T::ArrayType: IntoIterator,
{
use std::ops::Deref;
v.map(|t| {
@@ -217,8 +231,6 @@ where
impl<T> ArrowDeserialize for Vec<T>
where
T: ArrowDeserialize + ArrowEnableVecForType + 'static,
<T as ArrowDeserialize>::ArrayType: 'static,
for<'b> &'b <T as ArrowDeserialize>::ArrayType: IntoIterator,
{
type ArrayType = ListArray<i32>;

@@ -230,8 +242,6 @@ where
impl<T> ArrowDeserialize for LargeVec<T>
where
T: ArrowDeserialize + ArrowEnableVecForType + 'static,
<T as ArrowDeserialize>::ArrayType: 'static,
for<'b> &'b <T as ArrowDeserialize>::ArrayType: IntoIterator,
{
type ArrayType = ListArray<i64>;

@@ -243,8 +253,6 @@ where
impl<T, const SIZE: usize> ArrowDeserialize for FixedSizeVec<T, SIZE>
where
T: ArrowDeserialize + ArrowEnableVecForType + 'static,
<T as ArrowDeserialize>::ArrayType: 'static,
for<'b> &'b <T as ArrowDeserialize>::ArrayType: IntoIterator,
{
type ArrayType = FixedSizeListArray;

@@ -276,30 +284,27 @@ where
/// useful when the same rust type maps to one or more Arrow types for example `LargeString`.
fn try_into_collection_as_type<ArrowType>(self) -> arrow2::error::Result<Collection>
where
ArrowType: ArrowDeserialize + ArrowField<Type = Element> + 'static,
for<'b> &'b <ArrowType as ArrowDeserialize>::ArrayType: IntoIterator;
ArrowType: ArrowDeserialize + ArrowField<Type = Element> + 'static;
}

/// Helper to return an iterator for elements from a [`arrow2::array::Array`].
fn arrow_array_deserialize_iterator_internal<'a, Element, Field>(
b: &'a dyn arrow2::array::Array,
b: &'a dyn Array,
) -> impl Iterator<Item = Element> + 'a
where
Field: ArrowDeserialize + ArrowField<Type = Element> + 'static,
for<'b> &'b <Field as ArrowDeserialize>::ArrayType: IntoIterator,
{
<<Field as ArrowDeserialize>::ArrayType as ArrowArray>::iter_from_array_ref(b)
.map(<Field as ArrowDeserialize>::arrow_deserialize_internal)
}

/// Returns a typed iterator to a target type from an `arrow2::Array`
pub fn arrow_array_deserialize_iterator_as_type<'a, Element, ArrowType>(
arr: &'a dyn arrow2::array::Array,
arr: &'a dyn Array,
) -> arrow2::error::Result<impl Iterator<Item = Element> + 'a>
where
Element: 'static,
ArrowType: ArrowDeserialize + ArrowField<Type = Element> + 'static,
for<'b> &'b <ArrowType as ArrowDeserialize>::ArrayType: IntoIterator,
{
if &<ArrowType as ArrowField>::data_type() != arr.data_type() {
Err(arrow2::error::Error::InvalidArgumentError(
@@ -315,19 +320,17 @@ where

/// Return an iterator that deserializes an [`Array`] to an element of type T
pub fn arrow_array_deserialize_iterator<'a, T>(
arr: &'a dyn arrow2::array::Array,
arr: &'a dyn Array,
) -> arrow2::error::Result<impl Iterator<Item = T> + 'a>
where
T: ArrowDeserialize + ArrowField<Type = T> + 'static,
for<'b> &'b <T as ArrowDeserialize>::ArrayType: IntoIterator,
{
arrow_array_deserialize_iterator_as_type::<T, T>(arr)
}

impl<Collection, Element, ArrowArray> TryIntoCollection<Collection, Element> for ArrowArray
where
Element: ArrowDeserialize + ArrowField<Type = Element> + 'static,
for<'b> &'b <Element as ArrowDeserialize>::ArrayType: IntoIterator,
ArrowArray: std::borrow::Borrow<dyn Array>,
Collection: FromIterator<Element>,
{
@@ -338,7 +341,6 @@ where
fn try_into_collection_as_type<ArrowType>(self) -> arrow2::error::Result<Collection>
where
ArrowType: ArrowDeserialize + ArrowField<Type = Element> + 'static,
for<'b> &'b <ArrowType as ArrowDeserialize>::ArrayType: IntoIterator,
{
Ok(
arrow_array_deserialize_iterator_as_type::<Element, ArrowType>(self.borrow())?