feat(commands) Add index analyze command #136

Merged: 5 commits, Dec 13, 2023

@clemfromspace clemfromspace commented Nov 23, 2023

Index Analyze Command

This PR contains the CLI implementation of the index analyzer tool from https://github.com/algolia/tools.

This command displays record statistics for the specified index: how often each attribute appears and which value types it holds.
It helps you identify individual records (or attributes) within an index that do not conform to the rest of the dataset (e.g. numeric attributes that have null values).

$ algolia -p media indices analyze prod_MEDIA

KEY                                 COUNT  %        TYPES                                            USED IN SETTINGS
backdrop_path                       1000   100.00%  string: 100.00%                                  []
bayesian_avg                        1000   100.00%  numeric: 100.00%                                 []
cast                                1000   100.00%  array: 100.00%                                   []
cast_lead                           1000   100.00%  array: 100.00%                                   []
created_by                          1000   100.00%  array: 100.00%                                   []
directors                           1000   100.00%  array: 100.00%                                   []
first_air_date                      1000   100.00%  numeric: 98.90%, null: 1.10%                     []
genres                              1000   100.00%  array: 100.00%                                   [attributesForFaceting searchableAttributes]
in_production                       1000   100.00%  boolean: 100.00%                                 []
last_air_date                       1000   100.00%  numeric: 100.00%                                 [customRanking]
last_episode_to_air                 1000   100.00%  object: 100.00%                                  []
last_episode_to_air.air_date        1000   100.00%  numeric: 100.00%                                 []
last_episode_to_air.episode_number  1000   100.00%  numeric: 100.00%                                 []
last_episode_to_air.name            1000   100.00%  null: 63.00%, string: 37.00%                     []
last_episode_to_air.overview        1000   100.00%  string: 29.80%, null: 70.20%                     []
last_episode_to_air.season_number   1000   100.00%  numeric: 100.00%                                 []
last_episode_to_air.still_path      1000   100.00%  null: 69.20%, string: 30.80%                     []
last_episode_to_air.vote_average    1000   100.00%  numeric: 100.00%                                 []
networks                            1000   100.00%  array: 100.00%                                   []
next_episode_to_air                 1000   100.00%  object: 69.00%, null: 31.00%                     []
next_episode_to_air.air_date        690    69.00%   numeric: 69.00%, undefined: 31.00%               []
next_episode_to_air.episode_number  690    69.00%   numeric: 69.00%, undefined: 31.00%               []
next_episode_to_air.name            690    69.00%   string: 19.60%, undefined: 31.00%, null: 49.40%  []
next_episode_to_air.overview        690    69.00%   undefined: 31.00%, null: 56.40%, string: 12.60%  []
next_episode_to_air.season_number   690    69.00%   numeric: 69.00%, undefined: 31.00%               []
next_episode_to_air.still_path      690    69.00%   string: 9.20%, undefined: 31.00%, null: 59.80%   []
next_episode_to_air.vote_average    690    69.00%   numeric: 69.00%, undefined: 31.00%               []
number_of_episodes                  1000   100.00%  numeric: 100.00%                                 []
number_of_seasons                   1000   100.00%  numeric: 100.00%                                 []
objectID                            1000   100.00%  string: 100.00%                                  []
origin_country                      1000   100.00%  array: 100.00%                                   []
original_language                   1000   100.00%  string: 100.00%                                  []
original_title                      1000   100.00%  string: 100.00%                                  [searchableAttributes]
overview                            1000   100.00%  null: 33.00%, string: 67.00%                     []
popularity                          1000   100.00%  numeric: 100.00%                                 [customRanking]
popularity_bucketed                 1000   100.00%  numeric: 100.00%                                 []
poster_path                         1000   100.00%  string: 100.00%                                  []
record_type                         1000   100.00%  string: 100.00%                                  [attributesForFaceting]
seasons                             1000   100.00%  array: 100.00%                                   []
spoken_languages                    1000   100.00%  array: 100.00%                                   [attributesForFaceting]
status                              1000   100.00%  string: 100.00%                                  [attributesForFaceting]
tagline                             1000   100.00%  null: 87.40%, string: 12.60%                     []
title                               1000   100.00%  string: 100.00%                                  [searchableAttributes]
type                                1000   100.00%  string: 100.00%                                  [attributesForFaceting]
videos                              1000   100.00%  array: 100.00%                                   []
vote_average                        1000   100.00%  numeric: 100.00%                                 []
vote_count                          1000   100.00%  numeric: 100.00%                                 []
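
For readers wondering how a table like the one above could be derived, here is a minimal Python sketch. It is not the CLI's actual implementation: the function names (`type_name`, `walk`, `analyze`) are hypothetical, and the handling of `undefined` (a nested key counted as undefined for every record where it is absent) is an assumption inferred from the sample output. The USED IN SETTINGS column is left out.

```python
from collections import Counter, defaultdict


def type_name(value):
    """Map a JSON value to a type label like those in the TYPES column."""
    if value is None:
        return "null"
    # bool must be tested before int/float: bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "numeric"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return "unknown"


def walk(record, prefix=""):
    """Yield (dotted_key, value) pairs, descending into nested objects."""
    for key, value in record.items():
        path = f"{prefix}{key}"
        yield path, value
        if isinstance(value, dict):
            yield from walk(value, prefix=f"{path}.")


def analyze(records):
    """Return per-attribute count, presence percentage, and type percentages."""
    total = len(records)
    types = defaultdict(Counter)
    for record in records:
        for path, value in walk(record):
            types[path][type_name(value)] += 1

    stats = {}
    for path, counter in types.items():
        count = sum(counter.values())
        percentages = {t: 100 * n / total for t, n in counter.items()}
        # Assumption: records that lack the key are reported as "undefined".
        if count < total:
            percentages["undefined"] = 100 * (total - count) / total
        stats[path] = {
            "count": count,
            "percent": 100 * count / total,
            "types": percentages,
        }
    return stats
```

Calling `analyze` on a list of record dictionaries returns one entry per dotted key, which could then be sorted by key and printed as rows.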

$ algolia -p media indices analyze prod_MEDIA --only genres

VALUE               COUNT  %
Drama               284    28.40%
Comedy              197    19.70%
Reality             165    16.50%
Documentary         104    10.40%
Animation           88     8.80%
Family              75     7.50%
Crime               71     7.10%
Talk                66     6.60%
Action & Adventure  62     6.20%
Sci-Fi & Fantasy    51     5.10%
Mystery             44     4.40%
News                32     3.20%
Soap                31     3.10%
Kids                21     2.10%
War & Politics      10     1.00%
Western             1      0.10%
Music               1      0.10%
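
The per-value breakdown above can be approximated with the following Python sketch. Again, this is an illustration under assumptions rather than the CLI's code: `value_counts` is a hypothetical name, array attributes are assumed to contribute one count per element, and percentages are assumed to be relative to the total number of records (which is why they can add up to more than 100% for arrays such as `genres`).

```python
from collections import Counter


def value_counts(records, attribute):
    """Return (value, count, percent) tuples for one attribute, most frequent first."""
    total = len(records)
    counts = Counter()
    for record in records:
        value = record.get(attribute)
        # Treat a scalar as a one-element list so both shapes share one code path.
        values = value if isinstance(value, list) else [value]
        for item in values:
            # Only scalar values are counted; null, objects and nested arrays are skipped.
            if isinstance(item, (str, int, float, bool)):
                counts[item] += 1
    return [(v, n, 100 * n / total) for v, n in counts.most_common()]
```

For example, `value_counts(records, "genres")` yields rows that can be formatted into the VALUE, COUNT and % columns.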

@clemfromspace clemfromspace force-pushed the feat/index-analyze-command branch from 726ee83 to a4a77d6 Compare November 23, 2023 14:12
@clemfromspace clemfromspace marked this pull request as draft November 23, 2023 15:05
@clemfromspace clemfromspace force-pushed the feat/index-analyze-command branch 2 times, most recently from f7822e6 to 980165f Compare December 6, 2023 09:12
@clemfromspace clemfromspace force-pushed the feat/index-analyze-command branch from 980165f to b33e5f4 Compare December 6, 2023 09:23
@clemfromspace clemfromspace marked this pull request as ready for review December 6, 2023 09:25
@loicsay loicsay left a comment

Big PR, good job 👏 👏
inline comments helped a lot for the review

@clemfromspace clemfromspace merged commit e2265eb into main Dec 13, 2023
2 checks passed
@clemfromspace clemfromspace deleted the feat/index-analyze-command branch March 11, 2024 08:55
2 participants