performance of country_name_for_number #90

Open
kputland opened this issue Jan 23, 2017 · 3 comments

@kputland

Why is country_name_for_number so slow compared to region_code_for_number?
I'd expect a lookup from region_code to country_name to be quite quick. It appears to check the validity of the number, but I haven't figured out where most of the time is spent yet.
Is there any way to speed things up?

--Karl

from timeit import timeit

import phonenumbers
from phonenumbers import format_number, is_possible_number, is_possible_number_with_reason, is_valid_number, number_type, parse
from phonenumbers.geocoder import region_code_for_number, country_name_for_number, description_for_number

# setup p for test
p = parse('13035551212', 'US')
print("parse_number:", timeit(lambda: parse('13035551212','US'), number=10000))
print("format_number:", timeit(lambda: format_number(p, phonenumbers.PhoneNumberFormat.E164), number=10000))
print("is_possible_number:", timeit(lambda: is_possible_number(p), number=10000))
print("is_possible_number_with_reason:", timeit(lambda: is_possible_number_with_reason(p), number=10000))
print("is_valid_number:", timeit(lambda: is_valid_number(p), number=10000))
print("number_type:", timeit(lambda: number_type(p), number=10000))
print("region_code_for_number:", timeit(lambda: region_code_for_number(p), number=10000))
print("country_name_for_number:", timeit(lambda: country_name_for_number(p, 'en'), number=10000))
print("description_for_number:", timeit(lambda: description_for_number(p, 'en'), number=10000))

Results (total seconds for 10,000 calls each):

('parse_number:', 1.0535039901733398)
('format_number:', 0.021859169006347656)
('is_possible_number:', 0.036605119705200195)
('is_possible_number_with_reason:', 0.0437769889831543)
('is_valid_number:', 1.0846400260925293)
('number_type:', 1.1673779487609863)
('region_code_for_number:', 0.5287020206451416)
('country_name_for_number:', 4.455769062042236)
('description_for_number:', 1.7113380432128906)
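
For anyone trying to pin down where the time goes, a profiler is probably a quicker route than timing each public function. Below is a minimal sketch using only the standard library (Python 3). `profile_calls` is a hypothetical helper name, not part of phonenumbers, and the workload shown is a stand-in so the snippet runs without the library installed.

```python
import cProfile
import io
import pstats


def profile_calls(func, *args, repeat=1000, top=10, **kwargs):
    """Run func(*args, **kwargs) `repeat` times under cProfile and return a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(repeat):
        func(*args, **kwargs)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    # Stand-in workload; replace with the call you actually want to inspect.
    print(profile_calls(sorted, list(range(1000, 0, -1)), repeat=200))
```

With phonenumbers installed, something like `print(profile_calls(country_name_for_number, p, 'en'))` should list the internal functions that dominate the cumulative time.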

@kevin-brown

This probably has to do with the fact that country_name_for_number will check all countries to ensure that only a single country is found for a number. Most of the other functions appear to exit as soon as they find one, even if there might be duplicates.

The change that enforced no duplicates was added in 0a9a735, which points to r701 in the upstream libphonenumber. You can find that revision at google/libphonenumber@af00741, and based on the test case this was done because of NANP toll-free numbers.
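
To illustrate the difference described above, here is a toy model of the two strategies. It is not the library's code: `first_match`, `unique_match` and `count_checks` are hypothetical names, and the region list is an illustrative subset. It only shows why insisting on a unique match forces a validity check against every candidate region, while a first-match lookup can stop early.

```python
def first_match(regions, is_valid_for):
    """Return the first region that accepts the number, stopping immediately."""
    for region in regions:
        if is_valid_for(region):
            return region
    return None


def unique_match(regions, is_valid_for):
    """Return the only region that accepts the number, or None if zero or several do."""
    found = None
    for region in regions:
        if is_valid_for(region):
            if found is not None:
                return None  # ambiguous: a second region also accepts the number
            found = region
    return found


def count_checks(strategy, regions, accepting):
    """Run a strategy and count how many validity checks it performed."""
    calls = []

    def is_valid_for(region):
        calls.append(region)
        return region in accepting

    result = strategy(regions, is_valid_for)
    return result, len(calls)


regions = ["US", "CA", "BS", "BB", "JM"]  # illustrative subset, not the real region list
print(count_checks(first_match, regions, {"US"}))   # ('US', 1)
print(count_checks(unique_match, regions, {"US"}))  # ('US', 5)
```

If each validity check is expensive, the cost of the unique-match strategy grows with the number of regions sharing the country code, which would be consistent with the timings reported for country code 1.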

@kputland
Author

I understand that logic.
It would be nice if it could be optimized at least for country_code == 1, since the NPA (area code) should be enough to cache a result and there would be fewer than 1,000 values.

I guess I could decorate it for my own needs....

@kputland
Author

Example for anyone else that may want a similar solution.

from functools import wraps

import phonenumbers
from phonenumbers import format_number


def phonenumber_memoizer(e164_prefix_len=5):
    """Memoize a lookup for NANP numbers, keyed on a prefix of the E.164 form.

    e164_prefix_len should be long enough to uniquely identify region information:

    e164_prefix_len == 5  # +1NPA should be enough to identify a country
    e164_prefix_len == 8  # +1NPANXX should be enough to identify a description/city
    """
    def decorator(f):
        memo = {}

        @wraps(f)
        def wrapper(numobj, *args, **kwargs):
            if numobj.country_code != 1:
                # Only NANP numbers (country code 1) are cached; others pass straight through.
                return f(numobj, *args, **kwargs)
            e164_num = format_number(numobj, phonenumbers.PhoneNumberFormat.E164)
            # Include the extra arguments (e.g. the language) in the key so that
            # calls with different arguments do not share a cached result.
            key = (e164_num[:e164_prefix_len], args, tuple(sorted(kwargs.items())))
            if key not in memo:
                memo[key] = f(numobj, *args, **kwargs)
            return memo[key]
        return wrapper
    return decorator

memo_number_type = phonenumber_memoizer(e164_prefix_len=8)(number_type)
memo_region_code_for_number = phonenumber_memoizer(e164_prefix_len=5)(region_code_for_number)
memo_country_name_for_number = phonenumber_memoizer(e164_prefix_len=5)(country_name_for_number)
memo_description_for_number = phonenumber_memoizer(e164_prefix_len=8)(description_for_number)

# setup p for test
p = parse('+13035551212', 'US')

print("parse_number:", timeit(lambda: parse('+13035551212','US'), number=10000))
print("format_number:", timeit(lambda: format_number(p, phonenumbers.PhoneNumberFormat.E164), number=10000))
print("is_possible_number:", timeit(lambda: is_possible_number(p), number=10000))
print("is_possible_number_with_reason:", timeit(lambda: is_possible_number_with_reason(p), number=10000))
print("is_valid_number:", timeit(lambda: is_valid_number(p), number=10000))
print
print("number_type:", timeit(lambda: number_type(p), number=10000))
print("region_code_for_number:", timeit(lambda: region_code_for_number(p), number=10000))
print("country_name_for_number:", timeit(lambda: country_name_for_number(p, 'en'), number=10000))
print("description_for_number:", timeit(lambda: description_for_number(p, 'en'), number=10000))
print
print("memo_number_type:", timeit(lambda: memo_number_type(p), number=10000))
print("memo_region_code_for_number:", timeit(lambda: memo_region_code_for_number(p), number=10000))
print("memo_country_name_for_number:", timeit(lambda: memo_country_name_for_number(p, 'en'), number=10000))
print("memo_description_for_number:", timeit(lambda: memo_description_for_number(p, 'en'), number=10000))

Results (total seconds for 10,000 calls each):

('parse_number:', 0.9178240299224854)
('format_number:', 0.028091907501220703)
('is_possible_number:', 0.04349517822265625)
('is_possible_number_with_reason:', 0.05940389633178711)
('is_valid_number:', 1.1776368618011475)

('number_type:', 1.135775089263916)
('region_code_for_number:', 0.5933191776275635)
('country_name_for_number:', 4.652255058288574)
('description_for_number:', 1.7402870655059814)

('memo_number_type:', 0.03579902648925781)
('memo_region_code_for_number:', 0.03642702102661133)
('memo_country_name_for_number:', 0.0368959903717041)
('memo_description_for_number:', 0.03756093978881836)
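
One caveat with any prefix-keyed cache: it is only safe if the chosen prefix really determines the result. A rough way to check that before relying on it is to compare the cached and uncached functions over a sample of inputs. The sketch below is stdlib-only; `find_mismatches`, `prefix_cached` and `lookup` are hypothetical stand-ins that work on plain strings, not the phonenumbers functions themselves.

```python
def find_mismatches(original, memoized, inputs):
    """Return (input, expected, got) for every input where the two functions disagree."""
    mismatches = []
    for value in inputs:
        expected = original(value)
        got = memoized(value)
        if expected != got:
            mismatches.append((value, expected, got))
    return mismatches


def lookup(number):
    # Hypothetical stand-in for a slow lookup whose result depends on the first 4 digits.
    return "region-" + number[:4]


def prefix_cached(func, prefix_len):
    """Cache func's result under the first prefix_len characters of its argument."""
    cache = {}

    def wrapper(number):
        key = number[:prefix_len]
        if key not in cache:
            cache[key] = func(number)
        return cache[key]

    return wrapper


samples = ["3035551212", "3036661212", "3045551212"]
# A 4-digit prefix matches what the lookup depends on, so nothing disagrees.
print(find_mismatches(lookup, prefix_cached(lookup, 4), samples))  # []
# A 2-digit prefix is too short, so later samples get a stale cached value.
print(find_mismatches(lookup, prefix_cached(lookup, 2), samples))
```

The same comparison could be run with `country_name_for_number` against `memo_country_name_for_number` over a list of parsed numbers, wrapping each in a lambda that supplies the language argument.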
