Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

can't handle binary data #2

Open
huancz opened this issue Nov 20, 2016 · 2 comments
Open

can't handle binary data #2

huancz opened this issue Nov 20, 2016 · 2 comments

Comments

@huancz
Copy link

huancz commented Nov 20, 2016

testcase:

dn: cn=Barbara Jensen, ou=Product Development, dc=airius, dc=com
objectGUID;binary:: x7HAMN/oRNyn4w==

shift() result:

{ type: 'record',
  dn: 'cn=Barbara Jensen, ou=Product Development, dc=airius, dc=com',
  attributes:
   [ { attribute:
        { type: 'attribute',
          options: [ 'binary' ],
          attribute: 'objectGUID' },
       value: { type: 'value', value: 'DZ�0��Dܧ�' } } ] }

toObject doesn't make it any better:

{ dn: 'cn=Barbara Jensen, ou=Product Development, dc=airius, dc=com',
  attributes: { 'objectGUID;binary': [ 'DZ�0��Dܧ�' ] } }

The above ldif output format is real life data from running ldapsearch against Microsoft AD, I only changed the dn and objectGUID value to protect the innocent. The ;binary option is not added automatically by AD, more on that below.

Guessing which attributes are truly binary and which can be decoded to string is a hard problem (read - probably no better solution that enumerate binary exceptions, spec doesn't say anything about it). Another product (node-LDAP) solves this by letting application code to "tag" attributes with ";binary" option when searching. AD server doesn't care and returns the same data in both cases, but parser library can use such tag and return the attributes as Buffer. Either Buffer or original base64 string would be fine, but binary data decoded into javascript string is definitely not fine.

Unfortunately I'm not in the position to learn peg and send a PR.

Thanks

@tapmodo
Copy link
Owner

tapmodo commented Dec 23, 2016

Hi! It appears it does detect and decode the binary data as expected. When I was writing the parser I wasn't sure what was the best way to handle this. My thinking was something as simple as a leading or trailing space can cause the value to need to be encoded, and I wanted to make using the data as simple as possible. However, I didn't have a lot of real world examples of how binary values are used (such as in AD). I can see why it is not desirable in this case.

In the first example output above, would it address the issue if, in addition to the value property, it also included encoded and buffer properties? Or perhaps value could be the Buffer, and include encoded and decoded values? Or just have the value be a Buffer and leave it at that?

Thanks for filing this issue! Sorry I didn't see it sooner. If you can let me know your thoughts on what you would like to see/get back in this case, that would be helpful in coming up with a solution.

@huancz
Copy link
Author

huancz commented Dec 25, 2016

It appears it does detect and decode the binary data as expected.

I'm not 100% sure if I understand this, just so we are on the same page here: I agree that base64 encoding is detected and decoded into Buffer correctly.

Ldap has many reasons why it would send value as base64. I don't know if the starting space can cause it like you mention, but the value being an UTF-8 string certainly will. Or UTF-16. Or being an image, or GUID or other binary data. Since it doesn't send mime-type or similar information, when the library detects base64, all it can SAFELY do is return a Buffer.

But your proposed solution seems fine. Interpreting the value as UTF-8 string is good default for maybe 90% of cases and it means simpler interface. If you give client application access to the decoded Buffer too, all it means it that the library is doing more work and using more memory than necessary. Neither should be a problem, ldiff data is usually small.

As to the method which to choose... Returning two representations of the same value doesn't work too well with toObject, the returned object would be much uglier. Returning only Buffer objects breaks backwards compatibility in another way.

Perhaps define the "not yet well-defined, leave true" option of toObject. decode=false would mean that some of the returned array members could be Buffer objects. Unless I'm forgetting something, it would mean that old applications will work the same as they did before, and those who need binary will have the option to set it to false and check Buffer.isBuffer before using the value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants