
Lite logging #28

Open
mrbluecoat opened this issue Apr 3, 2022 · 6 comments

@mrbluecoat

I read https://smithproxy.readthedocs.io/en/latest/capture-traffic/ regarding the default smcap logging. Is there a "lite" option for just timestamp, source, and URL? Perhaps in JSON format for easy querying.

@astibal
Owner

astibal commented May 18, 2022

Hi, I am so sorry, I missed your message!
Certainly, I can consider lite logging for you. There is a new feature called KB (knowledge base). It is similar to what you asked for, but not exactly the same.

BTW, the default capture format has been PCAP for some time now, which is faster than smcap. I need to update the docs!

@mrbluecoat
Author

Can you send me more info or a doc link regarding the KB (knowledge base)? My requirements are fairly flexible.

@astibal
Owner

astibal commented May 19, 2022

KB is a new feature that is not in the docs yet; it is still being developed.

KB is supposed to log interesting L7 information into an internal tree with the following structure:

domain > hostname > protocol > protocol specific details
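
The dump shown further down suggests how a single observation maps onto that tree: www.bing.com is filed under a "com.bing." key, and for HTTP the levels below the hostname appear to be the request path and then header names. The minimal sketch below is inferred from that output only and is not smithproxy's code. The helper names `kb_domain_key`, `kb_path` and `kb_insert` are hypothetical, and the "last two labels reversed" rule is an assumption that ignores multi-label suffixes such as co.uk.

```python
def kb_domain_key(hostname: str) -> str:
    """Hypothetical: reverse the last two DNS labels and append a dot,
    e.g. "www.bing.com" -> "com.bing." (inferred from the dump only)."""
    labels = hostname.rstrip(".").split(".")
    tail = labels[-2:]  # naive: does not handle suffixes like "co.uk"
    return ".".join(reversed(tail)) + "."


def kb_path(hostname: str, resource: str, field: str) -> tuple:
    """Build the path domain > hostname > resource > field for one observation."""
    return (kb_domain_key(hostname), hostname, resource, field)


def kb_insert(tree: dict, path: tuple, value) -> None:
    """Create nested dicts along path[:-1] and store value under path[-1]."""
    node = tree
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node[path[-1]] = value
```

For example, `kb_insert(tree, kb_path("www.bing.com", "/hp/api/v1/imagegallery", ":status"), 200)` on an empty dict produces `{"com.bing.": {"www.bing.com": {"/hp/api/v1/imagegallery": {":status": 200}}}}`.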

The tree can be dumped into a JSON file by a command or by the scheduler. The number of tree nodes held at any one time is limited: fresh nodes are kept and old ones are removed.
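
A "keep fresh, drop old" node limit behaves much like an LRU cache. The sketch below is one way to get that behaviour, assuming every entry is a fixed-depth path (domain, hostname, resource, field) and counting leaf entries rather than every tree node. The class `BoundedKB` and its methods are hypothetical illustrations, not smithproxy's implementation.

```python
import json
from collections import OrderedDict


class BoundedKB:
    """Hypothetical sketch: keep at most max_nodes leaf entries and evict
    the least recently updated one when the limit is exceeded."""

    def __init__(self, max_nodes: int = 1000):
        self.max_nodes = max_nodes
        self._leaves = OrderedDict()  # path tuple -> value, oldest first

    def update(self, path: tuple, value) -> None:
        # Updating an existing path marks it as fresh by moving it to the end.
        if path in self._leaves:
            self._leaves.move_to_end(path)
        self._leaves[path] = value
        while len(self._leaves) > self.max_nodes:
            self._leaves.popitem(last=False)  # drop the oldest entry

    def to_tree(self) -> dict:
        # Assumes all paths have the same depth, so no leaf is also a prefix.
        tree = {}
        for path, value in self._leaves.items():
            node = tree
            for key in path[:-1]:
                node = node.setdefault(key, {})
            node[path[-1]] = value
        return tree

    def dump_json(self, fp) -> None:
        """Write the current tree as indented JSON to a file-like object."""
        json.dump(self.to_tree(), fp, indent=4)
```

Keeping the entries flat and building the nested tree only at dump time keeps eviction O(1); the trade-off is that every dump rebuilds the whole tree.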

In the current devel version (pre-0.9.31), it works for the first few streams of an HTTP/2 session. The engine is active only for a certain number of bytes at the beginning of a TCP/UDP connection. The node database is limited to 1000 entries (this will be configurable).
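
Restricting the engine to the beginning of a connection explains why only the first few HTTP/2 streams are recorded: once the byte budget is spent, later streams on the same connection are never inspected. The sketch below illustrates that idea. The class `ByteBudgetInspector`, the budget value and the `inspect` callback are all hypothetical, since the actual limit is not stated here.

```python
class ByteBudgetInspector:
    """Hypothetical sketch: pass data to an inspection callback only until
    a per-connection byte budget is used up; later data is ignored."""

    def __init__(self, budget: int, inspect):
        self.remaining = budget
        self.inspect = inspect

    @property
    def active(self) -> bool:
        return self.remaining > 0

    def feed(self, chunk: bytes) -> None:
        if not self.active:
            return  # engine is switched off for the rest of this connection
        part = chunk[: self.remaining]  # inspect only what fits in the budget
        self.remaining -= len(part)
        self.inspect(part)
```

With a budget of 5 bytes, feeding `b"abc"`, `b"defg"` and `b"xyz"` passes only `b"abc"` and `b"de"` to the callback.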

Currently, it looks as follows:

```
smithproxy(sx1)# execute kb print
Knowledgebase dump:
{
    "com.bing.": {
        "www.bing.com": {
            "/hp/api/v1/imagegallery?format=json&setmkt=en-cz&today=1": {
                ":status": {
                    ".": 200,
                    "counter": 1
                },
                "set-cookie": {
                    "@1652926530": "_EDGE_S=SID=XXXXXXXXXXXXXXXXX&mkt=en-cz; domain=.bing.com; path=/; HttpOnly"
                }
            },
            "cookie": {
                "@1652926530": "_SS=SID=XXXXXXXXXXXXXXXXX"
            }
        }
    },
    "com.bitwarden.": {
        "identity.bitwarden.com": {
            "/connect/token": 
```

The plan is:

  • Improve the engine so that it also works, in a limited fashion, later in the TCP/UDP stream. This already works for DoH traffic, but at larger volumes it can become quite CPU-intensive, which I want to avoid.
  • I also have to substantially improve the HTTP/1 engine to match the current HTTP/2 engine.
  • With the above issues sorted out, I can dump the KB export periodically.

JSON is not great for appending to existing files: one must read the file, parse it, add the new data into the tree and write everything out again. If you are looking for more continuous data, say to tail -f it, CSV format would probably be better.
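
The difference is easy to see side by side. In the sketch below, a CSV row is simply appended, while adding one record to a single JSON document means reading, parsing and rewriting the whole file. The columns (timestamp, source, URL) come from the original request; the function names and file layout are hypothetical and do not describe any smithproxy output format.

```python
import csv
import json
import os
import time


def append_csv(path: str, source: str, url: str) -> None:
    """Append one row; existing content is never read or rewritten."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["timestamp", "source", "url"])  # header once
        writer.writerow([int(time.time()), source, url])


def append_json(path: str, source: str, url: str) -> None:
    """Add one record to a JSON array: read, parse, modify, rewrite it all."""
    records = []
    if os.path.exists(path):
        with open(path) as f:
            records = json.load(f)
    records.append({"timestamp": int(time.time()), "source": source, "url": url})
    with open(path, "w") as f:
        json.dump(records, f)
```

The CSV file grows line by line, so `tail -f` shows each new entry as it is written. The JSON file is replaced on every call, and its cost grows with the size of the log.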

@mrbluecoat
Author

Yes, CSV would work. https://questdb.io/docs/guides/importing-data/

@astibal astibal self-assigned this May 20, 2022
@astibal
Owner

astibal commented May 20, 2022

refactor flow queue 2be39fb - prerequisite for large flows

@mrbluecoat
Author

Let me know when this is ready to test and I'll be happy to help

2 participants