Chapter 4:
Split into two paragraphs, maybe extend the first one.
First, we would like to filter out some words that are very common in English, such as articles and pronouns, which will most likely add noise rather than information to our classification algorithm. For this we will use two Julia packages that are specifically designed for working with any kind of text: Languages.jl and TextAnalysis.jl.
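The Julia packages' APIs are not reproduced here, but the filtering step itself amounts to dropping every token that appears in a stop-word list. Below is a minimal Python sketch of that step. It assumes simple lowercase tokenization and a tiny hypothetical stop-word list; real lists, like the ones bundled with text-processing packages, are much longer.

```python
import re

# Hypothetical, deliberately tiny stop-word list (articles, pronouns, a few common verbs).
STOP_WORDS = {"a", "an", "the", "i", "you", "he", "she", "it", "we", "they",
              "is", "are", "to", "of", "and"}

def remove_stop_words(text: str) -> list[str]:
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(remove_stop_words("You are the winner of a free prize"))
```

Only the content-bearing words ("winner", "free", "prize") survive, and those are the words the classifier learns from.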
A good practice when dealing with models that learn from data, like the one we are going to implement, is to divide our data into two parts: a training set and a testing set. We need to measure how well our model performs, so we train it with some data and test it with other data the model has never seen. This way we can be sure that the model is not tricking us. In Julia, the package MLDataUtils has some nice functionality for data manipulation like this. We will use the function shuffleobs to randomize the order of our data and splitobs to split our dataset into a training set and a test set. It is also important to pass the labels array to the split function along with the data, so that each observation stays paired with its label.
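The shuffle-then-split idea can be sketched in plain Python. This is not the MLDataUtils API. The function `shuffle_split`, its `at` fraction and the toy emails are all hypothetical. The point it illustrates is that features and labels are reordered with the same shuffled indices, so each email keeps its label in both sets.

```python
import random

def shuffle_split(features, labels, at=0.7, seed=None):
    """Shuffle observations, then split them into (train, test) pairs.

    Features and labels are indexed with the same permutation,
    so every observation keeps its label after the split.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    cut = int(round(at * len(indices)))
    train_idx, test_idx = indices[:cut], indices[cut:]
    train = ([features[i] for i in train_idx], [labels[i] for i in train_idx])
    test = ([features[i] for i in test_idx], [labels[i] for i in test_idx])
    return train, test

# Toy data: 1 = spam, 0 = ham.
emails = ["win cash now", "meeting at noon", "cheap pills", "lunch tomorrow?",
          "claim your prize", "project update", "free offer", "call me later",
          "urgent reply needed", "weekly report"]
is_spam = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

(train_x, train_y), (test_x, test_y) = shuffle_split(emails, is_spam, at=0.7, seed=42)
print(len(train_x), len(test_x))
```

Fixing `seed` makes the split reproducible. Leaving it as `None` gives a different shuffle on every run.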
Note: I have the impression that in many places a paragraph should have been split into two or more. Maybe this is a rendering issue and the paragraphs are separated in the source?
Explain the formulas in one or two sentences, and consider adding a cross-reference to the relevant section in Chapter 2.
The probability of finding a particular word in an email, given that we have a spam email, can be calculated like so:
The text does not explain how to compute the priors.
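The quoted sentence does not show the formula, and the priors are not explained either. Below is one common way to estimate both, sketched in Python. It assumes a multinomial naive Bayes model with add-one (Laplace) smoothing and toy emails that are already tokenized, and the book may use a different estimator. The prior P(spam) is the fraction of training emails labelled spam. P(word | spam) is (count of the word in spam emails + α) / (total words in spam emails + α·|V|), where |V| is the vocabulary size.

```python
from collections import Counter

def train_naive_bayes(emails, labels, alpha=1.0):
    """Estimate priors P(spam), P(ham) and add-alpha smoothed word likelihoods.

    emails: list of token lists; labels: 1 for spam, 0 for ham.
    """
    n_spam = sum(labels)
    prior_spam = n_spam / len(labels)  # fraction of training emails that are spam
    prior_ham = 1 - prior_spam

    spam_counts, ham_counts = Counter(), Counter()
    for tokens, label in zip(emails, labels):
        (spam_counts if label == 1 else ham_counts).update(tokens)

    vocab = set(spam_counts) | set(ham_counts)
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())

    def likelihood(word, counts, total):
        # Smoothing keeps unseen words from getting probability zero.
        return (counts[word] + alpha) / (total + alpha * len(vocab))

    p_word_spam = {w: likelihood(w, spam_counts, spam_total) for w in vocab}
    p_word_ham = {w: likelihood(w, ham_counts, ham_total) for w in vocab}
    return prior_spam, prior_ham, p_word_spam, p_word_ham

emails = [["free", "prize"], ["meeting", "today"], ["free", "offer"], ["project", "today"]]
labels = [1, 0, 1, 0]
prior_spam, prior_ham, p_spam, p_ham = train_naive_bayes(emails, labels)
print(prior_spam, p_spam["free"])  # 0.5 0.3
```

Here "free" appears twice among the 4 spam words and the vocabulary has 6 words, so P(free | spam) = (2 + 1) / (4 + 6) = 0.3.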
Chapter 6
Out of the blue
The sentence "So, the model we are going to propose is a linear regression. A linear equation has the form:" is not well connected with the previous paragraphs. It is not clear why we need a linear model at all. The transition from mechanics to statistics should be smoother.
Maybe you could invert the order of the story:

1. We want to escape from Mars.
2. We need to find the escape velocity.
3. For that, we need to find g_mars.
4. We realize x = f(g, t), so we can throw stones to find g!
5. But measurements are noisy, so we create the model x ~ Normal(f(g, t), σ).
6. We now try to find f from what we remember from high-school physics.
7. We got it! We collect a few data points.
8. We explore a few priors.
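The story above can be prototyped in a few lines. Below is a minimal Python sketch (the book itself uses Julia) built on several assumptions. It uses the simplest possible f, a stone dropped from rest, so the distance fallen is f(g, t) = g t²/2; the book's version also involves a launch angle. It also assumes a known noise σ, hypothetical "true" values that exist only to fake the measurements, and a Uniform(0, 10) prior on g (positive and below g_earth ≈ 9.8 m/s²). The posterior is approximated on a grid instead of with MCMC. The last lines plug the posterior mean into v_esc = √(2 g R), taking Mars's mean radius of about 3389.5 km as a given constant.

```python
import math
import random

def f(g, t):
    """Distance fallen after t seconds by a stone dropped from rest (assumed model)."""
    return 0.5 * g * t ** 2

def log_likelihood(g, times, distances, sigma):
    """Log-likelihood of x ~ Normal(f(g, t), sigma), summed over observations."""
    total = 0.0
    for t, x in zip(times, distances):
        z = (x - f(g, t)) / sigma
        total += -0.5 * z * z - math.log(sigma * math.sqrt(2 * math.pi))
    return total

def grid_posterior(times, distances, sigma, n_grid=1000, g_max=10.0):
    """Normalized posterior over g on a grid, with a Uniform(0, g_max) prior."""
    grid = [g_max * (i + 0.5) / n_grid for i in range(n_grid)]  # cell midpoints in (0, g_max)
    # A flat prior only adds a constant, so log-posterior = log-likelihood + const.
    log_post = [log_likelihood(g, times, distances, sigma) for g in grid]
    m = max(log_post)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]

# Hypothetical values used only to simulate noisy measurements.
rng = random.Random(0)
g_true, sigma = 3.7, 0.5
times = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
distances = [f(g_true, t) + rng.gauss(0, sigma) for t in times]

grid, post = grid_posterior(times, distances, sigma)
mean = sum(g * p for g, p in zip(grid, post))
sd = math.sqrt(sum((g - mean) ** 2 * p for g, p in zip(grid, post)))

R_MARS = 3389.5e3  # mean radius of Mars in metres (given constant)
v_esc = math.sqrt(2 * mean * R_MARS)  # escape velocity from surface gravity and radius
print(f"g ≈ {mean:.2f} ± {sd:.2f} m/s², v_esc ≈ {v_esc / 1000:.1f} km/s")
```

Swapping the flat prior for another one only changes the `log_post` line, which makes it easy to explore a few priors as the story suggests.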
Justify Priors
Say something like: we know g_mars has to be positive and less than g_earth, which is 9.8 m/s², so the upper bound can be rounded up to 10.
Do the same for the other two priors.
Discuss the posterior with angle uncertainty vs the one without it.
Consider a new plot showing them side by side or overlapped. Discuss the mean and standard deviation, or the HDI (or some other credible interval).
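To compare the two posteriors, the HDI can be computed directly from posterior samples as the narrowest interval that contains a given fraction of them. Below is a minimal Python sketch that assumes a unimodal posterior. The function `hdi`, the 94% level and the normally distributed stand-in samples are illustrative choices, not the book's.

```python
import math
import random
import statistics

def hdi(samples, prob=0.94):
    """Narrowest interval containing a fraction `prob` of the samples (assumes a unimodal posterior)."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.floor(prob * n))  # number of sorted samples the interval must span
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    i_best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[i_best], xs[i_best + k - 1]

# Stand-in for posterior draws of g; in practice these come from the sampler.
rng = random.Random(1)
samples = [rng.gauss(3.7, 0.1) for _ in range(10_000)]

low, high = hdi(samples)
print(f"mean = {statistics.mean(samples):.3f}, sd = {statistics.stdev(samples):.3f}, "
      f"94% HDI = ({low:.3f}, {high:.3f})")
```

Running this once for each posterior (with and without angle uncertainty) puts the mean, standard deviation and HDI side by side, which makes the extra spread from the angle uncertainty easy to quantify.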