forked from DataStax-Academy/developer-day-2019
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path02-01_-_Data_Modeling_Data_Modeling_Intro.studio-nb.tar
409 lines (360 loc) · 90 KB
/
02-01_-_Data_Modeling_Data_Modeling_Intro.studio-nb.tar
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
notebook.bin 0100644 0000000 0000000 00000152222 13563010763 012127 0 ustar 00 0000000 0000000 json_notebook_v1 {"1":"77636de4-6320-41a8-901e-bf7ca1ad0197","10":"418ed742-cd61-4df8-abd8-07bc56a62e8d","11":"02-01 - Data Modeling: Data Modeling Intro","12":{"1":1573655005,"2":772000000},"13":{"1":1573655005,"2":772000000},"14":false,"15":[{"1":"2d16de64-de2a-460c-8ec4-06183ebb8a7c","10":4,"11":"<center><img src=\"//datastaxtraining.s3.amazonaws.com/developer-day/developer-day-studio-header.png\" width=\"800\"></center>\n\n![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n# <center><span style=\"color:navy\">Data Modeling Intro</span></center>\n\nWelcome to the intro notebook on data modeling in Cassandra. In this notebook we will learn to:\n* Create keyspaces and simple tables\n* Insert data into the tables\n* Perform simple queries\n\nThis notebook is a step-by-step tutorial. \nEach step has a brief discussion section in a cell that describes the step, followed by an exercise in a second cell, where you get to try something out.\nIn some exercises, we'll tell you exactly what to do.\nIn others, we may let you figure it out on your own, because that's a lot more fun than always being told exactly what to do.\n\nBut don't worry. None of these exercises requires you to do anything as difficult as soldering your own CPU from a box of wires ![Smiley face](https://s3.amazonaws.com/datastaxtraining/solution-days/cassandra-intro/smileyface.png \"Smiley Face\" ).\nWe're confident you'll be able to work through each of the exercises in a way that will be educational and fun.\n\nLet's get started!\n","12":"markdown","13":{"1":"f30594f0-93ba-4017-8bc3-d3b905995b9f","10":{"9":"<p><center><img src=\" //datastaxtraining.s3.amazonaws.com/developer-day/developer-day-studio-header.png\" width=\"800\"></center></p>\n<p><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<h1><center><span style=\"color:navy\">Data Modeling Intro</span></center></h1>\n<p>Welcome to the intro notebook on data modeling in Cassandra. In this notebook we will learn to:</p>\n<ul>\n<li>Create keyspaces and simple tables</li>\n<li>Insert data into the tables</li>\n<li>Perform simple queries</li>\n</ul>\n<p>This notebook is a step-by-step tutorial.\n<br />Each step has a brief discussion section in a cell that describes the step, followed by an exercise in a second cell, where you get to try something out.\n<br />In some exercises, we'll tell you exactly what to do.\n<br />In others, we may let you figure it out on your own, because that's a lot more fun than always being told exactly what to do.</p>\n<p>But don't worry. None of these exercises requires you to do anything as difficult as soldering your own CPU from a box of wires <img src=\"https://s3.amazonaws.com/datastaxtraining/solution-days/cassandra-intro/smileyface.png\" alt=\"Smiley face\" title=\"Smiley Face\" />.\n<br />We're confident you'll be able to work through each of the exercises in a way that will be educational and fun.</p>\n<p>Let's get started!</p>\n"},"11":4,"12":false},"15":5,"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"14e22ecb-f168-44de-a2dd-b042d5c7eb79","10":4,"11":"![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n### <center><span style=\"color:navy\">Step 1: Create a Table</span></center>\nImagine we are creating the killrvideo site and we want to store information about videos in our killrvideo keyspace.\nTo do this, we will need to create tables.\n\nUnlike relational data modeling, we don't create a bunch of normalized general-purpose tables with all kinds of normalized forms - that would be like designing your grandparent's station wagon.\nInstead, we want to think about the queries we will need to respond to, and build high-powered sportscars to get us there.\n\nImagine that for our first query, given an actor, we want to know what movies that actor is in.\nSo, for example, we might want to know what movies Emma Stone has been in.\nWe would search based on a last name of Stone, and a first name of Emma, and want to get back the title of the movie and the year it was produced.\n\n<center>![query](https://s3.amazonaws.com/datastaxtraining/solution-days/cassandra-intro/data-modeling/query.png \"query\" )</center>\n\nIt looks like our table is going to need four columns: last_name, first_name, movie_title, movie_year.\nSince we want to search on last_name and first_name, we will make those primary key columns.\nHere's how we might create such a table:\n<center>![create table](https://s3.amazonaws.com/datastaxtraining/developer-day/cassandra-intro/data-modeling/createTableCommand2.png \"create table\" )</center>\n\n\nTry it out at the bottom of the next cell (we threw in an extra line at the top of the cell to make sure the table doesn't exist):","12":"markdown","13":{"1":"076b4efe-5e9e-47ef-9965-56daf74267f7","10":{"9":"<p><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<h3><center><span style=\"color:navy\">Step 1: Create a Table</span></center></h3>\n<p>Imagine we are creating the killrvideo site and we want to store information about videos in our killrvideo keyspace.\n<br />To do this, we will need to create tables.</p>\n<p>Unlike relational data modeling, we don't create a bunch of normalized general-purpose tables with all kinds of normalized forms - that would be like designing your grandparent's station wagon.\n<br />Instead, we want to think about the queries we will need to respond to, and build high-powered sportscars to get us there.</p>\n<p>Imagine that for our first query, given an actor, we want to know what movies that actor is in.\n<br />So, for example, we might want to know what movies Emma Stone has been in.\n<br />We would search based on a last name of Stone, and a first name of Emma, and want to get back the title of the movie and the year it was produced.</p>\n<p><center><img src=\"https://s3.amazonaws.com/datastaxtraining/solution-days/cassandra-intro/data-modeling/query.png\" alt=\"query\" title=\"query\" /></center></p>\n<p>It looks like our table is going to need four columns: last_name, first_name, movie_title, movie_year.\n<br />Since we want to search on last_name and first_name, we will make those primary key columns.\n<br />Here's how we might create such a table:\n<br /><center><img src=\"https://s3.amazonaws.com/datastaxtraining/developer-day/cassandra-intro/data-modeling/createTableCommand2.png\" alt=\"create table\" title=\"create table\" /></center></p>\n<p>Try it out at the bottom of the next cell (we threw in an extra line at the top of the cell to make sure the table doesn't exist):</p>\n"},"11":4,"12":false},"15":5,"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"ba3db93c-6d24-4ee2-a079-85d713bb2af4","11":"// The following line will make sure the table does not exist (in case you created it with a previous exercise)\nDROP TABLE IF EXISTS killrvideo.movies_by_actor_name;\n// Now, create the table here:\n","12":"cql","16":true,"17":false,"25":"CL.ONE"},{"1":"17af9c0b-c2c7-4b90-9da7-56d835eed595","10":4,"11":"<details>\n<summary><i><span style=\"color:blue\">Click here for the solution.</span></i></summary>\n```\nCREATE TABLE killrvideo.movies_by_actor_name (\n last_name text,\n first_name text,\n movie_title text,\n movie_year int,\n PRIMARY KEY (last_name, first_name)\n);\n```\n</details>","12":"markdown","13":{"1":"984c8a2a-adad-4cd2-bd66-2cbe74b9328d","10":{"9":"<p><details>\n<br /><summary><i><span style=\"color:blue\">Click here for the solution.</span></i></summary></p>\n<pre><code>CREATE TABLE killrvideo.movies_by_actor_name (\n last_name text,\n first_name text,\n movie_title text,\n movie_year int,\n PRIMARY KEY (last_name, first_name)\n);\n</code></pre>\n<p></details></p>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"2b62fe26-efa9-4f52-abdf-6bbedf74e4d9","10":4,"11":"![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n### <center><span style=\"color:navy\">Step 2: Review the table</span></center>\nUsing the schema viewer in the upper right of this Studio window, review the table you just created.","12":"markdown","13":{"1":"bbae630a-4747-4bc6-9eff-bb3d240df11c","10":{"9":"<p><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<h3><center><span style=\"color:navy\">Step 2: Review the table</span></center></h3>\n<p>Using the schema viewer in the upper right of this Studio window, review the table you just created.</p>\n"},"11":4,"12":false},"15":5,"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"172ff0bf-e2b9-4d97-8286-3fd93ea550bc","10":4,"11":"![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n### <center><span style=\"color:navy\">Step 3: Put Data in the Table</span></center>\nOK, now we have a keyspace and a table.\nLet's put some data in the table.\nLet's say we want to create a row for Emma Stone for the 2017 Movie \"Battle of the Sexes\".\nWe can do this by executing the following:\n<center>![insert command](https://s3.amazonaws.com/datastaxtraining/developer-day/cassandra-intro/data-modeling/insertCommand2.png \"insert command\" )</center>\nTry it out:","12":"markdown","13":{"1":"8519fc08-886d-49e5-a340-bb1a9223e577","10":{"9":"<p><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<h3><center><span style=\"color:navy\">Step 3: Put Data in the Table</span></center></h3>\n<p>OK, now we have a keyspace and a table.\n<br />Let's put some data in the table.\n<br />Let's say we want to create a row for Emma Stone for the 2017 Movie “Battle of the Sexes”.\n<br />We can do this by executing the following:\n<br /><center><img src=\"https://s3.amazonaws.com/datastaxtraining/developer-day/cassandra-intro/data-modeling/insertCommand2.png\" alt=\"insert command\" title=\"insert command\" /></center>\n<br />Try it out:</p>\n"},"11":4,"12":false},"15":5,"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"ad9b075f-9baf-41b9-9977-994d8408ec39","11":"// Insert the Emma Stone row here:\n","12":"cql","16":true,"17":false,"25":"CL.ONE"},{"1":"bc648b52-fb17-4a23-9cd1-fbae40f5176e","10":4,"11":"<details>\n<summary><i><span style=\"color:blue\">Click here for the solution.</span></i></summary>\n```\nINSERT INTO killrvideo.movies_by_actor_name (last_name, first_name, movie_title, movie_year)\nVALUES ('Stone', 'Emma', 'Battle of the Sexes', 2017);\n```\n</details>","12":"markdown","13":{"1":"4b8effc8-57b8-4ee6-a224-73d202ec3bf6","10":{"9":"<p><details>\n<br /><summary><i><span style=\"color:blue\">Click here for the solution.</span></i></summary></p>\n<pre><code>INSERT INTO killrvideo.movies_by_actor_name (last_name, first_name, movie_title, movie_year)\nVALUES ('Stone', 'Emma', 'Battle of the Sexes', 2017);\n</code></pre>\n<p></details></p>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"ae1af83d-726a-47a9-a821-b69548582fe0","10":4,"11":"![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n### <center><span style=\"color:navy\">Step 4: Retrieve the Data</span></center>\nSo, now that we have inserted a row of data into our table, let's verify the data is in the table.\nExecute the following cell to retrieve all the rows in the table:","12":"markdown","13":{"1":"c56b9e4e-681b-4b9f-9c1e-e6581225f94d","10":{"9":"<p><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<h3><center><span style=\"color:navy\">Step 4: Retrieve the Data</span></center></h3>\n<p>So, now that we have inserted a row of data into our table, let's verify the data is in the table.\n<br />Execute the following cell to retrieve all the rows in the table:</p>\n"},"11":4,"12":false},"15":5,"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"844d9094-2fc9-4f6f-8318-0cce107d007b","11":"// Perform the query here:\nSELECT * FROM killrvideo.movies_by_actor_name;","12":"cql","16":true,"17":false,"18":{},"22":117,"25":"CL.ONE"},{"1":"cba7274e-8102-4531-8768-0da2e0e82c4a","10":4,"11":"![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n### <center><span style=\"color:navy\">Step 5: Retrieve Specific Data Columns</span></center>\nAs you can see, CQL looks a lot like SQL.\nThe designers of CQL made it look like SQL to help those familiar with SQL to get up and running quickly.\nBut beware! There are probably more differences between CQL and SQL than there are similarities.\nWe'll get into these differences in a minute.\n\nBut first, let's see how to retrieve certain columns.\nAs you might expect, it looks a lot like SQL.\nHere's the command to get only the movie title and year:\n\nTry it out in the cell below:","12":"markdown","13":{"1":"f95d952f-f96f-4c8c-ad6f-08343af5f427","10":{"9":"<p><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<h3><center><span style=\"color:navy\">Step 5: Retrieve Specific Data Columns</span></center></h3>\n<p>As you can see, CQL looks a lot like SQL.\n<br />The designers of CQL made it look like SQL to help those familiar with SQL to get up and running quickly.\n<br />But beware! There are probably more differences between CQL and SQL than there are similarities.\n<br />We'll get into these differences in a minute.</p>\n<p>But first, let's see how to retrieve certain columns.\n<br />As you might expect, it looks a lot like SQL.\n<br />Here's the command to get only the movie title and year:</p>\n<p>Try it out in the cell below:</p>\n"},"11":4,"12":false},"15":5,"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"a5b296a7-8b0d-4ad1-bd85-3c2632917778","11":"// Perform the query here:\nSELECT movie_title, movie_year FROM killrvideo.movies_by_actor_name;","12":"cql","16":true,"17":false,"18":{},"22":86,"25":"CL.ONE"},{"1":"38eb3e2c-1a11-439a-972b-311e915b1082","10":4,"11":"![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n### <center><span style=\"color:navy\">Step 6: Add More Data</span></center>\nA database isn't much fun if it only has one row in it.\nLet's put in more data by executing the cell below:\n","12":"markdown","13":{"1":"de42b1d6-8f5d-4e1a-bcfa-65547eaf8beb","10":{"9":"<p><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<h3><center><span style=\"color:navy\">Step 6: Add More Data</span></center></h3>\n<p>A database isn't much fun if it only has one row in it.\n<br />Let's put in more data by executing the cell below:</p>\n"},"11":4,"12":false},"15":5,"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"74b875cf-b0ff-4d7d-88f6-6c8dad2b5756","11":"// Insert another row here:\nINSERT INTO killrvideo.movies_by_actor_name (last_name, first_name, movie_title, movie_year) \nVALUES('Stone', 'Emma', 'La la Land', 2016);","12":"cql","16":true,"17":false,"25":"CL.ONE"},{"1":"33273fc3-b704-4e5d-b889-92f3b8f70e68","10":4,"11":"![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n### <center><span style=\"color:navy\">Step 7: Look at the Data Again</span></center>\nOK, so we threw another row in the table.\nLet's investigate the table again.","12":"markdown","13":{"1":"103bbbe6-583e-4d5a-b5d1-87f7553477b7","10":{"9":"<p><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<h3><center><span style=\"color:navy\">Step 7: Look at the Data Again</span></center></h3>\n<p>OK, so we threw another row in the table.\n<br />Let's investigate the table again.</p>\n"},"11":4,"12":false},"15":5,"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"58485d5a-77ca-4c62-8fcf-fd5420a90e15","11":"// Query all the rows of the table here:\nSELECT * FROM killrvideo.movies_by_actor_name;","12":"cql","16":true,"17":false,"18":{},"22":145,"25":"CL.ONE"},{"1":"8ebcf0be-aae2-461d-814c-dfe78f7df56d","10":4,"11":"Wait a minute!\nWhat just happened?\nYour query only returned one row, but you have inserted two of them.\nThe first row seems to be gone - what happened?\n\nNow, we are starting to see some of the differences between CQL and SQL.\nRemember that our _primary key_ consists of the `last_name` and `first_name` columns.\nThink a moment about how Cassandra works.\n\nRemember that Cassandra stores the row based on the primary key.\nCassandra does _not_ do a read before writing - that would be way too slow and would never scale.\nSo instead, Cassandra just writes the new row to the node _with the same primary key as the first row_, which effectively replaces or updates the row.\nThis is what Cassandra calls an \"upsert\".","12":"markdown","13":{"1":"e5060897-9953-4cef-8ef3-6d2d660ef542","10":{"9":"<p>Wait a minute!\n<br />What just happened?\n<br />Your query only returned one row, but you have inserted two of them.\n<br />The first row seems to be gone - what happened?</p>\n<p>Now, we are starting to see some of the differences between CQL and SQL.\n<br />Remember that our <em>primary key</em> consists of the <code>last_name</code> and <code>first_name</code> columns.\n<br />Think a moment about how Cassandra works.</p>\n<p>Remember that Cassandra stores the row based on the primary key.\n<br />Cassandra does <em>not</em> do a read before writing - that would be way too slow and would never scale.\n<br />So instead, Cassandra just writes the new row to the node <em>with the same primary key as the first row</em>, which effectively replaces or updates the row.\n<br />This is what Cassandra calls an “upsert”.</p>\n"},"11":4,"12":false},"15":5,"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"0577530f-5fb0-40fd-9ee7-696e59800433","10":4,"11":"![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n### <center><span style=\"color:navy\">Step 8: Delete the Table</span></center>\nIf we want to associate several movies with Emma Stone, we will need to adjust our data model so that each movie entry has it's own primary key.\nOnce each movie has a unique primary key, we can insert new movie entries for the same actor without stepping on the previous movie entries for that actor.\n\nTo make the necessary adjustment to the table, it's probably simplest at this point to delete the table and re-create it with a different primary key.\nExecute the following cell with DROP to delete the table:","12":"markdown","13":{"1":"755b9662-0b8d-4545-af2c-0533bb7c97ff","10":{"9":"<p><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<h3><center><span style=\"color:navy\">Step 8: Delete the Table</span></center></h3>\n<p>If we want to associate several movies with Emma Stone, we will need to adjust our data model so that each movie entry has it's own primary key.\n<br />Once each movie has a unique primary key, we can insert new movie entries for the same actor without stepping on the previous movie entries for that actor.</p>\n<p>To make the necessary adjustment to the table, it's probably simplest at this point to delete the table and re-create it with a different primary key.\n<br />Execute the following cell with DROP to delete the table:</p>\n"},"11":4,"12":false},"15":5,"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"63e301e9-14e3-4921-811c-04e9b01da983","11":"// Delete the table here:\nDROP TABLE killrvideo.movies_by_actor_name;","12":"cql","16":true,"17":false,"25":"CL.ONE"},{"1":"cc786824-97f3-4297-a0fa-5c8918b5c442","10":4,"11":"![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n### <center><span style=\"color:navy\">Step 9: Adjust the Model</span></center>\nNow, we can re-create the table with the right kind of primary key.\nWe want to have one partition per actor, but many movie entries in that partition.\nWe do this by adjusting the primary key to create _uniqueness_.\n\nPreviously, the primary key for the table consisted of a partition key (last_name), and a single clustering column (first_name).\nUsing this schema, Cassandra stored all actors with the same last name in the same parition, but each unique first name in a separate row.\nWe want a row for each last_name, first name _and movie title_, so we will add the `movie_title` column as a clustering column within the primary key.\n\nWhen we query this table, we don't have to supply the movie title as part of the query, it is only there to prevent upserts when we add multiple movies per actor.\nAlso, since every query will use both the `last_name` and `first_name` columns, we will make both of these columns part of the partition key.\nBy using both columns as the partition key, we cause Cassandra to store one partition per actor, which means Cassandra will do a single physical access to satisfy our query - making the query as fast as possible.\n\nHere's how we do it:\n<center>![create clustering table](https://s3.amazonaws.com/datastaxtraining/developer-day/cassandra-intro/data-modeling/createClusteringTable2.png \"create clustering table\" )</center>\nThis `CREATE` statement looks a lot like the first, but notice that the `PRIMARY KEY` section is different.\nWe have done two things:\n* First, we added `movie_title` as a third field in the primary key\n* Second, we added parentheses around the `last_name` and `first_name` fields.\nThese extra parentheses may look extraneous, but they are necessary.\nThey designate the fields that constitute the partition key portion of the primary key.\nRemember, the partition key portion of the primary key is the only part of the primary key that Cassandra hashes to determine the partition token.\n\nExecute this command to create the table, insert both of the movie titles for Emma Stone and finally select all the rows associated with Emma Stone (you can do all this in multiple lines of a single cell by separating each command with a semicolon).","12":"markdown","13":{"1":"c968cdcf-542c-4e0e-8168-1cc6e2e6240c","10":{"9":"<p><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<h3><center><span style=\"color:navy\">Step 9: Adjust the Model</span></center></h3>\n<p>Now, we can re-create the table with the right kind of primary key.\n<br />We want to have one partition per actor, but many movie entries in that partition.\n<br />We do this by adjusting the primary key to create <em>uniqueness</em>.</p>\n<p>Previously, the primary key for the table consisted of a partition key (last_name), and a single clustering column (first_name).\n<br />Using this schema, Cassandra stored all actors with the same last name in the same parition, but each unique first name in a separate row.\n<br />We want a row for each last_name, first name <em>and movie title</em>, so we will add the <code>movie_title</code> column as a clustering column within the primary key.</p>\n<p>When we query this table, we don't have to supply the movie title as part of the query, it is only there to prevent upserts when we add multiple movies per actor.\n<br />Also, since every query will use both the <code>last_name</code> and <code>first_name</code> columns, we will make both of these columns part of the partition key.\n<br />By using both columns as the partition key, we cause Cassandra to store one partition per actor, which means Cassandra will do a single physical access to satisfy our query - making the query as fast as possible.</p>\n<p>Here's how we do it:\n<br /><center><img src=\"https://s3.amazonaws.com/datastaxtraining/developer-day/cassandra-intro/data-modeling/createClusteringTable2.png\" alt=\"create clustering table\" title=\"create clustering table\" /></center>\n<br />This <code>CREATE</code> statement looks a lot like the first, but notice that the <code>PRIMARY KEY</code> section is different.\n<br />We have done two things:</p>\n<ul>\n<li>First, we added <code>movie_title</code> as a third field in the primary key</li>\n<li>Second, we added parentheses around the <code>last_name</code> and <code>first_name</code> fields.\n<br />These extra parentheses may look extraneous, but they are necessary.\n<br />They designate the fields that constitute the partition key portion of the primary key.\n<br />Remember, the partition key portion of the primary key is the only part of the primary key that Cassandra hashes to determine the partition token.</li>\n</ul>\n<p>Execute this command to create the table, insert both of the movie titles for Emma Stone and finally select all the rows associated with Emma Stone (you can do all this in multiple lines of a single cell by separating each command with a semicolon).</p>\n"},"11":4,"12":false},"15":5,"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"64f89f30-aa47-4481-8ff0-addffac0a91a","11":"// Create the table here (as shown in the image above):\nCREATE TABLE killrvideo.movies_by_actor_name (\n last_name text,\n first_name text,\n movie_title text,\n movie_year int,\n PRIMARY KEY ((last_name, first_name), movie_title)\n);","12":"cql","14":"java.util.concurrent.ExecutionException: com.datastax.studio.common.errors.ExecutionExceptionWithLineNumber: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.AlreadyExistsException: Table killrvideo.movies_by_actor_name already exists","16":true,"17":false,"25":"CL.ONE"},{"1":"a44d6105-be23-4467-8e2f-93406b24efa7","10":4,"11":"Notice the following insert statements are the EXACT SAME insert statements we executed previously.\nThe only difference now is our primary key, yet the outcome will be slightly different. Why might this be?","12":"markdown","13":{"1":"cdadaf0d-2866-42ed-b693-62231a35664d","10":{"9":"<p>Notice the following insert statements are the EXACT SAME insert statements we executed previously.\n<br />The only difference now is our primary key, yet the outcome will be slightly different. Why might this be?</p>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"ff072aa3-6bc5-4b8e-b996-c726bafbabb6","11":"// Insert the row for \"Battle of the sexes\":\nINSERT INTO killrvideo.movies_by_actor_name (last_name, first_name, movie_title, movie_year)\nVALUES ('Stone', 'Emma', 'Battle of the Sexes', 2017);\n\n// Insert the row for \"La la land\":\nINSERT INTO killrvideo.movies_by_actor_name (last_name, first_name, movie_title, movie_year) \nVALUES('Stone', 'Emma', 'La la Land', 2016);\n\n// Select all rows of the table here:\nSELECT * FROM killrvideo.movies_by_actor_name;","12":"cql","16":true,"17":false,"22":16,"25":"CL.ONE"},{"1":"c162a4e5-8832-4060-89da-bb6fc7c58a44","10":4,"11":"![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n### <center><span style=\"color:navy\">Step 10: Add More Data</span></center>\nLet's add some rows for more actors as shown in the table:\n<center>![movie table](https://s3.amazonaws.com/datastaxtraining/solution-days/cassandra-intro/data-modeling/movieTable.png \"movie table\" )</center>\nTo make it easy, since you already know how to add rows, we have already prepared the `INSERT` statements.\nYou just need to execute them in the cell below:","12":"markdown","13":{"1":"41b2a5f3-798f-43b7-a193-80c502cfad53","10":{"9":"<p><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<h3><center><span style=\"color:navy\">Step 10: Add More Data</span></center></h3>\n<p>Let's add some rows for more actors as shown in the table:\n<br /><center><img src=\"https://s3.amazonaws.com/datastaxtraining/solution-days/cassandra-intro/data-modeling/movieTable.png\" alt=\"movie table\" title=\"movie table\" /></center>\n<br />To make it easy, since you already know how to add rows, we have already prepared the <code>INSERT</code> statements.\n<br />You just need to execute them in the cell below:</p>\n"},"11":4,"12":false},"15":5,"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"1c0e2b02-9c48-455f-9046-772cedbfcad1","11":"// Insert these rows by executing this cell:\nINSERT INTO killrvideo.movies_by_actor_name (last_name, first_name, movie_title, movie_year) VALUES('Depp', 'Johnny', 'Lone Ranger', 2013);\nINSERT INTO killrvideo.movies_by_actor_name (last_name, first_name, movie_title, movie_year) VALUES('Depp', 'Johnny', 'Tourist', 2010);\nINSERT INTO killrvideo.movies_by_actor_name (last_name, first_name, movie_title, movie_year) VALUES('Jolie', 'Angelina', 'Tourist', 2010);\nINSERT INTO killrvideo.movies_by_actor_name (last_name, first_name, movie_title, movie_year) VALUES('Jolie', 'Angelina', 'By the Sea', 2015);\nINSERT INTO killrvideo.movies_by_actor_name (last_name, first_name, movie_title, movie_year) VALUES('Pitt', 'Brad', 'Big Short', 2015);\nINSERT INTO killrvideo.movies_by_actor_name (last_name, first_name, movie_title, movie_year) VALUES('Pitt', 'Brad', 'Moneyball', 2011);","12":"cql","16":true,"17":false,"25":"CL.ONE"},{"1":"0d1e53b1-94b4-4650-8d5d-fe49d3e42d2e","10":4,"11":"![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n### <center><span style=\"color:navy\">Step 11: Make Sure the Data is There</span></center>\nDo a simple select of all the data in the table to verify the rows look like you think they should:\n","12":"markdown","13":{"1":"f60bfbb6-0433-4cc2-abbe-dc7d1230c35b","10":{"9":"<p><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<h3><center><span style=\"color:navy\">Step 11: Make Sure the Data is There</span></center></h3>\n<p>Do a simple select of all the data in the table to verify the rows look like you think they should:</p>\n"},"11":4,"12":false},"15":5,"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"c3bd21bc-1e2e-4e32-ac74-8e48d0a28064","11":"// Query the table here:\nSELECT * FROM killrvideo.movies_by_actor_name;","12":"cql","16":true,"17":false,"25":"CL.ONE"},{"1":"56c4168d-ab75-45ea-9b5d-b4aed2bbcd8e","10":4,"11":"![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n### <center><span style=\"color:navy\">Step 12: Finding the Data</span></center>\nClearly, we don't want the entire table back from a query - only the rows associated with a specific actor.\n(In fact, you should never execute a query that selects the entire table on a production keyspace as this would cause serious performance problems.\nWe do this in these exercises only for illustrative purposes.)\n\nAs you might expect, the CQL query for this looks a lot like an SQL query.\n\nExecute the query in the following cell:","12":"markdown","13":{"1":"64c1995d-403d-4350-834d-0fbfd7c2d20b","10":{"9":"<p><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<h3><center><span style=\"color:navy\">Step 12: Finding the Data</span></center></h3>\n<p>Clearly, we don't want the entire table back from a query - only the rows associated with a specific actor.\n<br />(In fact, you should never execute a query that selects the entire table on a production keyspace as this would cause serious performance problems.\n<br />We do this in these exercises only for illustrative purposes.)</p>\n<p>As you might expect, the CQL query for this looks a lot like an SQL query.</p>\n<p>Execute the query in the following cell:</p>\n"},"11":4,"12":false},"15":5,"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"45f5ec7c-1d97-4977-a1f2-6c4a29f08a7c","11":"// Execute the query for just Emma Stone here:\nSELECT * FROM killrvideo.movies_by_actor_name \n WHERE last_name = 'Stone' \n AND first_name = 'Emma';","12":"cql","16":true,"17":false,"25":"CL.ONE"},{"1":"8995dabc-52f5-498f-a3e2-9682c2a47282","10":4,"11":"That query probably worked pretty much like you expected.\nYou see the two rows for the movies of Emma Stone.\n\nNote that these two rows were in the same parition.\nThis means that Cassandra only had to do one physical access to service your query - which means it was fast!","12":"markdown","13":{"1":"b1e826e7-4f87-49f4-ae11-c1ebff1caa7b","10":{"9":"<p>That query probably worked pretty much like you expected.\n<br />You see the two rows for the movies of Emma Stone.</p>\n<p>Note that these two rows were in the same parition.\n<br />This means that Cassandra only had to do one physical access to service your query - which means it was fast!</p>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"609fd1fd-f039-4246-b527-d1b5b308de62","10":4,"11":"![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n### <center><span style=\"color:navy\">Step 13: Problems with Finding the Data</span></center>\nIt may have occurred to you that since we only have one actor with the last name Stone, we could query the table only specifying the last name.\nLet's give it a try.\nExecute the command in the following cell:","12":"markdown","13":{"1":"9e222054-a115-4fe1-a1bb-45462ec7e4a8","10":{"9":"<p><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<h3><center><span style=\"color:navy\">Step 13: Problems with Finding the Data</span></center></h3>\n<p>It may have occurred to you that since we only have one actor with the last name Stone, we could query the table only specifying the last name.\n<br />Let's give it a try.\n<br />Execute the command in the following cell:</p>\n"},"11":4,"12":false},"15":5,"16":true,"17":true,"18":{},"22":154,"25":"CL.ONE"},{"1":"a352baa5-e630-425a-a785-50c93604cb3c","11":"// Try the query for the actor with just the last_name = Stone:\nSELECT * FROM killrvideo.movies_by_actor_name where last_name = 'Stone';","12":"cql","14":"java.util.concurrent.ExecutionException: com.datastax.studio.common.errors.ExecutionExceptionWithLineNumber: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.InvalidQueryException: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING","16":true,"17":false,"25":"CL.ONE"},{"1":"ec58383e-25c4-4077-9b84-59daab18900e","10":4,"11":"You see that Cassandra doesn't like this command! Why not?\nRemember that the _partition key_ consists of both the `last_name` and `first_name` fields.\nCassandra hashes both of these values together to determine the partition number.\nSo, Cassandra cannot calculate the partition number by only one of these values.\n\nYou might be thinking, \"But my relational database can do that...\".\nWell, that's true, but in today's world, your relational database also is not capable of scaling to the level you need to.\nThis apparent limitation of Cassandra is not a problem when you make the switch in thinking: remember, start with the query!\nIf you really want to query merely by last name, you would specify `last_name` as the partition key and use `first_name` as a clustering column.\n\nWhat would using only the last name as the partition key cause Cassandra to do?\nCassandra would store all of the actors with the same last names, and their associated movies within the same partition.\nIf the `first_name` field were a clustering column, you would still be able to query using the first name in concert with the last name.\nBut remember, back in the beginning of this notebook, we used `last_name` as the partition key column and `first_name` as a clustering column.\nThis primary key configuration caused upserts when we tried to add multiple movies for a single actor.\nTo avoid the upserts, we could add `movie_title` as a second clustering column.\nWe would not search on the `movie_title` column, but would use this column to make the primary key unique for the row - thus avoiding the upsert.\n\nThere is a bit of a trade-off here to consider when data modeling -\nYou can make your queries more general by combining more rows into the same partition, but if your partitions get very large, you will suffer a performance penalty.\nRemember, a partition is Cassandra's single unit of physical access.\nSo the least amount of data Cassandra can pull from the disk is a partition.\nIf your data model forces Cassandra to haul a huge partition off of the disk, only to grab one small row from the partition, your queries will not scale well.\nOn the other hand, if you can build your tables so that each query retrieves and uses all the data in a single partition, your queries will be fast.\n\nBottom line: generally try to shoot for a single partition per query.","12":"markdown","13":{"1":"739b2b91-5187-47d8-b4b5-8c4bc4ce4f2d","10":{"9":"<p>You see that Cassandra doesn't like this command! Why not?\n<br />Remember that the <em>partition key</em> consists of both the <code>last_name</code> and <code>first_name</code> fields.\n<br />Cassandra hashes both of these values together to determine the partition number.\n<br />So, Cassandra cannot calculate the partition number by only one of these values.</p>\n<p>You might be thinking, “But my relational database can do that…“.\n<br />Well, that's true, but in today's world, your relational database also is not capable of scaling to the level you need to.\n<br />This apparent limitation of Cassandra is not a problem when you make the switch in thinking: remember, start with the query!\n<br />If you really want to query merely by last name, you would specify <code>last_name</code> as the partition key and use <code>first_name</code> as a clustering column.</p>\n<p>What would using only the last name as the partition key cause Cassandra to do?\n<br />Cassandra would store all of the actors with the same last names, and their associated movies within the same partition.\n<br />If the <code>first_name</code> field were a clustering column, you would still be able to query using the first name in concert with the last name.\n<br />But remember, back in the beginning of this notebook, we used <code>last_name</code> as the partition key column and <code>first_name</code> as a clustering column.\n<br />This primary key configuration caused upserts when we tried to add multiple movies for a single actor.\n<br />To avoid the upserts, we could add <code>movie_title</code> as a second clustering column.\n<br />We would not search on the <code>movie_title</code> column, but would use this column to make the primary key unique for the row - thus avoiding the upsert.</p>\n<p>There is a bit of a trade-off here to consider when data modeling -\n<br />You can make your queries more general by combining more rows into the same partition, but if your partitions get very large, you will suffer a performance penalty.\n<br />Remember, a partition is Cassandra's single unit of physical access.\n<br />So the least amount of data Cassandra can pull from the disk is a partition.\n<br />If your data model forces Cassandra to haul a huge partition off of the disk, only to grab one small row from the partition, your queries will not scale well.\n<br />On the other hand, if you can build your tables so that each query retrieves and uses all the data in a single partition, your queries will be fast.</p>\n<p>Bottom line: generally try to shoot for a single partition per query.</p>\n"},"11":4,"12":false},"15":5,"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"0992fcf8-c617-4d48-87b5-a2f62b37ef78","10":4,"11":"![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n### <center><span style=\"color:navy\">Step 14: Denormalization</span></center>\nIf you were born and raised in the relational world, you were taught from an early age that normalization \n(meaning, only storing a data item one time) is the _only_ way to represent data.\nThe reasons for normalization are twofold: first, if you need to update the data, and you have it stored in multiple places, \nyou will need to know where all the copies of the data are.\nSecond, if you have multiple copies of the data, there is a possibility that the copies will be inconsistent.\n\nWhile these arguments are true, normalization comes at a huge cost.\nNormalized databases just can't keep up with the demands of today's users' requirements.\nSo while normalization might be nice, we need to get beyond the normalized mindset if we are going to satisfy our users.\n\nFor example, imagine we want to query our killrvideo database using a query based on `movie_year`.\nSince `movie_year` is not the partition key (in fact, it's not even part of the primary key), \nwe can't perform that query on our existing table.\nWhat's a data modeler to do?\n\nCreate another table!\nRemember, we build tables to satisfy queries.\nSo, if we have another query, we probably need another table.\n\nThink about this for a minute - what should this table look like?\nWe want to query based on the movie year, so the movie year needs to be the partition key.\nBut we also want uniqueness for each movie so we don't get upserts.\nThe uniqueness requirement means we will need a clustering column that will be unique for each movie - movie title will work for that.\n\nFor this query, we really don't care about the actors, so we'll drop the actor related fields.\nSee if you can fill out the primary key portion of the `CREATE TABLE` statement below:","12":"markdown","13":{"1":"008a09ab-b8ff-4699-8284-ecdf9624160f","10":{"9":"<p><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<h3><center><span style=\"color:navy\">Step 14: Denormalization</span></center></h3>\n<p>If you were born and raised in the relational world, you were taught from an early age that normalization\n<br />(meaning, only storing a data item one time) is the <em>only</em> way to represent data.\n<br />The reasons for normalization are twofold: first, if you need to update the data, and you have it stored in multiple places,\n<br />you will need to know where all the copies of the data are.\n<br />Second, if you have multiple copies of the data, there is a possibility that the copies will be inconsistent.</p>\n<p>While these arguments are true, normalization comes at a huge cost.\n<br />Normalized databases just can't keep up with the demands of today's users' requirements.\n<br />So while normalization might be nice, we need to get beyond the normalized mindset if we are going to satisfy our users.</p>\n<p>For example, imagine we want to query our killrvideo database using a query based on <code>movie_year</code>.\n<br />Since <code>movie_year</code> is not the partition key (in fact, it's not even part of the primary key),\n<br />we can't perform that query on our existing table.\n<br />What's a data modeler to do?</p>\n<p>Create another table!\n<br />Remember, we build tables to satisfy queries.\n<br />So, if we have another query, we probably need another table.</p>\n<p>Think about this for a minute - what should this table look like?\n<br />We want to query based on the movie year, so the movie year needs to be the partition key.\n<br />But we also want uniqueness for each movie so we don't get upserts.\n<br />The uniqueness requirement means we will need a clustering column that will be unique for each movie - movie title will work for that.</p>\n<p>For this query, we really don't care about the actors, so we'll drop the actor related fields.\n<br />See if you can fill out the primary key portion of the <code>CREATE TABLE</code> statement below:</p>\n"},"11":4,"12":false},"15":5,"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"441f1fc8-81a8-49c9-994f-2ae30feed13d","11":"// The following line will make sure the table does not exist (in case you created it with a previous exercise)\nDROP TABLE killrvideo.movies_by_year;\n// Complete this statement by filling in the primary key fields\n// Remember, you want the movie year as the partition key and the movie title as the clustering column\nCREATE TABLE killrvideo.movies_by_year (\n movie_year int,\n movie_title text,\n PRIMARY KEY ((), )\n);","12":"cql","16":true,"17":false,"25":"CL.ONE"},{"1":"d18d11d1-b2fe-409b-b245-869d78ae9ba7","10":4,"11":"<details>\n<summary><i><span style=\"color:blue\">Click here for the solution.</span></i></summary>\n```\nCREATE TABLE killrvideo.movies_by_year (\n movie_year int,\n movie_title text,\n PRIMARY KEY ((movie_year), movie_title)\n);\n```\n</details>","12":"markdown","13":{"1":"c261ba3f-1c20-4718-8448-abe9d7b013cb","10":{"9":"<p><details>\n<br /><summary><i><span style=\"color:blue\">Click here for the solution.</span></i></summary></p>\n<pre><code>CREATE TABLE killrvideo.movies_by_year (\n movie_year int,\n movie_title text,\n PRIMARY KEY ((movie_year), movie_title)\n);\n</code></pre>\n<p></details></p>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"95c3c3e6-e1f9-4ae1-9e26-bc01e4b902e8","11":"// We'll also throw some rows in the table - you don't need to change any of the lines below this comment\nINSERT INTO killrvideo.movies_by_year (movie_year, movie_title) VALUES(2013, 'Lone Ranger');\nINSERT INTO killrvideo.movies_by_year (movie_year, movie_title) VALUES(2010, 'Tourist');\nINSERT INTO killrvideo.movies_by_year (movie_year, movie_title) VALUES(2010, 'Inception');\nINSERT INTO killrvideo.movies_by_year (movie_year, movie_title) VALUES(2015, 'By the Sea');\nINSERT INTO killrvideo.movies_by_year (movie_year, movie_title) VALUES(2015, 'Big Short');\nINSERT INTO killrvideo.movies_by_year (movie_year, movie_title) VALUES(2011, 'Moneyball');\nINSERT INTO killrvideo.movies_by_year (movie_year, movie_title) VALUES(2011, 'Hugo');\nINSERT INTO killrvideo.movies_by_year (movie_year, movie_title) VALUES(2012, 'Skyfall');\nINSERT INTO killrvideo.movies_by_year (movie_year, movie_title) VALUES(2011, 'Interstellar');","12":"cql","16":true,"17":false,"25":"CL.ONE"},{"1":"742b46f3-ca4b-4384-8d6b-50c7673e5b9d","10":4,"11":"![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n### <center><span style=\"color:navy\">Step 15: Check Out the New Table</span></center>\nLet's run a query on this table.\nSee if you can figure out how to get all the movie released in 2011:","12":"markdown","13":{"1":"7ba5c9f6-66e5-4346-bc50-c8424a3d4b86","10":{"9":"<p><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<h3><center><span style=\"color:navy\">Step 15: Check Out the New Table</span></center></h3>\n<p>Let's run a query on this table.\n<br />See if you can figure out how to get all the movie released in 2011:</p>\n"},"11":4,"12":false},"15":5,"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"f27fb778-c072-4833-99ff-74eaa36fb1e3","11":"// Write your query here (if you are having trouble, follow the general form of the query from step 13):\n","12":"cql","16":true,"17":false,"25":"CL.ONE"},{"1":"f370f74e-68f6-4d8f-aba2-f466d8b678af","10":4,"11":"<details>\n<summary><i><span style=\"color:blue\">Click here for the solution.</span></i></summary>\n```\nSELECT * FROM killrvideo.movies_by_year where movie_year = 2011;\n```\n</details>","12":"markdown","13":{"1":"9ca72422-ea46-499e-bda5-ec8760989ee9","10":{"9":"<p><details>\n<br /><summary><i><span style=\"color:blue\">Click here for the solution.</span></i></summary></p>\n<pre><code>SELECT * FROM killrvideo.movies_by_year where movie_year = 2011;\n</code></pre>\n<p></details></p>\n"},"11":4,"12":false},"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"944e5096-5c5f-45f7-9f8c-95319763d057","10":4,"11":"![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n### <center><span style=\"color:navy\">Step 16: Ordering Results</span></center>\nOK, one last thing we may want to discuss -\nSuppose you want to order the results of your query.\nIf you look at the results returned by your last query, you will notice the results are in alphabetical order of the movie title.\nThis is not by chance.\n\nCassandra orders query results based on the clustering column.\nHowever, you may reverse the order by using the `ORDER BY` clause.\nThe default order is ascending (`ASC`), but you can change it to descending (`DESC`).\n\nRun the following query and inspect the results:","12":"markdown","13":{"1":"3fd3c9cf-b639-49ce-8a3c-04932d09ee71","10":{"9":"<p><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<h3><center><span style=\"color:navy\">Step 16: Ordering Results</span></center></h3>\n<p>OK, one last thing we may want to discuss -\n<br />Suppose you want to order the results of your query.\n<br />If you look at the results returned by your last query, you will notice the results are in alphabetical order of the movie title.\n<br />This is not by chance.</p>\n<p>Cassandra orders query results based on the clustering column.\n<br />However, you may reverse the order by using the <code>ORDER BY</code> clause.\n<br />The default order is ascending (<code>ASC</code>), but you can change it to descending (<code>DESC</code>).</p>\n<p>Run the following query and inspect the results:</p>\n"},"11":4,"12":false},"15":5,"16":true,"17":true,"18":{},"25":"CL.ONE"},{"1":"e301585f-ea85-47b3-8be4-ff343aa12b3b","11":"SELECT * FROM killrvideo.movies_by_year\n WHERE movie_year = 2011\n ORDER BY movie_title DESC;","12":"cql","16":true,"17":false,"25":"CL.ONE"},{"1":"01d32707-27ce-4d71-affb-fe2c4f20d7e3","10":4,"11":"![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n# <center><span style=\"color:navy\">Congratulations!!!!!</span></center>\n#### You have completed the _Intro to Data Modeling_ notebook!\n\nNow you know the main concepts you need to be an Apache Cassandra™ data modeler.\nClearly, there are few more details we were not able to cover in this short notebook, but you have the main basics down.\n\nLet's review what you have learned:\n* How to create a keyspace using CQL\n* How to create a basic table using CQL\n* How to insert data using CQL\n* How to drop a table using CQL\n* How to form a primary key\n* What upserts are\n* How to use clustering columns for uniqueness\n* How to denormalize your data to create blazing fast queries\n* How to use clustering columns to order results\n\nWOW! That's quite a lot!\n\nAt this point, it should be clear that working with Apache Cassandra™ isn't scary, and\nyou should feel confident and satisifed that you understand fundamental Cassandra data modeling!\n\n\n![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png \"line\" )\n# <center><span style=\"color:navy\">Want to Know More?</span></center>\n#### Check out these great <span style=\"color:green\">FREE</span> classes on DataStax Academy:\n* [DS201: DataStax Enterprise Foundations of Apache Cassandra™](http://academy.datastax.com/resources/ds201-foundations-apache-cassandra \"DS201: DataStax Enterprise Foundations of Apache Cassandra™\")\n* [DS210: DataStax Enterprise Operations with Apache Cassandra™](https://academy.datastax.com/resources/ds210-datastax-enterprise-operations-apache-cassandra \"DS210: DataStax Enterprise Operations with Apache Cassandra™\")\n* [DS220: DataStax Enterprise 6 Practical Application Data Modeling with Apache Cassandra™](https://academy.datastax.com/resources/ds220 \"DS220: DataStax Enterprise 6 Practical Application Data Modeling with Apache Cassandra™\")\n","12":"markdown","13":{"1":"96a5fdfc-a676-41e2-9ffb-04082a573be3","10":{"9":"<p><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<h1><center><span style=\"color:navy\">Congratulations!!!!!</span></center></h1>\n<h4>You have completed the <em>Intro to Data Modeling</em> notebook!</h4>\n<p>Now you know the main concepts you need to be an Apache Cassandra™ data modeler.\n<br />Clearly, there are few more details we were not able to cover in this short notebook, but you have the main basics down.</p>\n<p>Let's review what you have learned:</p>\n<ul>\n<li>How to create a keyspace using CQL</li>\n<li>How to create a basic table using CQL</li>\n<li>How to insert data using CQL</li>\n<li>How to drop a table using CQL</li>\n<li>How to form a primary key</li>\n<li>What upserts are</li>\n<li>How to use clustering columns for uniqueness</li>\n<li>How to denormalize your data to create blazing fast queries</li>\n<li>How to use clustering columns to order results</li>\n</ul>\n<p>WOW! That's quite a lot!</p>\n<p>At this point, it should be clear that working with Apache Cassandra™ isn't scary, and\n<br />you should feel confident and satisifed that you understand fundamental Cassandra data modeling!</p>\n<p><img src=\"https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png\" alt=\"line\" title=\"line\" /></p>\n<h1><center><span style=\"color:navy\">Want to Know More?</span></center></h1>\n<h4>Check out these great <span style=\"color:green\">FREE</span> classes on DataStax Academy:</h4>\n<ul>\n<li><a href=\"http://academy.datastax.com/resources/ds201-foundations-apache-cassandra\" title=\"DS201: DataStax Enterprise Foundations of Apache Cassandra™\">DS201: DataStax Enterprise Foundations of Apache Cassandra™</a></li>\n<li><a href=\"https://academy.datastax.com/resources/ds210-datastax-enterprise-operations-apache-cassandra\" title=\"DS210: DataStax Enterprise Operations with Apache Cassandra™\">DS210: DataStax Enterprise Operations with Apache Cassandra™</a></li>\n<li><a href=\"https://academy.datastax.com/resources/ds220\" title=\"DS220: DataStax Enterprise 6 Practical Application Data Modeling with Apache Cassandra™\">DS220: DataStax Enterprise 6 Practical Application Data Modeling with Apache Cassandra™</a></li>\n</ul>\n"},"11":4,"12":false},"15":5,"16":true,"17":true,"18":{},"25":"CL.ONE"}],"16":{"1":{}},"17":""} code.txt 0100644 0000000 0000000 00000062544 13563010763 011277 0 ustar 00 0000000 0000000 --------------------NOTEBOOK_02-01 - Data Modeling: Data Modeling Intro--------------------
--------------------CELL_MARKDOWN_1--------------------
<center><img src="//datastaxtraining.s3.amazonaws.com/developer-day/developer-day-studio-header.png" width="800"></center>
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
# <center><span style="color:navy">Data Modeling Intro</span></center>
Welcome to the intro notebook on data modeling in Cassandra. In this notebook we will learn to:
* Create keyspaces and simple tables
* Insert data into the tables
* Perform simple queries
This notebook is a step-by-step tutorial.
Each step has a brief discussion section in a cell that describes the step, followed by an exercise in a second cell, where you get to try something out.
In some exercises, we'll tell you exactly what to do.
In others, we may let you figure it out on your own, because that's a lot more fun than always being told exactly what to do.
But don't worry. None of these exercises requires you to do anything as difficult as soldering your own CPU from a box of wires ![Smiley face](https://s3.amazonaws.com/datastaxtraining/solution-days/cassandra-intro/smileyface.png "Smiley Face" ).
We're confident you'll be able to work through each of the exercises in a way that will be educational and fun.
Let's get started!
--------------------CELL_MARKDOWN_2--------------------
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
### <center><span style="color:navy">Step 1: Create a Table</span></center>
Imagine we are creating the killrvideo site and we want to store information about videos in our killrvideo keyspace.
To do this, we will need to create tables.
Unlike relational data modeling, we don't create a bunch of normalized general-purpose tables with all kinds of normalized forms - that would be like designing your grandparent's station wagon.
Instead, we want to think about the queries we will need to respond to, and build high-powered sportscars to get us there.
Imagine that for our first query, given an actor, we want to know what movies that actor is in.
So, for example, we might want to know what movies Emma Stone has been in.
We would search based on a last name of Stone, and a first name of Emma, and want to get back the title of the movie and the year it was produced.
<center>![query](https://s3.amazonaws.com/datastaxtraining/solution-days/cassandra-intro/data-modeling/query.png "query" )</center>
It looks like our table is going to need four columns: last_name, first_name, movie_title, movie_year.
Since we want to search on last_name and first_name, we will make those primary key columns.
Here's how we might create such a table:
<center>![create table](https://s3.amazonaws.com/datastaxtraining/developer-day/cassandra-intro/data-modeling/createTableCommand2.png "create table" )</center>
Try it out at the bottom of the next cell (we threw in an extra line at the top of the cell to make sure the table doesn't exist):
--------------------CELL_CQL_3--------------------
// The following line will make sure the table does not exist (in case you created it with a previous exercise)
DROP TABLE IF EXISTS killrvideo.movies_by_actor_name;
// Now, create the table here:
--------------------CELL_MARKDOWN_4--------------------
<details>
<summary><i><span style="color:blue">Click here for the solution.</span></i></summary>
```
CREATE TABLE killrvideo.movies_by_actor_name (
last_name text,
first_name text,
movie_title text,
movie_year int,
PRIMARY KEY (last_name, first_name)
);
```
</details>
--------------------CELL_MARKDOWN_5--------------------
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
### <center><span style="color:navy">Step 2: Review the table</span></center>
Using the schema viewer in the upper right of this Studio window, review the table you just created.
--------------------CELL_MARKDOWN_6--------------------
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
### <center><span style="color:navy">Step 3: Put Data in the Table</span></center>
OK, now we have a keyspace and a table.
Let's put some data in the table.
Let's say we want to create a row for Emma Stone for the 2017 Movie "Battle of the Sexes".
We can do this by executing the following:
<center>![insert command](https://s3.amazonaws.com/datastaxtraining/developer-day/cassandra-intro/data-modeling/insertCommand2.png "insert command" )</center>
Try it out:
--------------------CELL_CQL_7--------------------
// Insert the Emma Stone row here:
--------------------CELL_MARKDOWN_8--------------------
<details>
<summary><i><span style="color:blue">Click here for the solution.</span></i></summary>
```
INSERT INTO killrvideo.movies_by_actor_name (last_name, first_name, movie_title, movie_year)
VALUES ('Stone', 'Emma', 'Battle of the Sexes', 2017);
```
</details>
--------------------CELL_MARKDOWN_9--------------------
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
### <center><span style="color:navy">Step 4: Retrieve the Data</span></center>
So, now that we have inserted a row of data into our table, let's verify the data is in the table.
Execute the following cell to retrieve all the rows in the table:
--------------------CELL_CQL_10--------------------
// Perform the query here:
SELECT * FROM killrvideo.movies_by_actor_name;
--------------------CELL_MARKDOWN_11--------------------
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
### <center><span style="color:navy">Step 5: Retrieve Specific Data Columns</span></center>
As you can see, CQL looks a lot like SQL.
The designers of CQL made it look like SQL to help those familiar with SQL to get up and running quickly.
But beware! There are probably more differences between CQL and SQL than there are similarities.
We'll get into these differences in a minute.
But first, let's see how to retrieve certain columns.
As you might expect, it looks a lot like SQL.
Here's the command to get only the movie title and year:
Try it out in the cell below:
--------------------CELL_CQL_12--------------------
// Perform the query here:
SELECT movie_title, movie_year FROM killrvideo.movies_by_actor_name;
--------------------CELL_MARKDOWN_13--------------------
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
### <center><span style="color:navy">Step 6: Add More Data</span></center>
A database isn't much fun if it only has one row in it.
Let's put in more data by executing the cell below:
--------------------CELL_CQL_14--------------------
// Insert another row here:
INSERT INTO killrvideo.movies_by_actor_name (last_name, first_name, movie_title, movie_year)
VALUES('Stone', 'Emma', 'La la Land', 2016);
--------------------CELL_MARKDOWN_15--------------------
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
### <center><span style="color:navy">Step 7: Look at the Data Again</span></center>
OK, so we threw another row in the table.
Let's investigate the table again.
--------------------CELL_CQL_16--------------------
// Query all the rows of the table here:
SELECT * FROM killrvideo.movies_by_actor_name;
--------------------CELL_MARKDOWN_17--------------------
Wait a minute!
What just happened?
Your query only returned one row, but you have inserted two of them.
The first row seems to be gone - what happened?
Now, we are starting to see some of the differences between CQL and SQL.
Remember that our _primary key_ consists of the `last_name` and `first_name` columns.
Think a moment about how Cassandra works.
Remember that Cassandra stores the row based on the primary key.
Cassandra does _not_ do a read before writing - that would be way too slow and would never scale.
So instead, Cassandra just writes the new row to the node _with the same primary key as the first row_, which effectively replaces or updates the row.
This is what Cassandra calls an "upsert".
--------------------CELL_MARKDOWN_18--------------------
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
### <center><span style="color:navy">Step 8: Delete the Table</span></center>
If we want to associate several movies with Emma Stone, we will need to adjust our data model so that each movie entry has it's own primary key.
Once each movie has a unique primary key, we can insert new movie entries for the same actor without stepping on the previous movie entries for that actor.
To make the necessary adjustment to the table, it's probably simplest at this point to delete the table and re-create it with a different primary key.
Execute the following cell with DROP to delete the table:
--------------------CELL_CQL_19--------------------
// Delete the table here:
DROP TABLE killrvideo.movies_by_actor_name;
--------------------CELL_MARKDOWN_20--------------------
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
### <center><span style="color:navy">Step 9: Adjust the Model</span></center>
Now, we can re-create the table with the right kind of primary key.
We want to have one partition per actor, but many movie entries in that partition.
We do this by adjusting the primary key to create _uniqueness_.
Previously, the primary key for the table consisted of a partition key (last_name), and a single clustering column (first_name).
Using this schema, Cassandra stored all actors with the same last name in the same parition, but each unique first name in a separate row.
We want a row for each last_name, first name _and movie title_, so we will add the `movie_title` column as a clustering column within the primary key.
When we query this table, we don't have to supply the movie title as part of the query, it is only there to prevent upserts when we add multiple movies per actor.
Also, since every query will use both the `last_name` and `first_name` columns, we will make both of these columns part of the partition key.
By using both columns as the partition key, we cause Cassandra to store one partition per actor, which means Cassandra will do a single physical access to satisfy our query - making the query as fast as possible.
Here's how we do it:
<center>![create clustering table](https://s3.amazonaws.com/datastaxtraining/developer-day/cassandra-intro/data-modeling/createClusteringTable2.png "create clustering table" )</center>
This `CREATE` statement looks a lot like the first, but notice that the `PRIMARY KEY` section is different.
We have done two things:
* First, we added `movie_title` as a third field in the primary key
* Second, we added parentheses around the `last_name` and `first_name` fields.
These extra parentheses may look extraneous, but they are necessary.
They designate the fields that constitute the partition key portion of the primary key.
Remember, the partition key portion of the primary key is the only part of the primary key that Cassandra hashes to determine the partition token.
Execute this command to create the table, insert both of the movie titles for Emma Stone and finally select all the rows associated with Emma Stone (you can do all this in multiple lines of a single cell by separating each command with a semicolon).
--------------------CELL_CQL_21--------------------
// Create the table here (as shown in the image above):
CREATE TABLE killrvideo.movies_by_actor_name (
last_name text,
first_name text,
movie_title text,
movie_year int,
PRIMARY KEY ((last_name, first_name), movie_title)
);
--------------------CELL_MARKDOWN_22--------------------
Notice the following insert statements are the EXACT SAME insert statements we executed previously.
The only difference now is our primary key, yet the outcome will be slightly different. Why might this be?
--------------------CELL_CQL_23--------------------
// Insert the row for "Battle of the sexes":
INSERT INTO killrvideo.movies_by_actor_name (last_name, first_name, movie_title, movie_year)
VALUES ('Stone', 'Emma', 'Battle of the Sexes', 2017);
// Insert the row for "La la land":
INSERT INTO killrvideo.movies_by_actor_name (last_name, first_name, movie_title, movie_year)
VALUES('Stone', 'Emma', 'La la Land', 2016);
// Select all rows of the table here:
SELECT * FROM killrvideo.movies_by_actor_name;
--------------------CELL_MARKDOWN_24--------------------
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
### <center><span style="color:navy">Step 10: Add More Data</span></center>
Let's add some rows for more actors as shown in the table:
<center>![movie table](https://s3.amazonaws.com/datastaxtraining/solution-days/cassandra-intro/data-modeling/movieTable.png "movie table" )</center>
To make it easy, since you already know how to add rows, we have already prepared the `INSERT` statements.
You just need to execute them in the cell below:
--------------------CELL_CQL_25--------------------
// Insert these rows by executing this cell:
INSERT INTO killrvideo.movies_by_actor_name (last_name, first_name, movie_title, movie_year) VALUES('Depp', 'Johnny', 'Lone Ranger', 2013);
INSERT INTO killrvideo.movies_by_actor_name (last_name, first_name, movie_title, movie_year) VALUES('Depp', 'Johnny', 'Tourist', 2010);
INSERT INTO killrvideo.movies_by_actor_name (last_name, first_name, movie_title, movie_year) VALUES('Jolie', 'Angelina', 'Tourist', 2010);
INSERT INTO killrvideo.movies_by_actor_name (last_name, first_name, movie_title, movie_year) VALUES('Jolie', 'Angelina', 'By the Sea', 2015);
INSERT INTO killrvideo.movies_by_actor_name (last_name, first_name, movie_title, movie_year) VALUES('Pitt', 'Brad', 'Big Short', 2015);
INSERT INTO killrvideo.movies_by_actor_name (last_name, first_name, movie_title, movie_year) VALUES('Pitt', 'Brad', 'Moneyball', 2011);
--------------------CELL_MARKDOWN_26--------------------
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
### <center><span style="color:navy">Step 11: Make Sure the Data is There</span></center>
Do a simple select of all the data in the table to verify the rows look like you think they should:
--------------------CELL_CQL_27--------------------
// Query the table here:
SELECT * FROM killrvideo.movies_by_actor_name;
--------------------CELL_MARKDOWN_28--------------------
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
### <center><span style="color:navy">Step 12: Finding the Data</span></center>
Clearly, we don't want the entire table back from a query - only the rows associated with a specific actor.
(In fact, you should never execute a query that selects the entire table on a production keyspace as this would cause serious performance problems.
We do this in these exercises only for illustrative purposes.)
As you might expect, the CQL query for this looks a lot like an SQL query.
Execute the query in the following cell:
--------------------CELL_CQL_29--------------------
// Execute the query for just Emma Stone here:
SELECT * FROM killrvideo.movies_by_actor_name
WHERE last_name = 'Stone'
AND first_name = 'Emma';
--------------------CELL_MARKDOWN_30--------------------
That query probably worked pretty much like you expected.
You see the two rows for the movies of Emma Stone.
Note that these two rows were in the same parition.
This means that Cassandra only had to do one physical access to service your query - which means it was fast!
--------------------CELL_MARKDOWN_31--------------------
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
### <center><span style="color:navy">Step 13: Problems with Finding the Data</span></center>
It may have occurred to you that since we only have one actor with the last name Stone, we could query the table only specifying the last name.
Let's give it a try.
Execute the command in the following cell:
--------------------CELL_CQL_32--------------------
// Try the query for the actor with just the last_name = Stone:
SELECT * FROM killrvideo.movies_by_actor_name where last_name = 'Stone';
--------------------CELL_MARKDOWN_33--------------------
You see that Cassandra doesn't like this command! Why not?
Remember that the _partition key_ consists of both the `last_name` and `first_name` fields.
Cassandra hashes both of these values together to determine the partition number.
So, Cassandra cannot calculate the partition number by only one of these values.
You might be thinking, "But my relational database can do that...".
Well, that's true, but in today's world, your relational database also is not capable of scaling to the level you need to.
This apparent limitation of Cassandra is not a problem when you make the switch in thinking: remember, start with the query!
If you really want to query merely by last name, you would specify `last_name` as the partition key and use `first_name` as a clustering column.
What would using only the last name as the partition key cause Cassandra to do?
Cassandra would store all of the actors with the same last names, and their associated movies within the same partition.
If the `first_name` field were a clustering column, you would still be able to query using the first name in concert with the last name.
But remember, back in the beginning of this notebook, we used `last_name` as the partition key column and `first_name` as a clustering column.
This primary key configuration caused upserts when we tried to add multiple movies for a single actor.
To avoid the upserts, we could add `movie_title` as a second clustering column.
We would not search on the `movie_title` column, but would use this column to make the primary key unique for the row - thus avoiding the upsert.
There is a bit of a trade-off here to consider when data modeling -
You can make your queries more general by combining more rows into the same partition, but if your partitions get very large, you will suffer a performance penalty.
Remember, a partition is Cassandra's single unit of physical access.
So the least amount of data Cassandra can pull from the disk is a partition.
If your data model forces Cassandra to haul a huge partition off of the disk, only to grab one small row from the partition, your queries will not scale well.
On the other hand, if you can build your tables so that each query retrieves and uses all the data in a single partition, your queries will be fast.
Bottom line: generally try to shoot for a single partition per query.
--------------------CELL_MARKDOWN_34--------------------
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
### <center><span style="color:navy">Step 14: Denormalization</span></center>
If you were born and raised in the relational world, you were taught from an early age that normalization
(meaning, only storing a data item one time) is the _only_ way to represent data.
The reasons for normalization are twofold: first, if you need to update the data, and you have it stored in multiple places,
you will need to know where all the copies of the data are.
Second, if you have multiple copies of the data, there is a possibility that the copies will be inconsistent.
While these arguments are true, normalization comes at a huge cost.
Normalized databases just can't keep up with the demands of today's users' requirements.
So while normalization might be nice, we need to get beyond the normalized mindset if we are going to satisfy our users.
For example, imagine we want to query our killrvideo database using a query based on `movie_year`.
Since `movie_year` is not the partition key (in fact, it's not even part of the primary key),
we can't perform that query on our existing table.
What's a data modeler to do?
Create another table!
Remember, we build tables to satisfy queries.
So, if we have another query, we probably need another table.
Think about this for a minute - what should this table look like?
We want to query based on the movie year, so the movie year needs to be the partition key.
But we also want uniqueness for each movie so we don't get upserts.
The uniqueness requirement means we will need a clustering column that will be unique for each movie - movie title will work for that.
For this query, we really don't care about the actors, so we'll drop the actor related fields.
See if you can fill out the primary key portion of the `CREATE TABLE` statement below:
--------------------CELL_CQL_35--------------------
// The following line will make sure the table does not exist (in case you created it with a previous exercise)
DROP TABLE killrvideo.movies_by_year;
// Complete this statement by filling in the primary key fields
// Remember, you want the movie year as the partition key and the movie title as the clustering column
CREATE TABLE killrvideo.movies_by_year (
movie_year int,
movie_title text,
PRIMARY KEY ((), )
);
--------------------CELL_MARKDOWN_36--------------------
<details>
<summary><i><span style="color:blue">Click here for the solution.</span></i></summary>
```
CREATE TABLE killrvideo.movies_by_year (
movie_year int,
movie_title text,
PRIMARY KEY ((movie_year), movie_title)
);
```
</details>
--------------------CELL_CQL_37--------------------
// We'll also throw some rows in the table - you don't need to change any of the lines below this comment
INSERT INTO killrvideo.movies_by_year (movie_year, movie_title) VALUES(2013, 'Lone Ranger');
INSERT INTO killrvideo.movies_by_year (movie_year, movie_title) VALUES(2010, 'Tourist');
INSERT INTO killrvideo.movies_by_year (movie_year, movie_title) VALUES(2010, 'Inception');
INSERT INTO killrvideo.movies_by_year (movie_year, movie_title) VALUES(2015, 'By the Sea');
INSERT INTO killrvideo.movies_by_year (movie_year, movie_title) VALUES(2015, 'Big Short');
INSERT INTO killrvideo.movies_by_year (movie_year, movie_title) VALUES(2011, 'Moneyball');
INSERT INTO killrvideo.movies_by_year (movie_year, movie_title) VALUES(2011, 'Hugo');
INSERT INTO killrvideo.movies_by_year (movie_year, movie_title) VALUES(2012, 'Skyfall');
INSERT INTO killrvideo.movies_by_year (movie_year, movie_title) VALUES(2011, 'Interstellar');
--------------------CELL_MARKDOWN_38--------------------
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
### <center><span style="color:navy">Step 15: Check Out the New Table</span></center>
Let's run a query on this table.
See if you can figure out how to get all the movie released in 2011:
--------------------CELL_CQL_39--------------------
// Write your query here (if you are having trouble, follow the general form of the query from step 13):
--------------------CELL_MARKDOWN_40--------------------
<details>
<summary><i><span style="color:blue">Click here for the solution.</span></i></summary>
```
SELECT * FROM killrvideo.movies_by_year where movie_year = 2011;
```
</details>
--------------------CELL_MARKDOWN_41--------------------
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
### <center><span style="color:navy">Step 16: Ordering Results</span></center>
OK, one last thing we may want to discuss -
Suppose you want to order the results of your query.
If you look at the results returned by your last query, you will notice the results are in alphabetical order of the movie title.
This is not by chance.
Cassandra orders query results based on the clustering column.
However, you may reverse the order by using the `ORDER BY` clause.
The default order is ascending (`ASC`), but you can change it to descending (`DESC`).
Run the following query and inspect the results:
--------------------CELL_CQL_42--------------------
SELECT * FROM killrvideo.movies_by_year
WHERE movie_year = 2011
ORDER BY movie_title DESC;
--------------------CELL_MARKDOWN_43--------------------
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
# <center><span style="color:navy">Congratulations!!!!!</span></center>
#### You have completed the _Intro to Data Modeling_ notebook!
Now you know the main concepts you need to be an Apache Cassandra™ data modeler.
Clearly, there are few more details we were not able to cover in this short notebook, but you have the main basics down.
Let's review what you have learned:
* How to create a keyspace using CQL
* How to create a basic table using CQL
* How to insert data using CQL
* How to drop a table using CQL
* How to form a primary key
* What upserts are
* How to use clustering columns for uniqueness
* How to denormalize your data to create blazing fast queries
* How to use clustering columns to order results
WOW! That's quite a lot!
At this point, it should be clear that working with Apache Cassandra™ isn't scary, and
you should feel confident and satisifed that you understand fundamental Cassandra data modeling!
![line](https://datastaxtraining.s3.amazonaws.com/developer-day/lineblue.png "line" )
# <center><span style="color:navy">Want to Know More?</span></center>
#### Check out these great <span style="color:green">FREE</span> classes on DataStax Academy:
* [DS201: DataStax Enterprise Foundations of Apache Cassandra™](http://academy.datastax.com/resources/ds201-foundations-apache-cassandra "DS201: DataStax Enterprise Foundations of Apache Cassandra™")
* [DS210: DataStax Enterprise Operations with Apache Cassandra™](https://academy.datastax.com/resources/ds210-datastax-enterprise-operations-apache-cassandra "DS210: DataStax Enterprise Operations with Apache Cassandra™")
* [DS220: DataStax Enterprise 6 Practical Application Data Modeling with Apache Cassandra™](https://academy.datastax.com/resources/ds220 "DS220: DataStax Enterprise 6 Practical Application Data Modeling with Apache Cassandra™")