An Introduction to Beeswarm Charts with D3 and Svelte

Beeswarm plots offer an elegant and interactive way to visualize distributions, outliers and patterns within datasets. In this lesson, we will cover the fundamentals of building a beeswarm plot, beginning with an in-depth explanation of force simulations in D3 data visualizations. Then, we will learn how to prepare data for a force layout via D3’s data manipulation methods (e.g., rollup), how to create a force layout with the d3-force module (to automatically position nodes, represented as circles, in clusters and avoid node collisions/overlap) and how to render nodes of the force layout in a responsive SVG canvas with Svelte.

Lesson Transcript

[00:00 - 00:06] In this lesson, we're going to be building a beeswarm chart. So this is what it looks like. These circles are arranged according to physics.
[00:07 - 00:22] On top of that, there's a bit of additional interaction this chart offers, where clicking sorts these countries by continent. As you can see per the title, this beeswarm plot is essentially visualizing every country in the world according to its happiness score.
[00:23 - 00:32] You can read more about the data if you're interested in that; it's from the 2022 World Happiness Report. But what's important here is basically three things that are unique about this beeswarm chart.
[00:33 - 00:50] The first is that these circles are arranged by physics. If I were to refresh this page, these circles would not be present in the exact same positions that they're present right now. You would actually see a bit of randomness applied every time that this layout is generated. That's because we're using D3 force behind the scenes.
[00:51 - 01:07] Second is this click interaction, which is powered, yet again, by this physics-based d3-force module. Third, these tooltips, you'll notice, are dynamic. This might be more CSS than Svelte, but I really like to bring some whimsy and design to visualization.
[01:08 - 01:13] We're going to be building this from scratch. What we're going to do is scaffold our project template once again.
[01:14 - 01:25] You'll remember that the way that you want to do this is by opening your terminal application. Here I have my terminal up. I'm going to run this command, npx degit, and then the long name.
[01:26 - 01:37] What we'll see here is that if it did in fact succeed, it will say it cloned my template to this new directory called beeswarm. We can confirm that exists by typing ls.
[01:38 - 02:02] Let's go ahead and cd into beeswarm. If you're on Mac, I would advise opening that project using code followed by a period (code .). If you're on Windows, you might have a different way of opening it. You might find that folder and right-click to open it, or you might open VS Code first and then find the project. But here I have the beeswarm project now open. I can go ahead and run npm install to install the necessary dependencies.
[02:03 - 02:15] And now if I run npm run dev, we see this project is in fact up and running. So back to our code, here's an example of how we would write a force diagram from scratch.
[02:16 - 02:28] The first line of code imports multiple methods from the d3-force library, including forceSimulation, forceY, forceX, and forceCollide. Now the first thing that we see is forceSimulation.
[02:29 - 02:35] This is how we instantiate a new simulation. Then the question is, what is within that simulation?
[02:36 - 02:46] And the answer is two different forces. Now forces can be thought of as what charges the physics? What forces are at play that are moving things around?
[02:47 - 03:21] So here we're basically telling our force simulation that there's an x-force, which is going to move things on the horizontal axis, and a y-force, which moves things on the vertical axis. So here we're saying the x-force, the thing that is moving our elements along the horizontal axis, is defined as follows: it is a forceX with an x method that takes an accessor returning each node's x, and it has a strength of 0.002. forceY does the exact same thing, but on the y-axis.
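To get a feel for what the strength number does, here is a minimal sketch in plain JavaScript. It is a simplified stand-in for a positioning force, not D3's source: the helper names `applyXForce` and `step`, the fixed `alpha` of 1, the friction value, and the target of 100 are all assumptions made for illustration. It compares the strength of 0.002 from this snippet with the 0.8 used later in the lesson.

```javascript
// Toy positioning force along x (a simplified stand-in, not D3's implementation).
// Each tick, a node's velocity is nudged toward its target in proportion to `strength`.
function applyXForce(nodes, targetX, strength, alpha) {
  for (const node of nodes) {
    node.vx += (targetX(node) - node.x) * strength * alpha;
  }
}

// Apply friction to the velocity, then move each node by it.
function step(nodes, friction = 0.6) {
  for (const node of nodes) {
    node.vx *= friction;
    node.x += node.vx;
  }
}

// Hypothetical nodes: both start at x = 0 and are pulled toward x = 100.
const weak = [{ x: 0, vx: 0 }];
const strong = [{ x: 0, vx: 0 }];

for (let tick = 0; tick < 50; tick++) {
  applyXForce(weak, () => 100, 0.002, 1); // gentle pull
  step(weak);
  applyXForce(strong, () => 100, 0.8, 1); // firm pull
  step(strong);
}

// After the same number of ticks, the strongly pulled node has settled near 100,
// while the weakly pulled node has only drifted part of the way.
console.log(weak[0].x.toFixed(1), strong[0].x.toFixed(1));
```

A real simulation also cools down over time (alpha decays), so a weak force may never bring a node all the way to its target before the layout freezes.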
[03:22 - 03:31] So I know this seems complicated, so let's talk a little bit more about what's happening here. Nodes is an array of objects. Each of these containing an x and a y property, right?
[03:32 - 03:43] That's what we're accessing here and here, is the x and y. So think of this as an array. Each of those array items has at least an x and a y property.
[03:44 - 05:18] Then we're using those properties, the x and the y, to apply forces, each of which, as I write here, represents a new interaction with the overall simulation. And then we could add others as well: forceCollide is going to push things away from one another so they don't overlap, and forceCenter will bring things towards the center. So as an example of what this code actually does, here's a minimal code sandbox that I have that's meant to illustrate this. I'll go ahead and open App.svelte. What you can see here is that simulation is a force simulation, and the forces being applied include forceCollide, forceX and forceY. So forceCollide is preventing overlap, forceX is adjusting where the circles live on the horizontal axis, and forceY is doing so on the vertical axis. And what we can see here is that all these circles then create this beeswarm shape. As an example, if I were to delete forceCollide, which, as I mentioned, is meant to prevent collision, all of these circles would then be overlapping with one another. So you can see how each of these forces can be really important to creating this physics-based visual. So in our chart, we're going to want three forces: forceX, forceY, and forceCollide, for the reasons I just described. We want to organize circles on the x-axis according to their happiness score. We want to pay attention to their y position, in particular because we want it at the center of our chart. So just 50% down; our forceY should basically just be static.
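The idea behind collision avoidance can be sketched without D3. The following is a toy approach, not the algorithm forceCollide actually uses: the `separate` helper and the three starting positions are made up for illustration. Any two circles whose centers are closer than two radii get pushed apart along the line between them.

```javascript
// Simplified collision pass (a toy stand-in for the idea behind forceCollide).
// Circles closer than 2 * radius are pushed apart, half the overlap each.
// Returns true if anything had to move.
function separate(nodes, radius) {
  let moved = false;
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      const a = nodes[i];
      const b = nodes[j];
      const dx = b.x - a.x;
      const dy = b.y - a.y;
      const dist = Math.hypot(dx, dy);
      const overlap = 2 * radius - dist;
      if (overlap > 1e-9) {
        // Unit vector from a to b; pick an arbitrary direction if centers coincide.
        const ux = dist === 0 ? 1 : dx / dist;
        const uy = dist === 0 ? 0 : dy / dist;
        a.x -= ux * (overlap / 2);
        a.y -= uy * (overlap / 2);
        b.x += ux * (overlap / 2);
        b.y += uy * (overlap / 2);
        moved = true;
      }
    }
  }
  return moved;
}

const RADIUS = 5;
// Hypothetical circles that start out heavily overlapping.
const circles = [
  { x: 100, y: 100 },
  { x: 102, y: 100 },
  { x: 101, y: 103 },
];

// Repeat the pass until nothing overlaps (with a safety cap on iterations).
let passes = 0;
while (separate(circles, RADIUS) && passes < 1000) passes++;

console.log(circles);
```

Removing the loop at the end is the equivalent of deleting forceCollide in the sandbox: the circles stay stacked on top of one another.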
[05:19 - 08:37] And then forceCollide, because we don't want our circles to overlap like this; we actually want them to pay attention to each other's shape. What we're going to need to do in order to bring these forces in is import some data that allows us to organize countries with these forces. So data.js is available at this link; you can go ahead and find the data for this lesson there. What I'm going to want to do from here is bring it into my codebase. So again, this is App.svelte, but what we want to do is add this to our data folder. Instead of this example data, we want to replace it with the array that I just identified. So what I'd actually advise doing is opening this, pressing Command+A to select everything and Command+C to copy it, and then literally just pasting it into your file. Okay. So we'll click save, and I'll go back to this other tab. Now we notice three properties within each item in this array: country, happiness, and continent. Continent is going to be how we fill and eventually arrange these circles on the y-axis, happiness is how we organize things on the x-axis, and country is the identifier we need for each item in the array. So in App.svelte, now that we have this array of items in data.js, we can go ahead and import it just like we imported data in our last example. As it says here, we'll go ahead and write import data from, and then we'll find the data file using this alias, data.js. Now, to verify that this is in fact working, we'll go ahead and console.log data. I'm going to reopen Firefox, open the console and refresh. As you can see here, data does in fact exist. And as you can see within the actual log, these numbers are being read as numbers, which means our data is imported and it's in the proper format. Happiness is how we're going to organize circles on the x-axis, continent is how we're going to fill the color of our circles, and country is the unique identifier for each circle in our data.
So let's go ahead and begin by using that D3 snippet that we identified earlier to import the modules that we need to arrange things with physics. What I'm going to want to import is forceSimulation, which, as you'll remember, is how we first instantiate the entire physics-based simulation. Then I want to import forceX, because I want things to be arranged on the x-axis, forceY for the same reason on the y-axis, and forceCollide to prevent overlap, just as we identified last time. We've imported these, but the question is, from where? And the answer is from d3-force. So we have all these imported; of course, they're not being used yet, as it says here, so we need to actually use them. Let's go ahead and create a new simulation. You could call this anything. I'm going to call it simulation.
[08:38 - 10:11] And it's going to be equal to forceSimulation of data: we instantiate a new force simulation with the only argument being the data array we want to pass in. Then we have a series of chained method calls that follow, which declare each relevant force. What does that mean? I can write .force and then the name of the force. So I'm going to call it my x force. And the second argument is what method is being applied on the x-axis. So it's going to be a forceX. This is going to look complicated, and I would really recommend reading through a little bit of the documentation behind forces, but I'll do my best to explain now. So we're using a forceX, and forceX has its own method called x, which takes in the relevant data accessor. Recall that within our data array, we have three variables: continent, country, and happiness. You'll remember, as I said, that happiness is the variable that we want to use on the x-axis. So here we'll say, within each d (or any name for the accessor's parameter; we can call this j, we can call this whatever we want), find d.happiness, because that is the relevant variable we're going to use on our x-axis. Then you can optionally chain a final method here called strength, which is basically how strong the physics is that's organizing things on our x-axis. For us, we'll do 0.8.
[10:12 - 10:36] Now this will finish our x force, so we can close this entire argument moving to our next force, which is the y force. So this is going to take a pretty similar structure, right? The name of the force is y. And then within the y force, we want to use this method that we imported. And rather than taking in an x property, obviously, it's going to take in a y property.
[10:37 - 12:02] Okay, and then we're going to write the same kind of accessor, but the question is, how are things organized on the y-axis? Let's go ahead and say it's according to their continent, to see what this ends up producing. If we wanted things to all be centered at the vertical midpoint of the screen, for example, we could find the height of the overall chart and use half of it. But for now, we're going to organize things by their continent. So d.continent, and then, yet again, we have the strength. For this, because I've already done this beforehand, I know the magic numbers (you would have to test this yourself); I'm going to say a strength of 0.2. Then I'm going to save. The final force that we want is our collision force. So here, we're going to name the force collide. And then within the force itself, we're going to pass the following method: we'll do forceCollide, open and close parentheses. The only setting we need to give it is a radius, so I'm just going to give it a radius of five. And to be safe, rather than giving it a hard-coded number here, I'm going to reference this elsewhere and make it its own variable. So above simulation, I'm going to write const RADIUS. I'm going to capitalize it as a note to myself that this will never change, right? This is an immutable variable. I'm going to set it to five, and then I'm going to go and replace this five with RADIUS.
[12:03 - 12:17] So if I update this elsewhere, it will cascade downward, anywhere radius is in fact referenced. Okay. So we have a simulation, right now it's being unused, which you'll notice according to your code formatter.
[12:18 - 12:29] So let's go ahead and see what it is with the simplest declaration, just console.logging it. And if we console.log simulation, and then go back into our console, we'll see that it's an object.
[12:30 - 12:58] And that object includes a lot of different properties, including alpha, alphaDecay, alphaMin, alphaTarget, find, force, nodes, on, etc. Now, what's of interest to us is going to be this property called nodes. Now, nodes is a function, so right now you can't really tell what's going on. So let's actually go ahead and console.log simulation.nodes to see what it looks like inside.
[12:59 - 13:11] Again, it's a function. So you need to access the function using this open close parentheses. Now if I refresh, you'll see that we actually see 146 items, just like we did within our data.
[13:12 - 13:26] But the difference is, these also include an index, a vx, a vy, an x and a y. So this is really interesting because we basically have our exact same data array with some new properties applied.
[13:27 - 14:32] And we did not generate these index, vx, vy, x and y properties ourselves. That was the code itself, which means that we're probably doing something right in creating these new nodes. So the key thing to notice here is that simulation is this object with a lot going on. You can change those properties that we looked at earlier, like alpha and alphaDecay, programmatically. But if you want to access the output of the simulation, you're going to want to use this .nodes() function, because, as you can see here, this is where the numbers that we're generating dynamically actually appear. Okay. So we notice two problems. First, these x values are unscaled. The x of 2.404 for Afghanistan is identical to its actual happiness value, 2.404. We don't want to put things at their raw values, because then every circle would be contained between zero and 10 pixels on your screen, which is an incredibly small canvas. So what we're going to do is scale our data using D3 scales.
[14:33 - 14:51] The second issue is that our y values, vy and y, are non-existent. They're NaN, not a number. And what that means is that we did something wrong. We're basically not positioning these correctly. And the reason is that d.continent is not numeric, right? It is a textual variable.
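A quick way to see why a string target poisons the position: the sketch below assumes the force does ordinary arithmetic on whatever the accessor returns, coerced to a number. The node object and the one-line update are illustrative, not D3's code.

```javascript
// One node shaped like a row of our data, plus the position fields a simulation adds.
const node = { country: "Afghanistan", continent: "Asia", happiness: 2.404, y: 0, vy: 0 };

// Coercing a continent name to a number fails; a happiness score is fine.
const targetFromString = Number(node.continent); // NaN
const targetFromNumber = Number(node.happiness); // 2.404

// Once NaN enters the arithmetic, it propagates to everything derived from it.
node.vy += (targetFromString - node.y) * 0.2;
node.y += node.vy;

console.log(targetFromNumber, node.vy, node.y); // 2.404 NaN NaN
```

That is the situation in the console: x is a valid (if tiny) number because happiness is numeric, while vy and y are NaN because a continent name is not.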
[14:52 - 16:29] So what forceY is expecting is some number to arrange by, and we're giving it a string. So we need to take our strings and arrange things according to those strings on the y-axis, which is also going to require a new scale, scaleBand, to organize things categorically according to textual variables. But before we get into scaling, with both an xScale and a yScale, we should go ahead and set up the infrastructure around a basic chart. Once we have this chart, we'll have a width and a height that we can reference in the D3 scales that we're going to need to create. So let's go ahead and create that chart step by step, using the same type of pattern that we used in the simple scatter plot last module. First, we'll instantiate a width variable and a height variable. I'll make both of them 400 for now. Then we want to instantiate a new margin variable, where margin is meant to represent the padding around a chart. The padding outside the chart is considered the margin, and what's inside is the inner chart. For our purposes, we're going to give it a top of zero, a right of zero, a bottom of 20, and a left of zero. Then we're going to create new variables for our inner width and our inner height. These are basically the overall width and height minus the horizontal and vertical margins, respectively. So innerWidth will be width minus the left and right margins, and innerHeight will be height minus the top and bottom margins.
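The margin arithmetic described here can be written out as a small helper. This is a plain JavaScript sketch; the function name `innerDimensions` is made up for illustration, and in the Svelte component these would simply be variables.

```javascript
// Margin convention: the inner chart is the full size minus the margins on each side.
function innerDimensions(width, height, margin) {
  return {
    innerWidth: width - margin.left - margin.right,
    innerHeight: height - margin.top - margin.bottom,
  };
}

const margin = { top: 0, right: 0, bottom: 20, left: 0 };

// Starting values from the lesson: a 400 by 400 chart.
const initial = innerDimensions(400, 400, margin);
console.log(initial); // { innerWidth: 400, innerHeight: 380 }

// If the container is later measured at a different width, recompute.
const resized = innerDimensions(514, 400, margin);
console.log(resized); // { innerWidth: 514, innerHeight: 380 }
```

Writing it as a function makes the dependency explicit: whenever width changes, innerWidth has to be recomputed, which is the job the reactive declaration does in the component.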
[16:30 - 18:19] We are prefixing innerWidth with a dollar label. And actually, we don't need to do this for innerHeight. The reason that we are instantiating innerWidth with a dollar label is that we want it to update if and when width updates. Now you might be asking, okay, when does width update right now? Not at all. You're right. But in an ideal world, in our responsive visualizations, width will be dynamically updating according to the window width itself. We can go ahead and get started with this little chunk of code: create a new div, call it chart-container, and give it a dimension binding. Remember that the syntax is bind:clientWidth, set to the name of the variable that we want to target. So here we want to update width to be equivalent to the chart container's total width if and when that updates. Let's go ahead and save that. If we wanted to test whether this works, we could console.log innerWidth, save, and then resize our window. What you'll notice is that innerWidth does in fact update, which suggests that this width binding is already updating properly, with innerWidth updating after the fact. Now the final thing that we want to do is create a new SVG inside, using the width and height that we've already defined. So let's say width equals width and height equals height. Or, you may remember the shorthand: because these attribute names are the same as the variables they accept, we can just put the variable itself in curly brackets. So we can open and close this SVG and then close the outer div. And now we have a chart container that for now is 400 pixels high and as wide as the screen. Now we can go ahead and create our scales.
[18:20 - 18:38] Scales are a way to map raw values to physical points on a canvas, right? Instead of zero to 10, which is the range of potential happiness scores, we want to go from zero to width, where width is 514 or for your screen is going to be a different size.
[18:39 - 19:15] We want to take advantage of that entire potential space. What we're going to use to do that are scaleLinear and scaleBand. scaleLinear is going to be for quantitative data, and scaleBand is going to be for categorical data. Now, what does that mean? Well, quantitative or numeric you can think of as any number input, right? That's simple enough. scaleBand takes in categorical or string inputs instead of numeric inputs. And that's the key difference between these two. Let's go ahead and start with the easy one, which is going to be our xScale.
[19:16 - 20:25] We're going to create a new scale and call it xScale. We're prefixing it with this dollar label because, as you might remember, innerWidth is also instantiated with a dollar label, and as you'll see soon, that is the variable we're going to reference within the scale. So this is going to be a scaleLinear with a domain, which you may remember is the input, and a range, which you might remember is the output. Now, the output is easy. We're going to go from zero to innerWidth because, as I said, we want all of our points to occupy the space from zero to the end of our screen, minus the margins. But the domain is a little bit harder. The question is, what do we want the domain to be? You could dynamically generate the domain for your xScale by finding the minimum and maximum value of happiness within our dataset. Or you could just work smarter, not harder, and know that it's roughly between zero and 10. In my findings, there are no countries under one or over nine, so I'll just make it one to nine. Okay. But if you're here because you want to know what's technically proper, you would probably import extent from d3-array.
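To make the domain-to-range mapping concrete, here is a hand-rolled linear scale in plain JavaScript. It is a stand-in for what scaleLinear computes, not D3's implementation: the helper names `linearScale` and `minMax` are made up, and apart from Afghanistan's 2.404 the sample rows are invented placeholders.

```javascript
// Map a value from [d0, d1] (domain, the input) onto [r0, r1] (range, the output).
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Find the smallest and largest value an accessor returns (the "dynamic domain" option).
function minMax(rows, accessor) {
  const values = rows.map(accessor);
  return [Math.min(...values), Math.max(...values)];
}

const innerWidth = 400;

// Option 1: the hard-coded domain used in the lesson.
const xScale = linearScale([1, 9], [0, innerWidth]);
console.log(xScale(1), xScale(5), xScale(9)); // 0 200 400

// Option 2: derive the domain from the data itself.
// Placeholder rows; only the Afghanistan score comes from the lesson.
const sample = [
  { country: "Afghanistan", happiness: 2.404 },
  { country: "Country A", happiness: 5.0 },
  { country: "Country B", happiness: 7.5 },
];
const dataDomain = minMax(sample, (d) => d.happiness);
const xScaleFromData = linearScale(dataDomain, [0, innerWidth]);
console.log(dataDomain, xScaleFromData(2.404), xScaleFromData(7.5)); // [ 2.404, 7.5 ] 0 400
```

The trade-off is the one described above: a data-derived domain always fills the full width, while a hard-coded one keeps the axis stable if the data changes.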
[20:26 - 21:18] And then what you would do is pass extent into the domain. You would look within the data array (remember that data is this guy right here, with all of our countries within), and you would pass an accessor, which would be d.happiness. So basically, I'm looking for the minimum and maximum of happiness within the data array. You could pass this extent in, or you could just hard-code it like I want to. So we now have an xScale. What we could do, if we want, is pass this in to the simulation above. But I'm going to wait until we do the yScale, just so we can do it all at once. Let's create a new yScale, which is not a scaleLinear function but instead a scaleBand. The reason for this is that we want to map these categorical variables known as continents.
[21:19 - 22:58] And we want to map those to positions on the canvas that are equally spaced from one another. So we want this to be a scaleBand function, which again takes in the exact same parameters, a domain and a range, where domain is the input and range is the output. Here, the range is quite simple. Just like last time, it's going to be innerHeight to zero. You might remember this is because the coordinate system is flipped in SVG. But then the domain is harder, right? So what do we want? We effectively want to get a list of each of the continents present within our data array. So Asia, Africa, all the way down, finding each continent. Now, you could manually go through and find these and then write them one by one in an array, but that's probably not the best way to do it. The safer way to do it is to map through our data array, where map retrieves one element per item in the overall array, and return the continent variable. This will return a list of continents that can then be used as the domain. And based on a continent's position in this domain, passing that value into the yScale will return its numerically equivalent position in the range. Sounds pretty confusing. Let's go ahead and console.log yScale of Asia and see what happens. If I refresh this, you'll see 316.6. If I passed Africa, you would see 253.3. And if I passed North America, you're going to see 63.333.
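Those three numbers can be reproduced with a hand-rolled band scale. The sketch below is a plain JavaScript stand-in for scaleBand with no padding, not D3's implementation, and the helper name `bandScale` is made up. It assumes six distinct continents and an inner height of 380; the order of the continents after Asia and Africa is a guess, chosen so that North America lands fifth.

```javascript
// Split the range into equal bands, one per distinct category, and return
// the starting position of the band for a given category.
function bandScale(domain, [r0, r1]) {
  const names = [...new Set(domain)];      // duplicates collapse; first appearance sets the order
  const step = (r1 - r0) / names.length;   // negative when the range is flipped
  const start = step < 0 ? r0 + step : r0; // start of the first category's band
  return (name) => {
    const i = names.indexOf(name);
    return i === -1 ? undefined : start + i * step;
  };
}

// Stand-in for data.map((d) => d.continent): one entry per country, with repeats.
// Only "Asia first, Africa second" comes from the lesson; the rest is assumed.
const continentsPerCountry = [
  "Asia", "Africa", "Asia", "Europe", "South America", "North America", "Africa", "Oceania",
];

const innerHeight = 380;
const yScale = bandScale(continentsPerCountry, [innerHeight, 0]);

console.log(yScale("Asia").toFixed(2));          // 316.67
console.log(yScale("Africa").toFixed(2));        // 253.33
console.log(yScale("North America").toFixed(2)); // 63.33
```

Each band is 380 / 6, roughly 63.33 pixels tall, so every output is a whole number of bands away from the others.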
[22:59 - 24:25] You'll notice that these numbers are pretty similar in that they're actually multiples of one another, because, like I said, they're equally spaced within the same range. Okay, so we know that our yScale is in fact working. So finally, we have our scales. That was quite fun. We're actually going to change this code a bit in a few seconds. But for now, let's go ahead and sub in our scaled data to see how close we are to where we want to be. So up in our simulation, you'll remember we're passing these x and y values. The issue was that these were unscaled. So just to review, before we make this change: within the simulation's nodes array, we had these raw numbers for x, like 2.4, which is way too small, and we had NaN for y, which meant that this just wasn't working at all. So now we're using xScale and yScale to fix that issue. I'll pass d.happiness into xScale, and d.continent into yScale. And then I'll hit save. Now immediately, you're going to see a bug. And the bug is that xScale is not a function. Now you might be saying, yes, it is, I literally instantiated it right here. What do you mean? It's definitely a function. Notice how xScale is instantiated with the dollar label. This is going to be an important pattern to pay attention to: how we declare variables in our Svelte applications. Here, xScale is reactive.
[24:26 - 25:27] It is instantiated with a dollar label. So whenever simulation is created, basically at application runtime when the page first loads, xScale doesn't actually exist yet. This is just the internals of how Svelte works, how it declares reactive variables. So if we want simulation to actually reflect xScale properly, it needs to be created with the same method, in this case with a dollar label. Now if we go ahead and hit save, we'll see that that issue gets solved. But there's another one, which is that simulation is undefined, because right on line 28 we are console.logging simulation. For the same reason as before, that doesn't exist yet. Let's prefix this with a dollar label as well, and hit save. Our simulation is now working. If you look at the console log, we can verify this by opening any of the objects and looking at the vx, vy, x and y variables. Recall that previously, they were super small numbers and NaN, which stands for not a number.
[25:28 - 26:17] Now, they are properly scaled numbers: 295, and an actual number, 61.9. Therefore, our scaling is in fact working. We've created scales that do in fact transform our raw data, both textual string-based data and numeric data, into positions on a canvas. So for our final step of this lesson, let's go ahead and render these circles using their newly scaled values. We're going to create a g element that holds the inner chart, and let's call it inner-chart so that we can remember this. And within this chart, we want to transform its position to the right by the equivalent of margin.left, and down by the equivalent of margin.top. We're going to open and close this g element.
[26:18 - 26:38] And the question is, what do we want within the each block? What do we want within the inner chart? And the answer is all hundred-and-however-many circles, 146 circles, rendered according to their x and y positions. So we can go ahead and write each simulation.nodes() as node.
[26:39 - 30:09] Or, for the sake of simplicity, we can go ahead and replace this with a variable. We could say nodes is equal to simulation.nodes(). Then in our each block, what do we want? We want one circle per node with the following properties. We want it to have a cx of node.x, a cy of node.y, an r of RADIUS, and a fill of steelblue, because that's a much prettier color than basic red. Let's go ahead and save this and then end our each block. Now, if we save and refresh, you'll notice that the circles appear in the top left corner. And then on resize, they in fact appear in the right place. For now, this is sufficient. What we've done is position each of our circles according to its x and y position. Now, at runtime, these nodes are in fact undefined, which is why it's creating this weird positioning in the top left corner. But we're going to fix that in the next lesson. For now, what we've essentially done is create a properly scaled chart that uses physics to position these beeswarm elements. But there's one other thing we want to fix here, and that is the order of these continents. Right now, the order just follows the order of the data in the data array that we passed. What does that mean? Effectively, the continents array is a list of continents as they appear in the data. So right now, it's something like Asia, Africa, and whatever the next continent to appear would be third in the list, et cetera. We probably don't want to do that. We probably want to order these continents according to some meaningful variable. A good example would be the average happiness score of each continent. That way they'll appear along our axis in the order you would expect as you look at the chart. And the way that we're going to do this is a fairly complex function that I do not expect you to understand right away. It looks like this.
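The on-screen function is not captured in this transcript. As a rough stand-in, here is a dependency-free JavaScript sketch of the same three steps (group by continent, average the happiness scores, sort and keep the names). It does not use d3-array, the helper name `continentsByMeanHappiness` is made up, and the sample rows are invented purely so the example runs.

```javascript
// Return the distinct continents ordered from lowest to highest mean happiness.
function continentsByMeanHappiness(rows) {
  // 1. Group: accumulate a running sum and count per continent.
  const totals = new Map();
  for (const { continent, happiness } of rows) {
    const entry = totals.get(continent) ?? { sum: 0, count: 0 };
    entry.sum += happiness;
    entry.count += 1;
    totals.set(continent, entry);
  }

  return [...totals]
    // 2. Average: turn each group into a [continent, mean] pair.
    .map(([continent, { sum, count }]) => [continent, sum / count])
    // 3. Sort ascending by mean, then keep only the continent name.
    .sort((a, b) => a[1] - b[1])
    .map(([continent]) => continent);
}

// Invented rows, not the real World Happiness Report values.
const sampleRows = [
  { country: "Country A", continent: "Asia", happiness: 4.0 },
  { country: "Country B", continent: "Europe", happiness: 7.0 },
  { country: "Country C", continent: "Asia", happiness: 6.0 },
  { country: "Country D", continent: "Africa", happiness: 4.5 },
];

const continents = continentsByMeanHappiness(sampleRows);
console.log(continents); // [ 'Africa', 'Asia', 'Europe' ]
```

Ascending order is deliberate in this sketch: with a range that runs from innerHeight down to zero, the first item in the domain sits at the bottom of the chart, so the happiest continent ends up at the top.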
And you'll notice that I've commented in a few different places, so it's more clear what we're doing. But we're basically generating the average for each continent so that we can sort according to that. We're using d3-array's functions, rollups and mean, to group the data by continent and, within that grouping, find the mean of the happiness scores. After we do that, we're sorting so that those with the highest happiness scores are at the top and those with the lowest happiness scores are at the bottom. And then we are mapping and retrieving the actual continent name as a value. So if I copy this continents array that we just created and paste it here, you'll notice that the chart updates slightly. And now these are actually sorted according to average happiness value. The last thing that I want to do is add some padding on the top and bottom of this chart. So I'm going to use an additional method within scaleBand, which is called paddingOuter. It's going to have a paddingOuter of 0.5. You can change this as you see fit, but you'll notice that it now adds some space between the top of the chart and the circles. So there's a bit more breathing room.
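To see where that breathing room comes from, here is a small sketch of how outer padding shifts the band positions. It follows the formula as I understand scaleBand to apply it (no inner padding, default centering), so treat the exact numbers as an approximation. The helper name `bandStarts`, the count of six continents, and the inner height of 380 are assumptions.

```javascript
// Start position of each of n equal bands across [r0, r1], with outer padding
// expressed as a fraction of one band step (inner padding assumed to be 0).
function bandStarts(n, [r0, r1], paddingOuter = 0) {
  const lo = Math.min(r0, r1);
  const hi = Math.max(r0, r1);
  const step = (hi - lo) / (n + 2 * paddingOuter); // padding takes space at both ends
  const first = lo + paddingOuter * step;          // leftover space split evenly
  const starts = Array.from({ length: n }, (_, i) => first + i * step);
  return r1 < r0 ? starts.reverse() : starts;      // flipped range: first category at the far end
}

const innerHeight = 380;
const withoutPadding = bandStarts(6, [innerHeight, 0]);
const withPadding = bandStarts(6, [innerHeight, 0], 0.5);

// Position of the topmost band (the last category when the range is flipped).
console.log(withoutPadding[5].toFixed(2)); // 0.00
console.log(withPadding[5].toFixed(2));    // 27.14
```

Without padding, the top band starts flush against the top edge of the chart; with a paddingOuter of 0.5, every band gets slightly shorter and the whole stack is inset by half a step at each end.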
[30:10 - 30:24] There's also paddingInner, which you could play around with, but paddingOuter meets the objective of this basic chart. So we have a chart, but it's definitely not perfect. It starts off in this top left corner. It fixes itself on resize, but the resize looks kind of choppy.
[30:25 - 30:32] It looks kind of ugly. So we're close. And the purpose of this lesson was just to get set up drawing circles with physics.
[30:33 - 30:43] In the next lesson, we're going to make these circles reactive. So they have a bit more momentum. The physics based layout is more intuitive. And this initial rendering issue is not present.

[00:00 - 00:06] In this lesson, we're going to be building a B-swarm chart. So this is what it looks like. These circles are arranged according to physics.

[00:07 - 00:22] On top of that, there's a bit of additional interaction this chart offers, where clicking sorts these countries by continent. As you can see per the title, this B-swarm plot is essentially visualizing every country in the world according to its happiness score.

[00:23 - 00:32] You can read more about the data. It's from the 2022 World Happiness Report. If you're interested in that, but what's important here is basically three things that are unique about this B-swarm chart.

[00:33 - 00:50] The first is that these circles are arranged by physics. If I were to refresh this page, these circles would not be present in the exact same positions that they're present right now. You would actually see a bit of randomness applied every time that this layout is generated. That's because we're using D3 force behind the scenes.

[00:51 - 01:07] Second is this click interaction. It's only powered yet again by this physics- based D3 force module. Third, these tooltips you'll notice are dynamic. This might be more CSS than it felt, but I really like to bring some like whimsy and design to visualization.

[01:08 - 01:13] We're going to be building this from scratch. What we're going to do is scaffold our project template once again.

[01:14 - 01:25] You'll remember that the way that you want to do this is by opening your terminal application. Here I have my terminal up. I'm going to run this command npx-degit and then the long name.

[01:26 - 01:37] What we'll see here is that if it did in fact succeed, it will say cloned my template to this new directory called B-swarm. We can confirm that exists by typing ls.

[01:38 - 02:02] Let's go ahead and cd into B-swarm. If you're on Mac, I would advise go ahead and open that project using code period. If you're on Windows, you might have a different way of opening it. You might find that folder, right click open, or you might open VS code first and then find the project. But here I have the B-swarm project now open. I can go ahead and run an NPM install to basically install the necessary dependencies.

[02:03 - 02:15] And now if I run NPM run dev, we see this project is in fact up and running. So back to our code, here's an example of how we would write a force diagram from scratch.

[02:16 - 02:28] The first line of code imports multiple methods from the D3 force library, including force simulation, force y, force x, and force collide. Now the first thing that we see is force simulation.

[02:29 - 02:35] This is how we instantiate a new simulation. Then the question is, what is within that simulation?

[02:36 - 02:46] And the answer is two different forces. Forces can be thought of as what drives the physics: what is at play that's moving things around?

[02:47 - 03:21] So here we're basically telling our force simulation that there's an x force, which is going to move things along the horizontal axis, and a y force, which moves things along the vertical axis. The x force, the thing that is moving our elements along the horizontal axis, has the following properties: it is a forceX with an x method that takes an accessor returning each node's x, and it has a strength of 0.002. forceY does the exact same thing, but on the y-axis.
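
To give a feel for what the strength parameter does, here is a small sketch of a positioning force acting on a single node along one axis. It is a hand-written toy in plain JavaScript, not the d3-force implementation: the settle function, the friction value of 0.6, and the tick count are assumptions made for illustration. In the lesson itself, this behavior comes from forceX and forceY.

```javascript
// Toy model of a positioning force on one axis (NOT d3-force itself).
// Assumptions: a single node, no collisions, and a fixed damping factor.
function settle(startX, targetX, strength, ticks) {
  const friction = 0.6; // assumed damping applied to the velocity each tick
  let x = startX;
  let vx = 0;
  for (let i = 0; i < ticks; i++) {
    vx += (targetX - x) * strength; // the force nudges velocity toward the target
    vx *= friction; // damping stops the node from oscillating forever
    x += vx; // the velocity moves the node
  }
  return x;
}

// Same start, same target, same number of ticks; only the strength differs.
const weak = settle(0, 100, 0.002, 50);
const strong = settle(0, 100, 0.8, 50);
console.log({ weak, strong });
```

With a strength of 0.002 the node has only crept a short way toward the target after 50 ticks, while a strength of 0.8 has effectively pulled it all the way there. That is the trade-off you tune when picking strength values.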

[03:22 - 03:31] I know this seems complicated, so let's talk a little bit more about what's happening here. Nodes is an array of objects, each containing an x and a y property.

[03:32 - 03:43] That's what we're accessing here and here: the x and the y. So think of this as an array in which each item has at least an x and a y property.

[03:44 - 05:18] Then we're using those properties, the x and the y, to apply forces, each of which, as I write here, represents a new interaction with the overall simulation. We could add others as well: forceCollide pushes things away from one another so they don't overlap, and forceCenter pulls things towards the center. As an example of what this code actually does, here's a minimal code sandbox that's meant to illustrate it. I'll go ahead and open App.svelte. What you can see here is that simulation is a forceSimulation, and the forces being applied include forceCollide, forceX and forceY. forceCollide prevents overlap, forceX adjusts where the circles live on the horizontal axis, and forceY does the same on the vertical axis. All these circles then create this beeswarm shape. If I were to delete forceCollide, which, as I mentioned, is meant to prevent collisions, all of these circles would overlap with one another. So you can see how each of these forces can be really important to creating this physics-based visual. In our chart, we're going to want three forces: forceX, forceY, and forceCollide, for the reasons I just described. We want to organize circles on the x-axis according to their happiness score. We want to pay attention to their y position because we want them at the vertical center of our chart, so just 50% down; our forceY should basically just be static.

[05:19 - 08:37] And then forceCollide, because we don't want our circles to overlap like this; we want them to pay attention to each other's shape. What we need to do in order to bring these forces in is import some data that allows us to organize countries with these forces. data.js is available at this link, where you can find the data for this lesson. From here, I want to bring it into my codebase. Again, this is App.svelte, but what we want to do is add the data to our data folder. Instead of this example data, we want to replace it with the array that I just identified. I'd advise opening the file, pressing Command+A to select everything and Command+C to copy it, and then pasting it into your own file. We'll click save, and I'll go back to this other tab. Now we notice three properties within each item in this array: country, happiness, and continent. Continent is how we'll fill and eventually arrange these circles on the y axis. Happiness is how we organize things on the x axis. Country is the identifier we need for each item in the array. In App.svelte, now that we have this array of items in data.js, we can import it just like we imported data in our last example. As it says here, we'll write import data from, and then we'll find the data file using this alias, data.js. To verify that this is in fact working, we'll console.log data. Now I'm going to reopen Firefox, open the console and refresh. As you can see, data does in fact exist. Within the actual log, these numbers are being read as numbers, which means our data is imported and it's in the proper format. To recap: happiness is how we're going to organize circles on the x axis, continent is how we're going to fill the color of our circles, and country is the unique identifier for each circle in our data.
Let's begin by using that D3 snippet we identified earlier to import the modules we need to arrange things with physics. I want to import forceSimulation, which, as you'll remember, is how we first instantiate the entire physics-based simulation. Then I want to import forceX, because I want things to be arranged on the x axis, forceY for the same reason, and forceCollide to prevent overlap, just as we identified last time. We've imported these, but the question is from where? And the answer is from d3-force. So we have all of these imported; of course, they're not being used yet, as it says here, so we need to actually use them. Let's create a new simulation. You could call this anything. I'm going to call it simulation.

[08:38 - 10:11] It's going to be equal to forceSimulation of data: we instantiate a new force simulation, with the only argument being the data array we want to pass in. Then we have a series of chained calls that declare each relevant force. What does that mean? I can write .force, and then the name of the force. I'm going to call mine x. The argument after that is, roughly, which method is being applied on the x axis, so it's going to be a forceX. This is going to look complicated, and I would really recommend reading through a little of the documentation behind forces, but I'll do my best to explain now. We're using a forceX, and forceX takes its own method called x, which then takes in a relevant data accessor. Recall that within our data array, we have three variables: continent, country, and happiness. As I said, happiness is the variable that we want to use on the x axis. So here, we'll say that for each d (the accessor's parameter name is arbitrary; we could call it j or whatever we want), find d.happiness, because that is the relevant variable we're going to use on our x axis. Then you can optionally add a final method here called strength, which is basically how strong the physics organizing things on our x axis should be. For us, we'll do 0.8.

[10:12 - 10:36] This finishes our x force, so we can close this entire argument and move to our next force, which is the y force. It's going to take a pretty similar structure. The name of the force is y. Within the y force, we want to use the forceY method that we imported. And rather than taking in an x accessor, it's going to take in a y accessor.

[10:37 - 12:02] Then we're going to write the same kind of accessor, but the question is, how are things organized on the y axis? Let's say it's according to their continent, to see what this ends up producing. If we wanted everything centered at the vertical midpoint of the screen, we could find the height of the overall chart and use half of it. But for now, we're going to organize things by their continent, so d.continent. Then, yet again, we have the strength. Because I've already done this beforehand, I know the magic numbers (you would have to test this yourself), so I'm going to use a strength of 0.2. The final force that we want is our collision force. We're going to name the force collide. Within the force itself, we'll pass forceCollide, open and close parentheses. The only argument that this takes is a radius, so I'm going to give it a radius of five. To be safe, rather than giving it a hard-coded number here, I'm going to make it its own variable. Above simulation, I'm going to declare a const called RADIUS. I'm capitalizing it as a note to myself that this will never change; it's an immutable variable. I'm going to set it to five, and then replace this five with RADIUS.

[12:03 - 12:17] So if I update this elsewhere, it will cascade downward, anywhere RADIUS is referenced. Okay. So we have a simulation. Right now it's unused, which you'll notice according to your code formatter.

[12:18 - 12:29] Let's see what it is in the simplest way possible, by console.logging it. If we console.log simulation and then go back into our console, we'll see that it's an object.

[12:30 - 12:58] That object includes a lot of different properties, including alpha, alphaDecay, alphaMin, alphaTarget, find, force, nodes, on, etc. What's of interest to us is this property called nodes. Nodes is a function, so right now you can't really tell what's going on. Let's console.log simulation.nodes to see what it looks like inside.

[12:59 - 13:11] Again, it's a function, so you need to call it using open and close parentheses. Now if I refresh, you'll see that we actually get 146 items, just like we did within our data.

[13:12 - 13:26] But the difference is that these also include an index, a vx, a vy, an x and a y. This is really interesting, because we basically have our exact same data array with some new properties applied.

[13:27 - 14:32] We did not generate index, vx, vy, x and y ourselves. The simulation code did, which means that we're probably doing something right in creating these new nodes. The key thing to notice here is that simulation is an object with a lot going on. You can change the properties we looked at earlier, like alpha and alphaDecay, programmatically. But if you want to access the output of the simulation, you're going to want to use this .nodes function, because, as you can see here, this is where the numbers we're generating dynamically appear. Okay. We notice two problems. First, these x values are unscaled: 2.404 for Afghanistan is identical to its actual value, 2.404. We don't want to put things at their raw values, because then every circle would be contained between zero and 10 pixels on your screen, which is an incredibly small canvas. So what we're going to do is scale our data using D3 scales.

[14:33 - 14:51] The second issue is that our y values, vy and y, are NaN: not a number. What that means is that we did something wrong, and we're not positioning these correctly. The reason is that d.continent is not numeric. It is a textual variable.
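
The NaN result follows directly from doing arithmetic on a string. The sketch below reproduces the problem outside of D3, under a few assumptions: the velocity update is a simplified stand-in for what a positioning force does, and continentToY is a hypothetical lookup table standing in for the categorical scale built later in the lesson.

```javascript
// One node shaped like the lesson's data, plus position and velocity fields.
const node = { country: "Afghanistan", happiness: 2.404, continent: "Asia", y: 0, vy: 0 };
const strength = 0.2;

// Using the raw continent string as the target position:
// "Asia" - 0 evaluates to NaN, and NaN spreads through every later calculation.
const brokenVy = node.vy + (node.continent - node.y) * strength;

// Hypothetical lookup standing in for a categorical scale: string in, number out.
const continentToY = { Asia: 300, Africa: 250 };
const fixedVy = node.vy + (continentToY[node.continent] - node.y) * strength;

console.log(brokenVy, fixedVy);
```

Once every continent maps to a number, the same arithmetic produces a usable velocity, which is exactly what the y scale provides.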

[14:52 - 16:29] What forceY is expecting is some number to arrange by, and we're giving it a string. So we need to take our strings and arrange things according to those strings on the y axis, which is also going to require a new scale, scaleBand, to organize things categorically according to textual variables. But before we get into scaling, for both the x scale and the y scale, we should set up the infrastructure around a basic chart. Once we have this chart, we'll have a width and a height that we can reference in the D3 scales we need to create. Let's create that chart step by step, using the same type of pattern that we used in the simple scatter plot last module. First, we'll instantiate a width variable and a height variable. I'll make both of them 400 for now. Then we want to instantiate a new margin variable, where margin represents the padding around a chart. The padding outside the chart is considered the margin, and what's inside is the inner chart. For our purposes, we're going to give it a top of zero, a right of zero, a bottom of 20, and a left of zero. Then we're going to create new variables for our inner width and our inner height. These are the overall width and height, minus the horizontal and vertical margins respectively. So innerWidth will be width minus the left and right margins, and innerHeight will be height minus the top and bottom margins.
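
The margin arithmetic described above can be sketched in a few lines. The numbers come from the lesson; wrapping the calculation in a function named innerDimensions is an assumption made so the snippet runs on its own outside a Svelte component.

```javascript
// Inner chart size = outer size minus the margins on each side.
function innerDimensions(width, height, margin) {
  return {
    innerWidth: width - margin.left - margin.right,
    innerHeight: height - margin.top - margin.bottom,
  };
}

const margin = { top: 0, right: 0, bottom: 20, left: 0 };
const dims = innerDimensions(400, 400, margin);
console.log(dims); // { innerWidth: 400, innerHeight: 380 }
```

Only the bottom margin is non-zero here, so the inner width equals the full width and the inner height is 20 pixels shorter than the full height.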

[16:30 - 18:19] We are prefixing innerWidth with a $: label (the dollar label). We don't actually need to do this for innerHeight. The reason we are instantiating innerWidth with a dollar label is that we want it to update if and when width updates. Now you might be asking, when does width update right now? Not at all. You're right. But in an ideal world, in our responsive visualizations, width will update dynamically according to the window width itself. We can get started with this little chunk of code: create a new div, call it chart-container, and give it a dimension binding. Remember that the syntax is bind:clientWidth, set to the name of the variable that we want to target. Here we want to update width to be equal to the chart container's total width if and when that changes. Let's save that. If we wanted to test whether this works, we could console.log innerWidth, save, and then resize our window. You'll notice that innerWidth does in fact respond and update, which suggests that we already have this width binding updating properly, with innerWidth updating after it. The final thing we want to do is create a new SVG inside, using the width and height that we've already defined. So let's say width equals width and height equals height. Or, as you may remember, because these attribute names are the same as the variables they accept, we can use the shorthand and just put the variable itself in curly brackets. We can open and close this SVG and then close the outer div. Now we have a chart container that, for now, is 400 pixels high and as wide as the screen. Now we can create our scales.

[18:20 - 18:38] Scales are a way to map raw values to physical points on a canvas. Instead of zero to 10, which is the range of potential happiness scores, we want to go from zero to width, where width is 514 on my screen and will be a different size on yours.

[18:39 - 19:15] We want to take advantage of that entire potential space. What we're going to use to do that are scaleLinear and scaleBand. scaleLinear is for quantitative data, and scaleBand is for categorical data. What does that mean? Quantitative, or numeric, you can think of as any number input. That's simple enough. scaleBand takes categorical, or string, inputs instead of numeric inputs. That's the key difference between the two. Let's start with the easy one, which is going to be our x scale.

[19:16 - 20:25] We're going to create a new scale and call it xScale. We're prefixing it with the $: label because, as you might remember, innerWidth is also instantiated with a dollar label, and, as you'll see soon, innerWidth is the variable we're going to reference within the scale. This is going to be a scaleLinear with a domain, which you may remember is the input, and a range, which you might remember is the output. The output is easy. We're going to go from zero to innerWidth, because, as I said, we want our points to occupy the space from zero to the end of our screen, minus the margins. But the domain is a little bit harder. The question is, what do we want the domain to be? You could dynamically generate the domain for your x scale by finding the minimum and maximum value of happiness within the data set. Or you could work smarter, not harder, and know that it's roughly between zero and 10. In my findings, there are no countries under one or over nine, so I'll just make it one and nine. But if you're here because you want to know what's technically proper, you would probably import extent from d3-array.
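
To make the domain and range idea concrete, here is a dependency-free sketch of a linear scale with both options for the domain. The linearScale and extentOf helpers are hand-written stand-ins for scaleLinear (d3-scale) and extent (d3-array), and every sample row other than Afghanistan's 2.404 is a hypothetical placeholder rather than a value from the real dataset.

```javascript
// Hand-rolled stand-in for scaleLinear().domain([d0, d1]).range([r0, r1]).
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Hand-rolled stand-in for extent(data, accessor): returns [min, max].
function extentOf(items, accessor) {
  let min = Infinity;
  let max = -Infinity;
  for (const item of items) {
    const value = accessor(item);
    if (value < min) min = value;
    if (value > max) max = value;
  }
  return [min, max];
}

const innerWidth = 400;
const sample = [
  { country: "Afghanistan", happiness: 2.404 },
  { country: "Country B", happiness: 5.1 }, // hypothetical row
  { country: "Country C", happiness: 7.5 }, // hypothetical row
];

// Option 1: hard-coded domain, as chosen in the lesson.
const xScale = linearScale([1, 9], [0, innerWidth]);

// Option 2: domain derived from the data's minimum and maximum.
const dataDomain = extentOf(sample, (d) => d.happiness);
const xScaleFromData = linearScale(dataDomain, [0, innerWidth]);

console.log(xScale(5), dataDomain, xScaleFromData(7.5));
```

With the hard-coded domain, a score of 5 sits halfway between 1 and 9, so it lands at the midpoint of the range. With the data-derived domain, the lowest and highest scores land exactly on the edges of the chart.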

[20:26 - 21:18] Then what you would do is pass extent into the domain. You would look within the data array (remember that data is this guy right here, with all of our countries inside) and pass an accessor, which would be d.happiness. So basically, I'm looking for the minimum and maximum of happiness within the data array. You could pass this extent in, or you could just hard code it like I want to. So we now have an x scale. What we could do, if we wanted, is pass this in to the simulation above. But I'm going to wait until we do the y scale, just so we can do it all at once. Let's create a new y scale, which is not a scaleLinear function, but instead a scaleBand. The reason is that we want to map these categorical variables, known as continents.

[21:19 - 22:58] We want to map those to positions on the canvas that are equally spaced from one another. So we want this to be a scaleBand function, which again takes the exact same parameters, a domain and a range, where domain is input and range is output. Here, the range is quite simple. Just like last time, it's going to be innerHeight to zero. You might remember this is because the coordinate system is flipped in SVG. But then the domain is harder. What do we want? We effectively want a list of each of the continents present within our data array: Asia, Africa, all the way down, finding each continent. You could manually go through and find these and then write them one by one in an array, but that's probably not the best way to do it. The safer way is to map through our data array, where map visits each item in the overall array, and return the continent variable. This returns a list of continents that can then be used as the domain. Based on a continent's position in this domain, passing that value into the y scale will return its numerically equivalent position in the range. That sounds pretty confusing, so let's console.log yScale of Asia and see what happens. If I refresh this, you'll see 316.6. If I passed Africa, you would see 253.3. And if I passed North America, you would see 63.333.
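
The numbers read out above can be reproduced with a small, dependency-free sketch of a band scale. The bandScale helper is a hand-written stand-in for scaleBand from d3-scale (no padding, positions at the start of each band), and the order of the continents between Africa and North America is an assumption chosen so that the output matches the values in the transcript.

```javascript
// Hand-rolled stand-in for scaleBand().domain(domain).range(range), no padding.
function bandScale(domain, range) {
  const categories = [...new Set(domain)]; // duplicates collapse to one entry each
  const [r0, r1] = range;
  const reversed = r1 < r0;
  const start = Math.min(r0, r1);
  const step = Math.abs(r1 - r0) / categories.length;
  const positions = categories.map((_, i) => start + step * i);
  if (reversed) positions.reverse(); // flipped range: first category gets the largest value
  return (category) => positions[categories.indexOf(category)];
}

// Hypothetical rows; only the order of first appearance matters here.
const data = [
  { continent: "Asia" },
  { continent: "Africa" },
  { continent: "Europe" },
  { continent: "Asia" },
  { continent: "South America" },
  { continent: "North America" },
  { continent: "Oceania" },
];

const innerHeight = 380; // height of 400 minus the bottom margin of 20
const yScale = bandScale(data.map((d) => d.continent), [innerHeight, 0]);

console.log(yScale("Asia"), yScale("Africa"), yScale("North America"));
```

With six continents and an inner height of 380, each band is 380 / 6, or about 63.33 pixels tall. That is why the outputs (roughly 316.67, 253.33 and 63.33) are multiples of the same step.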

[22:59 - 24:25] You'll notice that these numbers are related: they're multiples of one another because, like I said, they're equally spaced within the same range. Okay, so we know that our y scale is in fact working. Finally, we have our scales. That was quite fun. We're actually going to change this code a bit in a few seconds, but for now, let's sub in our data to see how close we are to where we want to be. Up in our simulation, you'll remember we're passing these x and y values. The issue was that these were unscaled. Just to review, before we make this change: within the simulation's nodes, we had these raw numbers for x, like 2.4, which is way too small, and we had NaN for y, which meant that it just wasn't working at all. So now we're using xScale and yScale to fix that issue. I'll pass d.happiness into xScale, and d.continent into yScale, and then I'll hit save. Immediately, you're going to see a bug: xScale is not a function. Now you might be saying, yes, it is, I literally instantiated it right here. What do you mean? It's definitely a function. Notice how xScale is instantiated with the dollar label. How we declare variables is going to be an important pattern to pay attention to in our Svelte applications. Here, xScale is reactive.

[24:26 - 25:27] It is instantiated with a dollar label. So when simulation is created at application runtime, when the page first loads, xScale doesn't actually exist yet. This is just the internals of how Svelte declares reactive variables. If we want simulation to reflect xScale properly, it needs to be created the same way, in this case with a dollar label. If we hit save now, we'll see that that issue gets solved. But there's another one, which is that simulation is undefined, because on line 28 we are console.logging simulation, and, for the same reason as before, it doesn't exist yet. Let's prefix this with a dollar label as well, and hit save. Our simulation is now working. If you look at the console log, we can verify this by opening any of the objects and looking at the vx, vy, x and y variables. Recall that previously they were super small numbers and NaN, which stands for not a number.

[25:28 - 26:17] Now they are properly scaled numbers: 295 and, instead of NaN, an actual number, 61.9. Therefore, our scaling is in fact working. We've created scales that transform our raw data, both textual, string-based data and numeric data, into positions on a canvas. For the final step of this lesson, let's render these circles using their newly scaled values. We're going to create a g element that holds the inner chart. Let's call it inner chart so that we can remember this. Within this chart, we want to translate its position to the right by margin.left, and down by margin.top. We're going to open and close this g element.

[26:18 - 26:38] The question is, what do we want within the inner chart? What do we want within the each block? The answer is all 146 circles, rendered according to their x and y positions. So we can write each simulation.nodes as node.

[26:39 - 30:09] Or, for the sake of simplicity, we can replace this with a variable. We could say nodes is equal to simulation.nodes. Then in our each block, what do we want? We want one circle per node with the following properties: a cx of node.x, a cy of node.y, an r of RADIUS, and a fill of steelblue, because that's a much prettier color than basic red. Let's save this and then end our each block. Now, if we save and refresh, you'll notice that the circles appear in the top left corner. Then, on resize, they appear in the right place. For now, this is sufficient. What we've done is position each of our circles according to its x and y position. At runtime, these nodes are in fact undefined, which is why it's creating this weird positioning in the top left corner. But we're going to fix that in the next lesson. For now, what we've essentially done is create a properly scaled chart that uses physics to position these beeswarm elements. But there's one other thing we want to fix here, and that is the order of these continents. Right now, the order is arbitrary: it simply follows the order of the data in the data array that we passed. What does that mean? Effectively, the continents array is a list of continents as they appear in the data. So right now, it's something like Asia, Africa, and whatever the next continent to appear would be third in the list, and so on. We probably don't want to do that. We probably want to order these continents according to some meaningful variable. A good example would be the average happiness score of each continent. That way they'll appear along our axis in the order you would expect as you look at a chart. The way that we're going to do this is a very complex function that I do not expect you to understand. It looks like this.
You'll notice that I've commented in a few different places, so it's clearer what we're doing. We're generating the average for each continent so that we can sort according to that. We're using d3-array's functions, rollups and mean, to group the data by continent and, within that grouping, find the mean of the happiness scores. After we do that, we're sorting so that those with the highest happiness scores are at the top and those with the lowest happiness scores are at the bottom. Then we are mapping and retrieving the actual continent name as a value. If I copy this continents array that we just created and paste it here, you'll notice that the chart updates slightly, and now these are sorted according to average happiness value. The last thing that I want to do is add some padding on the top and bottom of this chart. I'm going to use an additional method within scaleBand, which is called paddingOuter. It's going to have a paddingOuter of 0.5. You can change this as you see fit, but you'll notice that it now adds some space between the top of the chart and the circles, so there's a bit more breathing room.
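
The group, average, sort and map steps described above can be sketched without any library. This is not the lesson's code: it swaps a plain Map in for d3-array's rollups and mean, the sample rows are hypothetical, and the descending sort direction is an assumption (flip the comparator if the order comes out upside down with your range).

```javascript
// Hypothetical rows with the same shape as the lesson's data.
const data = [
  { country: "A", continent: "Asia", happiness: 2 },
  { country: "B", continent: "Europe", happiness: 7 },
  { country: "C", continent: "Africa", happiness: 4 },
  { country: "D", continent: "Asia", happiness: 6 },
  { country: "E", continent: "Europe", happiness: 7.5 },
  { country: "F", continent: "Africa", happiness: 5 },
];

// Step 1: group by continent, keeping a running sum and count per group.
// (With d3-array this would be rollups(data, v => mean(v, d => d.happiness), d => d.continent).)
const totals = new Map();
for (const d of data) {
  const entry = totals.get(d.continent) ?? { sum: 0, count: 0 };
  entry.sum += d.happiness;
  entry.count += 1;
  totals.set(d.continent, entry);
}

// Step 2: turn each group into [continent, mean], sort by mean, keep the names.
const continents = [...totals]
  .map(([continent, { sum, count }]) => [continent, sum / count])
  .sort((a, b) => b[1] - a[1]) // highest average first (assumed direction)
  .map(([continent]) => continent);

console.log(continents); // [ 'Europe', 'Africa', 'Asia' ]
```

The resulting array can be passed as the domain of the y scale in place of the unsorted list of continents.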

[30:10 - 30:24] There's also paddingInner, which you could play around with, but paddingOuter meets the objective of this basic chart. So we have a chart, but it's definitely not perfect. It starts off in this top left corner. It fixes itself on resize, but the resize looks kind of choppy.

[30:25 - 30:32] It looks kind of ugly. So we're close. And the purpose of this lesson was just to get set up drawing circles with physics.

[30:33 - 30:43] In the next lesson, we're going to make these circles reactive, so they have a bit more momentum, the physics-based layout is more intuitive, and this initial rendering issue is not present.

[30:44 - 30:48] (upbeat music)