In the near future, drones (unmanned aerial vehicles, see the image above) may carry out routine delivery tasks. To get an idea of the concept, have a look at the Amazon Prime Air web pages.
Your goal is to find ideal locations for a set of drone depots, based on the coordinates of the clients.
Load the generated data set. It describes nearly 6,000 clients: each row starts with a client identifier, followed by that client's geographical location.
Material | Link | Reference |
Data set | csv |
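A minimal sketch of the loading step with pandas. The file name, separator and column names are not stated in this text, so the names `clientid`, `x` and `y` and the three sample rows below are hypothetical stand-ins; check them against the actual CSV file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the course CSV. The real separator and column
# names may differ, so inspect the file before relying on these.
SAMPLE_CSV = """clientid,x,y
1,622.8,164.9
2,416.4,630.2
3,292.7,567.3
"""


def load_clients(source):
    """Read client rows (identifier, x, y) into a data frame."""
    return pd.read_csv(source)


# Pass the path of the downloaded file instead of the in-memory sample.
clients = load_clients(io.StringIO(SAMPLE_CSV))
print(clients.shape)
print(clients.head())
```

If the real file uses another delimiter, `pd.read_csv` accepts it through its `sep` parameter.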
Visualize the client locations by making a two-dimensional scatterplot.
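One possible way to draw the scatterplot with matplotlib. The data here is synthetic and the column names `x` and `y` are assumptions; substitute the data frame you loaded.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the client data; column names "x" and "y" are assumed.
rng = np.random.default_rng(0)
clients = pd.DataFrame({
    "x": rng.uniform(0, 1000, 500),
    "y": rng.uniform(0, 1000, 500),
})


def plot_clients(df):
    """Draw one small marker per client and return the axes."""
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.scatter(df["x"], df["y"], s=4)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_title("Client locations")
    ax.set_aspect("equal")  # same scale on both axes, so distances are not distorted
    return ax


ax = plot_clients(clients)
# Call plt.show() to open the figure window when running this as a script.
```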
Using k-means clustering, find optimal locations (i.e. x and y coordinates) for three drone depots. Each depot serves its surrounding clients.
Hint: The centroids serve as the depot locations. You will later need to change the number of depots, so design your program in such a way that you just need to modify a single value to do that.
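A sketch of the clustering step with scikit-learn's `KMeans`, keeping the number of depots in one constant as the hint suggests. The synthetic data and the column names are assumptions. Note that k-means minimises the summed squared distance from clients to their centroid and converges to a local optimum, so "optimal" here is meant in that sense.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

N_DEPOTS = 3  # the single value to change when experimenting

# Synthetic stand-in for the client data; column names are assumed.
rng = np.random.default_rng(0)
clients = pd.DataFrame({
    "clientid": np.arange(1, 601),
    "x": rng.uniform(0, 1000, 600),
    "y": rng.uniform(0, 1000, 600),
})


def find_depots(df, n_depots):
    """Fit k-means on the coordinates and return the fitted model."""
    model = KMeans(n_clusters=n_depots, n_init=10, random_state=0)
    model.fit(df[["x", "y"]])
    return model


model = find_depots(clients, N_DEPOTS)

# Each centroid is one proposed depot location.
depots = pd.DataFrame(model.cluster_centers_, columns=["x", "y"])
print(depots)
```

Fixing `random_state` makes the result repeatable between runs; drop it to see how much the solution varies with the initialisation.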
Attach the information on the closest depot to each client. That is, generate a data frame that is similar to the original one with the exception that it has an additional column that contains the identifier of the depot nearest to the client. Print the first 10 rows of the new data frame.
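The extra column could be added along these lines. The column name `depot`, the other column names and the synthetic data are illustrative assumptions; `KMeans.predict` returns the index of the nearest centroid for each row.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

N_DEPOTS = 3

# Synthetic stand-in for the client data; column names are assumed.
rng = np.random.default_rng(0)
clients = pd.DataFrame({
    "clientid": np.arange(1, 601),
    "x": rng.uniform(0, 1000, 600),
    "y": rng.uniform(0, 1000, 600),
})

model = KMeans(n_clusters=N_DEPOTS, n_init=10, random_state=0)
model.fit(clients[["x", "y"]])

# assign() returns a new data frame and leaves the original untouched.
clients_with_depot = clients.assign(depot=model.predict(clients[["x", "y"]]))

print(clients_with_depot.head(10))
```

The depot identifiers are the integers 0 to `N_DEPOTS - 1`, in the same order as the rows of `model.cluster_centers_`.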
Make a scatterplot that uses three different colours, so that clients served by the same depot share a marker colour.
Hint: Re-check the web page mentioned in the first task.
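A possible sketch of the coloured scatterplot: the cluster label of each client is passed as the colour value, and the depots are overlaid as larger markers. The data is synthetic, and the colour map and marker choices are arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

N_DEPOTS = 3

# Synthetic stand-in for the client data; column names are assumed.
rng = np.random.default_rng(0)
clients = pd.DataFrame({
    "x": rng.uniform(0, 1000, 600),
    "y": rng.uniform(0, 1000, 600),
})

model = KMeans(n_clusters=N_DEPOTS, n_init=10, random_state=0)
labels = model.fit_predict(clients[["x", "y"]])
centres = model.cluster_centers_

fig, ax = plt.subplots(figsize=(6, 6))
# One colour per depot: the integer label selects the colour from the map.
ax.scatter(clients["x"], clients["y"], c=labels, cmap="tab10", s=4)
# Depot locations drawn on top as large black crosses.
ax.scatter(centres[:, 0], centres[:, 1], c="black", marker="X", s=120)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title(f"Clients coloured by depot ({N_DEPOTS} depots)")
ax.set_aspect("equal")
# Call plt.show() to open the figure window when running this as a script.
```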
Play with the number of depots. What are the optimal locations for 10 depots, for example? Do you see a difference in the computation time when the number of depots increases?
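To compare computation times, you could wrap the fit in a timer, for example as below. The depot counts tried and the synthetic 6,000-point data are arbitrary choices; timings depend on the machine, so no expected values are shown.

```python
import time

import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in with roughly the size of the real data set.
rng = np.random.default_rng(0)
points = rng.uniform(0, 1000, size=(6000, 2))


def time_kmeans(points, n_depots):
    """Return the wall-clock seconds needed to fit k-means once."""
    start = time.perf_counter()
    KMeans(n_clusters=n_depots, n_init=10, random_state=0).fit(points)
    return time.perf_counter() - start


timings = {k: time_kmeans(points, k) for k in (3, 10, 30)}
for k, seconds in timings.items():
    print(f"{k:>3} depots: {seconds:.3f} s")
```

A single measurement is noisy; repeating each fit a few times and averaging gives a steadier picture.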
Replace k-means with agglomerative hierarchical clustering and explore it with various depot numbers. What are your observations?
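A sketch of the same pipeline with scikit-learn's `AgglomerativeClustering`. Unlike `KMeans`, this estimator exposes no `cluster_centers_`, so the depot locations below are taken as the mean coordinates of each cluster; that choice, the Ward linkage and the synthetic data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering

N_DEPOTS = 3

# Synthetic stand-in for the client data; column names are assumed.
rng = np.random.default_rng(0)
clients = pd.DataFrame({
    "x": rng.uniform(0, 1000, 600),
    "y": rng.uniform(0, 1000, 600),
})

model = AgglomerativeClustering(n_clusters=N_DEPOTS, linkage="ward")
clients_with_depot = clients.assign(depot=model.fit_predict(clients[["x", "y"]]))

# No centroids are provided, so use the mean position of each cluster's clients.
depots = clients_with_depot.groupby("depot")[["x", "y"]].mean()
print(depots)
```

Trying the other `linkage` options (`"complete"`, `"average"`, `"single"`) is one way to explore how the cluster shapes change.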