ARC6818 Elements of Computational Design Jessica Alcantara Rivera
import turtle
loadWindow = turtle.Screen()
turtle.color("LightSeaGreen")
turtle.speed(0)  # 0 is the fastest drawing speed
# draw 200 half-circles of growing radius to form a spiral
for i in range(200):
    turtle.circle(-3 * i, 180)
A-2// Interactive game, Predator and prey // (See appendix 2 Jupyter Notebook)
The goal of the assignment by Mark Meagher.
a. To create one function that expands the range of available turtle behaviors. Create two types of turtle behavior: targets that repel or attract the other turtles, and worker turtles that move toward or away from the targets.
b. Description of the code
This code creates three types of behavior for the elements in play. The first behavior, prey, defines a turtle as the target; this target is controlled with the up, down, left and right keys. The second behavior, predator, is derived from the first: the predator uses turtle.towards to find the angle from its position to the moving target, sets that angle as its direction with turtle.setheading, and moves forward along it. The third behavior, the turtle's food, defines an object that flees from the turtle instead of approaching it.
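The pursue and flee behaviors described above can be sketched without opening a turtle window. This is a minimal sketch, not the notebook's code: the helper names `towards` and `step` and all coordinates are hypothetical. It reproduces in plain math what `turtle.towards` returns in standard mode (an angle in degrees, 0 pointing east, counter-clockwise) and what `setheading` followed by `forward` or `backward` does to a position.

```python
import math

def towards(pos, target):
    """Angle in degrees (0 = east, counter-clockwise) from pos to target."""
    dx, dy = target[0] - pos[0], target[1] - pos[1]
    return math.degrees(math.atan2(dy, dx)) % 360

def step(pos, heading, distance):
    """Position after moving `distance` along `heading`.
    A negative distance plays the role of turtle.backward."""
    rad = math.radians(heading)
    return (pos[0] + distance * math.cos(rad),
            pos[1] + distance * math.sin(rad))

# Hypothetical starting positions for the three elements.
prey = (100.0, 0.0)
predator = (0.0, 0.0)
food = (0.0, 50.0)

# Predator: face the prey, then move 5 units forward (gets closer).
predator = step(predator, towards(predator, prey), 5)
# Food: face the prey, then move 10 units backward (gets farther away).
food = step(food, towards(food, prey), -10)
```

Repeating both steps on a timer, as the game does with `ontimer`, gives a predator that keeps closing in and a food item that keeps retreating.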
The main challenge of this assignment was to work through the documentation of the TurtleScreen functions of the turtle module (1), and to learn how to use screen events such as turtle.onkey. I also discovered that the turtle can be controlled by mouse clicks, which could be interesting for further projects.
Reflection on the work.
The research I did to develop this simple game helped me understand that the turtle can be controlled with keyboard keys, which makes the drawing more interactive. As you play, you draw over the white canvas.
Question to address during the code development:
How to implement the turtle.distance, turtle.heading, and turtle.towards functions?
(1) https://docs.python.org/2/library/turtle.html#methods-of-turtlescreen-screen-and-corresponding-functions
from turtle import Turtle, Screen

playGround = Screen()
playGround.screensize(500, 500)
playGround.bgcolor("white")
playGround.title("Turtle persecution")

# this turtle is the target of the predator
run = Turtle("turtle")
run.speed("fastest")
run.color("royalblue")
run.penup()
run.setposition(-500, 0)

# this is the predator
follow = Turtle("circle")
follow.speed("fastest")
follow.color("gray")
follow.pendown()
follow.setposition(0, 0)

# this is the food of the turtle and also runs away
follow2 = Turtle("triangle")
follow2.speed("fastest")
follow2.color("lightsteelblue")
follow2.pendown()
follow2.setposition(0, 0)

# define the movement of the turtle by key controls
def a1():
    run.forward(25)

def a2():
    run.left(30)

def a3():
    run.right(30)

def a4():
    run.backward(25)

def quitThis():
    playGround.bye()

# define actions depending on setheading
def follow_runner():
    # face the prey, step towards it and draw a filled circle
    follow.setheading(follow.towards(run))
    follow.forward(5)
    follow.begin_fill()
    follow.circle(5)
    follow.end_fill()
    playGround.ontimer(follow_runner, 50)  # repeat every 50 ms

def follow_runner2():
    # face the prey from the food's own position, then back away from it
    follow2.setheading(follow2.towards(run))
    follow2.backward(10)
    follow2.begin_fill()
    follow2.circle(5)
    follow2.end_fill()
    playGround.ontimer(follow_runner2, 50)

playGround.onkey(a1, "Up")
playGround.onkey(a2, "Left")
playGround.onkey(a3, "Right")
playGround.onkey(a4, "Down")
playGround.onkey(quitThis, "q")
playGround.listen()

follow_runner()
follow_runner2()
playGround.mainloop()
ECD Jessica Alcantara Rivera 2017 12
A-4// Titanic Database // (See appendix 4 Jupyter Notebook)
The goals of the assignment by Mark Meagher.
a. To go through the preliminary steps of evaluating the Titanic passenger dataset.
b. To formulate hypotheses that can be used as a basis for querying, restructuring and understanding this dataset.
c. To use histograms or any other type of plot that allows investigating the hypotheses and communicating the results.
Reflection on the work.
There is a wide variety of tools and libraries that can be used to represent and plot the data from a database in visual form. In this case, the library selected is Matplotlib, a Python 2D plotting library with a broad range of applications.
In this assignment, and through the research its development required, I learned that there are three important elements to keep in mind while analyzing a dataset through visual representations. First, it is essential to have questions, to form a hypothesis, and to sketch possible relationships in the data at a preliminary stage. Not doing this leads to wasted time and to irrelevant, disconnected conclusions. Second, the representation tool must allow real visual connections to be established between the selected data, so it is important for the workflow to decide consciously which kind of tool is the best option for each database. Third, paraphrasing David McCandless (2010) (1), the value of using visualization tools to analyze datasets is the opportunity to find, through the visual recognition of patterns, connections that would be impossible to identify by any other method.
(1) https://www.ted.com/talks/david_mccandless_the_beauty_of_data_visualization/transcript
Description of the dataset with 891 entries:
pclass / Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd), integer
survival / Survival (0 = No; 1 = Yes), integer
name / Name of the passenger, string
Sex / Sex, string
Age / Age, integer
Sibsp / Number of Siblings/Spouses Aboard, integer
Parch / Number of Parents/Children Aboard, integer
Ticket / Ticket Number, string
Fare / Passenger Fare, decimal number
Cabin / Cabin, string
Embarked / Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton), string
Purpose of the study, questions to address and selection of relevant data:
The purpose of this study is to graphically analyze the probabilities of survival of men, women, and children on the Titanic according to their social condition and gender. The main objective is to discover whether the most famous phrase in the story, "Women and Children First", was fundamentally respected, so that gender and age influenced survival, or whether it was a mere romantic legend. The final question would then be: "Women and Children First" or "Every Man for Himself"?
Based on this question, and looking at the information available in the dataset, it is possible to decide which columns will play a fundamental part in the analysis. In this case, the selected ones were Survived, Pclass, Sex and Fare. As shown in the description, some of the information is stored as strings, so conversions from strings to integers are needed.
The first action in this analysis was to take all the available data from this selection and cross-reference it, in order to understand the dataset graphically, as shown in the next two scatters.
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22 | 1 | 0 | A/5 21171 | 7.25 | | S
2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38 | 1 | 0 | PC 17599 | 71.2833 | C85 | C
3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26 | 0 | 0 | STON/O2. 3101282 | 7.925 | | S
4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35 | 1 | 0 | 113803 | 53.1 | C123 | S
5 | 0 | 3 | Allen, Mr. William Henry | male | 35 | 0 | 0 | 373450 | 8.05 | | S
6 | 0 | 3 | Moran, Mr. James | male | | 0 | 0 | 330877 | 8.4583 | | Q
7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54 | 0 | 0 | 17463 | 51.8625 | E46 | S
8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2 | 3 | 1 | 349909 | 21.075 | | S
9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27 | 0 | 2 | 347742 | 11.1333 | | S
The first step creates a new column named Gender, taking the values from the Sex column and transforming them into the integers 0 or 1.
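The Gender conversion could look roughly like the sketch below. It is not the notebook's code: the three rows are copied from the sample records listed earlier, and the encoding (female as 0, male as 1) is an assumption, since the text only says the values become 0 or 1.

```python
import pandas as pd

# Three sample rows; the notebook would load the full 891-entry file instead.
df = pd.DataFrame({
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Sex": ["male", "female", "female"],
})

# Assumed encoding: female -> 0, male -> 1.
df["Gender"] = df["Sex"].map({"female": 0, "male": 1}).astype(int)
```

Using `map` with an explicit dictionary keeps the encoding visible in one place, which helps when reading the later filters.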
The next step creates a scatter plot using the Age values for the x axis and the Fare values for the y axis, assigning a different color to each Pclass. In the code, Pclass 1 is plotted in blue.
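A per-class scatter of this kind might be sketched as follows. Only the blue color for Pclass 1 comes from the text; the colors for the other two classes, the alpha value and the five sample rows are placeholders chosen for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Five sample rows taken from the records listed earlier.
df = pd.DataFrame({
    "Pclass": [3, 1, 1, 3, 1],
    "Age": [22, 38, 35, 35, 54],
    "Fare": [7.25, 71.2833, 53.1, 8.05, 51.8625],
})

# Blue for Pclass 1 follows the text; green and red are placeholders.
colors = {1: "blue", 2: "green", 3: "red"}

fig, ax = plt.subplots()
for pclass, color in colors.items():
    group = df[df["Pclass"] == pclass]
    ax.scatter(group["Age"], group["Fare"], color=color,
               alpha=0.5, label="Pclass %d" % pclass)
ax.set_xlabel("Age")
ax.set_ylabel("Fare")
ax.legend()
```

Calling `plt.show()` afterwards displays the figure when running outside a notebook.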
The first scatter shows the total distribution of the Titanic's population based on fare, class, and age.
The second scatter shows only the survivors.
By visually comparing both scatters it is possible to recognize the loss of mass in certain areas, which leads to a series of observations:
a. The segment that recorded the highest number of losses was third class. Looking at the range of ages, people from 35 to 50 years old died while people from 15 to 35 survived.
b. Regarding children's survival, the general massing of dots remains similar in the first and second charts, except for third-class children from 3 to 15 years old, who were more vulnerable. This may be related to the women who took care of them and had fewer chances to get into a boat.
Following this first set of observations and questions, there was an urgent need to dig deeper into the gender composition of each class to test the initial hypothesis. But first, it was necessary to work on the structure of the dataset, as shown next.
Working with Pandas
Pandas is a Python library for manipulating tabular data. In this project, the Pandas module was used first to create the Gender column, second to fill the Gender column with numerical values converted from strings, and third to generate very useful subgroups of information, such as survived and died, based on known numerical values retrieved from the database.
With the Pandas module the database became more manageable, and it was possible to use only the information needed. I created four subgroups of passengers for analyzing this dataset, which I later plotted with different scatter characteristics: survivedF for women who survived, survivedM for men who survived, diedF for women who died, and diedM for men who died. The idea was to go deeper into the gender composition to understand its link with the probabilities of survival, and to find visual patterns in the output that would help confirm or reject the initial hypothesis.
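The four subgroups can be built with boolean masks, as in this minimal sketch. The variable names survivedF, survivedM, diedF and diedM come from the text; the sample rows and the Gender encoding (female as 0, male as 1) are assumptions made for illustration.

```python
import pandas as pd

# Five sample rows taken from the records listed earlier.
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0],
    "Sex": ["male", "female", "female", "male", "male"],
})
df["Gender"] = df["Sex"].map({"female": 0, "male": 1})  # assumed encoding

survived = df["Survived"] == 1
female = df["Gender"] == 0

survivedF = df[survived & female]    # women who survived
survivedM = df[survived & ~female]   # men who survived
diedF = df[~survived & female]       # women who died
diedM = df[~survived & ~female]      # men who died
```

Because the two masks are complementary, the four subgroups never overlap and together cover every row.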
Working with Matplotlib, Scatter.
A scatter diagram is a visualization format that plots elements using X-axis and Y-axis values retrieved from a dataset.
For this series of visualizations, I decided to plot the subgroups defined with the Pandas module, relating Age on the X axis to Fare on the Y axis, for each Pclass.
The scatter format allows changing the marker, color, alpha and size of each mark.
In this example, I plotted all the subgroups of Pclass 1 in the same scatter:
Survived woman: yellow dot marker (.)
Survived man: blue dot marker (.)
Died woman: yellow cross marker (x)
Died man: blue cross marker (x)
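The marker scheme listed above might be expressed as in the sketch below. The markers and colors follow the list; the marker size, the alpha value and the three first-class sample rows are illustrative assumptions, not values from the notebook.

```python
import pandas as pd
import matplotlib.pyplot as plt

# The three first-class passengers from the sample records listed earlier.
first = pd.DataFrame({
    "Survived": [1, 1, 0],
    "Sex": ["female", "female", "male"],
    "Age": [38, 35, 54],
    "Fare": [71.2833, 53.1, 51.8625],
})

# (survived, sex, marker, color, label) for each subgroup.
styles = [
    (1, "female", ".", "yellow", "Survived woman"),
    (1, "male", ".", "blue", "Survived man"),
    (0, "female", "x", "yellow", "Died woman"),
    (0, "male", "x", "blue", "Died man"),
]

fig, ax = plt.subplots()
for survived, sex, marker, color, label in styles:
    group = first[(first["Survived"] == survived) & (first["Sex"] == sex)]
    ax.scatter(group["Age"], group["Fare"], marker=marker, color=color,
               s=80, alpha=0.7, label=label)
ax.set_xlabel("Age")
ax.set_ylabel("Fare")
```

Keeping the styles in one list makes it easy to reuse the same loop for Pclass 2 and Pclass 3.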
The image below is the output of the first scatter. The description of the pattern recognition and the conclusions of the study can be found on the next page.
How related was survival with class belonging and gender in first class?
This scatter breaks my initial hypothesis.
I thought that, at least in first class, most of the rich men would survive. But the reality is unexpected: several of the wealthiest men died, while those closest to second class in terms of fares survived. On the other hand, yellow dots clearly outnumber yellow crosses.
Here the order Women and Children First was fulfilled.
Being a woman in the first class was synonymous with survival.
How related was survival with class belonging and gender in second class?
Second class is the graphical demonstration of the order "Women and Children First".
Basically, the only red dots are the ones that represent children, while there is an astonishing number of red crosses all over the canvas.
Being a woman in second class was also synonymous with survival.
How related was survival with class belonging and gender in third class?
The analysis of third class shows that being a woman or a child in third class was no guarantee of survival, even when paying the highest third-class fares. We can also see that the most vulnerable people in third class were men from 20 to 45 years old and teenagers; this shows in the concentrated pattern at the bottom and the concentrated pattern on the left.
Conclusions: There is a strong relationship between survival and gender for women in first and second class. This rule doesn't apply to women in third class. "Women and Children First" depended on wealth. Belonging to first class didn't guarantee men survival. Most of the richest men died.
An interesting pattern that catches the eye is that a large number of the teenagers on board died regardless of gender; most of them belonged to third class.
Mexico City/Taxiroutes from 24/06/2016 to 20/07/2017 Data visualization by Jessica Alcantara Rivera ECD
A-5// Mexico City/Taxiroutes from 24/06/2016 to 20/07/2017// (See appendix 5 Jupyter Notebook)
The goals of the assignment by Mark Meagher.
a. To choose a data source of personal interest from the internet and to download a dataset.
b. To formulate hypotheses that can be used as a basis for querying, restructuring and understanding this dataset.
c. To select a visualization method for exploring the dataset and for answering the initial questions.
Development of the work and questions to address.
I used a database from Kaggle under the name Taxi Routes of Mexico City. According to the source, data was collected from Taxi, Uber and Cabify trips using EC Taximeter over a period of 13 months from 24/06/2016 to 20/07/2017. The database contains 12653 taxi routes.
This is a very interesting database that can be used to gain a quick, general understanding of the transportation needs in the city that are not currently covered by public transportation. The analysis could therefore be used later to plan new public transportation lines such as metro, bus, or metro-bus.
The interest of visualizing this dataset lies in learning more about how people's movements in Mexico City behave over time. I want to map the pickup and drop-off data over this period and discover relationships within the information.
Initial questions to address:
1. What are the most common taxi routes by time?
2. Where are the most common areas for pickups?
3. Where are the most common areas for drop-offs?
4. What relationships can be established between time and location?
Description of the dataset with 12653 entries:
https://www.kaggle.com/carlosknows/taxis-in-mexico/data
The data fields are very specific; the ones marked with (//) are those used in this study. The analysis could be improved further by considering other important fields, such as the ones that contain date and time.
/id: a unique identifier for each trip
/vendor_id: the type of fare entered by the user (Taxi, Uber, Cabify)
/pickup_datetime: date and time when the meter was engaged
/dropoff_datetime: date and time when the meter was disengaged
/(//)pickup_longitude: the longitude where the meter was engaged
/(//)pickup_latitude: the latitude where the meter was engaged
/(//)dropoff_longitude: the longitude where the meter was disengaged
/(//)dropoff_latitude: the latitude where the meter was disengaged
/trip_duration: duration of the trip in seconds
/dist_meters: the distance of the trip in meters
/wait_sec: the time the car was completely stopped during the trip, or waiting time, in seconds
The Python libraries/modules used: Pandas, NumPy, Matplotlib.
First stage: exploring the dataset visually.
The first visualization uses two scatters, one for the pickups and the other for the drop-offs.
This is also a quick exploration of the range of latitudes and longitudes that the dataset covers.
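The range of coordinates can also be checked numerically before plotting. The sketch below is only illustrative: the column names follow the dataset description, while the three rows are hypothetical coordinates rather than values from the Kaggle file.

```python
import pandas as pd

# Hypothetical trips; the notebook would read the Kaggle CSV instead.
trips = pd.DataFrame({
    "pickup_longitude": [-99.17, -99.27, -98.80],
    "pickup_latitude": [19.43, 19.33, 19.70],
    "dropoff_longitude": [-99.13, -99.18, -99.05],
    "dropoff_latitude": [19.44, 19.37, 19.30],
})

# One row per statistic, one column per coordinate field.
coordinate_range = trips.agg(["min", "max"])
```

A range much wider than the city itself is a quick hint that the data also covers the metropolitan area or contains outliers.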
Second stage: setting the dimensions of the study
The first output made it possible to understand the dimensions of the dataset.
The result shows the city and its metropolitan area.
That is why it is necessary to zoom into the data and redefine the boundaries in order to achieve a more accurate visualization.
To create the bounding box I used http://boundingbox.klokantech.com/ and applied these limits with plt.xlim and plt.ylim:
Longitude (-99.31, -99.0)
Latitude (19.24, 19.55)
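Applying the bounding box might look like the sketch below. The limits are the ones quoted above, read as longitude on the x axis and latitude on the y axis. The sample coordinates are hypothetical, and `ax.set_xlim` and `ax.set_ylim` are used here as the axes-level equivalents of `plt.xlim` and `plt.ylim`.

```python
import pandas as pd
import matplotlib.pyplot as plt

LON_MIN, LON_MAX = -99.31, -99.0
LAT_MIN, LAT_MAX = 19.24, 19.55

# Hypothetical pickups; the last one lies outside the bounding box.
trips = pd.DataFrame({
    "pickup_longitude": [-99.17, -99.27, -98.80],
    "pickup_latitude": [19.43, 19.33, 19.70],
})

# Keep only the trips that start inside the box.
inside = trips[
    trips["pickup_longitude"].between(LON_MIN, LON_MAX)
    & trips["pickup_latitude"].between(LAT_MIN, LAT_MAX)
]

fig, ax = plt.subplots()
ax.scatter(inside["pickup_longitude"], inside["pickup_latitude"],
           s=2, color="red")
ax.set_xlim(LON_MIN, LON_MAX)
ax.set_ylim(LAT_MIN, LAT_MAX)
```

Filtering the rows is optional when the axis limits are set, but it keeps later statistics restricted to the same area as the plot.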
The plot shows very interesting facts:
There is some concentration of blue and red points in certain areas, which may be related to important development cores of the city. For example, downtown may be located at the center of the plot, and to the north Indios Verdes, a key metro station that connects the entire north of the city with the suburbs. At the bottom of the plot should be CU, which is the National University of Mexico, UNAM. On the left is Santa Fe, the new business core of the city and one of the challenges to address in terms of transportation in the coming years.
The plot gives clear general answers to some of the questions I formulated at the beginning.
2) Where are the most common areas for pickups? / The pickups (red dots) seem more spread out, so pickups may occur at particular locations such as houses or other uncommon places. A peculiar situation also catches the eye: there are some concentrations of red dots that are not matched by concentrations of blue dots. This means that people at these locations only need to be picked up, not dropped off, which may be a good question to answer later: is the transportation system failing at some points at certain hours?
3) Where are the most common areas for drop-offs? / At first glance, it is very interesting that the drop-offs basically outline the main streets of the city. Drop-offs may be related to main avenues and development cores, for example Reforma avenue at the center and Constituyentes on the left side.
Third stage, to draw relationships between pickups and drop-offs.
The first thing I did at this stage was to think about how to visualize the relationship between the concentrations of dots and core destinations.
So I made a third scatter with the coordinates of the metro stations. This is a quick way of relating the data spatially to landmarks in the city.
Another strategy I used to relate the two groups of coordinates was to draw an arc representing the connection between each pickup and its corresponding drop-off.
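The text does not say how the arcs were computed, so the following is only one possible construction: a quadratic Bezier curve whose control point is pushed sideways from the middle of the straight line between pickup and drop-off. The function name `arc_points`, the `bulge` parameter and the sample coordinates are all hypothetical.

```python
def arc_points(start, end, bulge=0.2, n=20):
    """Return n + 1 points on a quadratic Bezier curve from start to end.

    The control point sits off the midpoint of the chord, perpendicular
    to it, at `bulge` times the chord length.
    """
    (x0, y0), (x1, y1) = start, end
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    dx, dy = x1 - x0, y1 - y0
    cx, cy = mx - bulge * dy, my + bulge * dx
    points = []
    for i in range(n + 1):
        t = i / n
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x1
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y1
        points.append((x, y))
    return points

# Hypothetical pickup and drop-off as (longitude, latitude) pairs.
pickup = (-99.27, 19.33)
dropoff = (-99.18, 19.37)
arc = arc_points(pickup, dropoff)
```

The x and y values of `arc` can then be passed to Matplotlib's `plot` with a low alpha, so that frequently repeated routes add up to brighter paths.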
Fourth stage: conclusions from the visual output.
1. The final plot clearly shows the most important origins and destinations in Mexico City. The main idea for urban planning should be to try to diminish the strength of the paths in the coming years by inserting strategic lines of mass public transportation that connect these targets efficiently.
2. There are four main bright areas, three of which are within the metro network. But the last one, Santa Fe, with coordinates close to the point (-99.27, 19.33), is not connected by efficient mass urban transportation. All trips to this area must be made by car or bus, which may be strongly related to the number of taxi trips. It is also interesting that this area is mainly red, which means that pickups happen more often than drop-offs. Another feature of this zone is that the colored area is less compact than the other three, which means that the area is growing in a sprawling manner, without any clear and defined cores. This might become a problem later when addressing pedestrian-focused urban strategies.
3. Very long trips are made to or from downtown in a west-to-east or east-to-west direction. The target seems to be at (-99.18, 19.37); entering these coordinates in Bing gives the location of Chapultepec and Paseo de la Reforma, very important destinations of the city in terms of connectivity. This shows the lack of connectivity along the X axis in the mass public transportation system. It would be very interesting here to have a path displaying the origin and destination in different colors; this is a good idea for further improvement.
About the partially-answered initial questions.
1. What are the most common taxi routes by time? / This question is only partially answered because the scope of the analysis did not include the time fields. So the most common routes are known, while nothing is known about their periodicity.
2. and 3. See the answers given throughout the analysis.
4. What relationships can be established between time and location? / Further improvements may answer this question by separating the source data into blocks of time and plotting them with different formats.
Further development is needed to identify the origin and destination visually.
It would also be interesting to break down the database by time to learn how the paths behave over the course of 24 hours.
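As a starting point for that breakdown, the trips could be counted per hour of the day. This sketch was not part of the study: the timestamps are hypothetical, and their format (year-month-day hour:minute:second) is an assumption about how pickup_datetime is stored.

```python
from collections import Counter
from datetime import datetime

# Hypothetical pickup_datetime values in an assumed format.
pickups = [
    "2016-06-24 08:15:00",
    "2016-06-24 08:40:00",
    "2016-06-24 18:05:00",
]

# Number of trips that start in each hour of the day (0 to 23).
trips_per_hour = Counter(
    datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").hour for stamp in pickups
)
```

Each hour block could then be plotted as its own scatter to compare, for example, morning and evening patterns.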