Today, I will be explaining the steps I took to create my most recent project: Real Estate in Some U.S. States Since 2010.
Preparing the Dataset
The dataset that I put together for my project was adopted from Zillow.com’s research database. The database includes various files, each with one metric divided by region, city, state, metro, country, zip code or neighborhood. In my case, I chose the Metro category. From the bottom of the page, I downloaded all the Zillow metrics through a single file.
Then, I began to look at each individual file and thought about what questions I’d like to answer. Mostly, I was interested in home values, sales, and rents. I dragged all the files that I wanted to include in my dataset from the data source to a new folder. After that, I opened each of the Excel files and noticed that the data was inputted in a different format than what was needed for Tableau analysis. Each Excel file only contained what would become one measure, and I had to find a way to put all this information together within one Excel worksheet.
So, for each file, I decided to simply use the date and the region as the identity columns. I split the date column into the month and year, since it included both in one. Then, I decided to delete the month column and simply use the year as the values for the months in each year were the same throughout. Then, I used Pivot tables in order to aggregate the data for each year and region. I removed the grand totals for the table, and then I used the Tableau add-in in order to reshape the data for Tableau analysis.
Once each of the files were reshaped in a way that would be suitable for the software, I compiled all the Excel files into one workbook. Then, I used the Microsoft Query Wizard to join all the files together according to date and region. Once I was done, all the measures were ordered accordingly with each corresponding dimension and I was able to just delete the repeated columns. Finally, my data set was ready for analysis.
Once my data was uploaded to Tableau, I began to play around with the measures and dimensions and the different types of visualizations. I started looking at the data for different patterns, and maybe even try to hypothesize some relationships that may exist between my measures. At first glance I thought there would not be much to draw from this data source. However, I began to ask myself different questions which led to me create the Story to organize my thoughts.
What is the most important aspects of Real Estate? How has sales and rents changed over the years? What are the most expensive regions? What are some price differences across regions? Which states have more homes for rents? Is this related to the amount of houses being sold? How do prices relate to houses being sold?
So, I created a Storyboard with the following headings, each with their distinct purpose: Overview, Prices, House Sales, House Rents, and Conclusions.
I kept in mind the theme of the storypoint as I created each dashboard, which guided me to look at patterns within each measure. This also helped me draw new questions about possible patterns that later were included as part of my conclusions.
The First Dashboard: Overview Property Values
The Overview part of the project was simply to give an overall picture about the changes in the overall United States over time. I used a map in order to highlight the Regions that would be looked at throughout the project, and also to call attention at the possible patterns that would be explored later. I included a slider filter in order to give the end user the ability to go through each year and see the changes for each particular measure highlighted.
From 2010 to 2015, it is clear that some houses have continued to decrease in value, but the value continues to decrease. As well, the number of houses increasing in value has substantially improved. The listing prices overall have continued to increase as well as the average sold prices. This demonstrates an overall growth of the market at first glance.
The second dashboard was composed of a bullet graph, a chart, and a line graph. I chose the bullet graph to compare the states with each other and also to call attention to the differences between the Listing Price and the Sold Price. The chart included a quick filter with a dropdown to also allow end users to pick the state and see the differences in prices over the years. I also put an action filter on the bullet graph so they’d be able to click on the graph and also filter according to that state. Finally, the line chart was simply to show the changes over time for both the Listing and Sold Price.
From this particular dashboard, you can deduce that most houses sell for less than the listing price, even though there are some exceptions like Washington, DC. Over the years, both the sold and listing prices have overall continued to increase.
In House Sales, I chose to use a line graph to show the changes in the percentage of houses sold over the years. This graph also included a filter capability so that the user would be able to see how each state did over the years. Then, I used a bar graph to compare the states to each other and used a filter to allow the end user to observe how each state did at that point in time. Finally, I used a bar graph to show which regions sold the most houses overall.
According to my findings, there was a pattern of increase and then decrease across house sales over the years. However, this did not particularly apply to all states since there were some states that have experienced an overall increase. As well, Nevada has had one of the highest house sales since 2010. Finally, the West is the region that has the most house sales.
In House Rents, I took a similar approach to that of House Sold. I used a line graph to analyze home rents over the years and also a filter to allow end users to see each particular state. Then, I used a bubbles graph to show which states have the most houses for rent and a filter to allow end users to see which states have had the most houses for rent at a particular point in time. Finally, I used the Pages shelf to show an animated version of the overall changes of Rent to Price Ratios over the years.
Through this dashboard, we learned that the number of homes for rent overall has continued to increase over the years. Interestingly, California also had the highest amount of houses for rent since 2010. The Rent to Price Ratio went up and down almost for every region since 2010.
In the conclusions section of the Project, I wanted to answer all the questions concerning possible patterns across measures. Overall, I found out that in all the regions the houses sold for less than the Listing Price. The Pacific is also the most expensive region with the Midwest being the cheapest. When I thought about house sales, I also assumed that if the houses were cheaper then more houses would sell. However, there is no correlation between the Sold Price and the percentage of house sold. Even when houses sold less or more, they still sold at an average price. Concerning Rent, I thought that if more houses were purchased then there would be less people willing to rent, however, there seems to also not be a correlation between the number of houses sold and homes for rent in the market. This may mean that despite house sales, there still remains a demand for rent. Finally, the Price to Rent Ratio does have a positive correlation with Sold Price, which means that houses are sold at a higher price and are increasing in value compared to rent.