I love to travel, and I am sure I am not the only one who is curious to see new places, new cities, and new cultures. Founded in 2008, the online marketplace Airbnb has helped a lot of people afford their trips. With Airbnb, you can rent a shared room, a private room, or an entire place/apartment for a night or a few, or for a month if needed. Living in New York City, I am interested not only in the prices Airbnb offers, but also in the option of listing my own space. That is why, when I came across Airbnb data on Kaggle.com, I immediately downloaded it. I loved the opportunity to analyze the data myself and to explore the angles I am most interested in.
This project should be interesting for anyone who likes to travel, who wants to visit New York City, and also for those who live in New York City and want to make some extra money from a spare room. What are the most and the least expensive neighborhoods of New York City? What is the average price per night in the neighborhoods I want to stay in? Is there any relationship between the number of listings per host and his/her average price? Let me walk you through my project and we will answer these questions together.
I started my analysis by cleaning and filtering the data in Excel. The data included over 47000 rows from 2011 to 2019, so for an easier and more relevant analysis I filtered the data in Excel by the last review date and chose the year 2019 for my project. The data now consists of a little more than 25000 rows. After taking a look at the data, I had a vision of some visualizations right away. The data fields are Id, Name, Host Id, Hostname, neighborhood, neighborhood group (borough), latitude, longitude, room type, price, minimum nights, number of reviews, last review, reviews per month, and availability 365. The latitude and longitude fields give me the ability to create a map where the viewer can see prices and the number of listings by neighborhood. I think the map is a great tool to explore the most popular/favorite neighborhoods for those who live in New York City as well as for those who come to visit.
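For readers who prefer code to Excel, the year filter could be sketched in Python roughly like this. The column names (`id`, `neighbourhood`, `price`, `last_review`), the ISO date format, and the four sample rows are my assumptions for illustration; they are not taken from the actual file.

```python
import csv
import io

# Tiny made-up sample; column names and values are illustrative assumptions.
SAMPLE_CSV = """id,neighbourhood,price,last_review
1,Harlem,90,2019-05-21
2,SoHo,300,2018-11-02
3,Astoria,75,2019-01-15
4,Midtown,150,
"""


def rows_reviewed_in(year, csv_text):
    """Return the rows whose last_review date (assumed YYYY-MM-DD) falls in `year`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    prefix = f"{year}-"
    # Rows with an empty last_review never match the prefix, so they are dropped.
    return [row for row in reader if row["last_review"].startswith(prefix)]


rows_2019 = rows_reviewed_in(2019, SAMPLE_CSV)
print([row["id"] for row in rows_2019])
```

Listings that have never been reviewed have no last review date, so a filter like this one drops them along with the older rows.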
My first visualization was a map of the neighborhoods of New York City colored by average price. To show the differences in pricing, I used a color scale that goes from pale orange-gold for the cheapest neighborhoods to dark red for the most expensive ones. I like how these colors go together visually. Emotionally, dark red represents “danger” or “alert”, and in this particular map it lets the user know which neighborhoods are the most expensive. Orange-gold, in turn, represents money or treasure and might be associated with the desire to own it, so in this map the prices colored in orange-gold are the most affordable. When I made the map, I realized that the darkest color fell on one dot at the waterside in Brooklyn, and all the other dots had a very similar lighter color. After filtering and exploring the data, I found out that one of the two listings in Sea Gate (Brooklyn) costs $1,485 per night, which puts the average price for that neighborhood over $800. Even though there are more expensive individual listings in NYC (up to a few thousand dollars per night), the average price of every other neighborhood is much lower than $800: without Sea Gate, the average price per neighborhood ranges from $38 to $396.7 per night. Since there were only two listings in Sea Gate, I decided to exclude that neighborhood for a better overall picture. In the tooltip, I added a simple explanation of the map and its points, where “Average price in <Neighborhood> is” is printed in font size 10 (the name of the neighborhood is bold) and the average price in size 12. I think using different font sizes in the tooltip helps viewers pay attention to the most important parts.
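The Sea Gate effect is easy to reproduce with a few numbers. The sketch below is only an illustration: apart from the $1,485 listing, every price is made up, and the `min_listings` threshold is a generic stand-in for my manual decision to exclude that one neighborhood.

```python
from collections import defaultdict
from statistics import mean

# (neighborhood, price per night). Only the 1485 figure comes from the text;
# all other prices are invented for the example.
listings = [
    ("Sea Gate", 1485), ("Sea Gate", 120),
    ("Harlem", 100), ("Harlem", 80), ("Harlem", 120),
    ("SoHo", 300), ("SoHo", 280), ("SoHo", 320),
]


def average_price_by_neighborhood(rows, min_listings=1):
    """Average price per neighborhood, skipping those with too few listings."""
    prices = defaultdict(list)
    for neighborhood, price in rows:
        prices[neighborhood].append(price)
    return {
        name: mean(values)
        for name, values in prices.items()
        if len(values) >= min_listings
    }


all_averages = average_price_by_neighborhood(listings)
trimmed = average_price_by_neighborhood(listings, min_listings=3)
print(all_averages["Sea Gate"])  # one expensive listing dominates a two-listing average
print(sorted(trimmed))           # Sea Gate is gone once three listings are required
```

With only two listings, a single expensive one is enough to push the whole neighborhood to the top of the color scale, which is why the rest of the map looked so uniform.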
The next two visualizations take a closer look at the map of average prices. Since there are more than 200 neighborhoods, I decided to make two bar charts with the most and the least expensive neighborhoods. I did not know how many neighborhoods I wanted in each chart until I started to sort by average price. For the first group, I chose an average price per night from $250 to $396.7, which gave me the seven most expensive neighborhoods. For the second group, I filtered the price from the minimum of $38 up to $50 and got the six least expensive neighborhoods. For people who do not know all the names of the neighborhoods, I added the name of the borough in parentheses in the tooltip. I used the same colors as in the first map because they represent the same values.
The idea of my next visualization was to show the variety of hosts and their prices all over New York City. I made a chart where the x-axis is the Host Id and the y-axis is the average price per night. There are thousands of host names, but since a lot of hosts use their first name only, repeated names might confuse the analysis. I had to use Host Id to make sure each host has its own value, and I added Hostname to the tooltip. After getting some feedback on this chart, I realized that it confused viewers, and not everyone could get out of it the information I wanted them to get. So, I decided to include in my dashboard only the highest and lowest prices to show how huge the range between them is. The highest price for a night is $5,100 and the lowest is only $10. Unfortunately, the data does not provide a lot of information about hosts, and I cannot tell my viewer why a particular person decided to list his/her house/apartment for a certain price. So I left the Hostname out and displayed the highest and lowest prices, the room type, and the neighborhood where those two units are located.
My research continued with finding the median price and the number of listings of each room type in every borough (neighborhood group). In order to combine this data in one visualization, I started to look for the perfect chart. The suggestion on www.labnol.org for showing the relationship between three variables was a bubble chart. It seemed like a perfect fit for my idea. I created a bubble chart where the y-axis is the median price, boroughs and room types are on the x-axis, the number of rooms is encoded by the size of the circles, and room types are marked with different colors. The colors I chose for this dashboard are pale turquoise, grey, and dark blue. I assigned dark blue to the Shared room type because there are the fewest of those rooms, so the overall picture would not be too dark. Also, all of these colors are on the same color scale, and all three are “cold” colors, which brings harmony to the overall view.
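The numbers behind such a bubble chart are a simple group-by: one median price and one count per borough and room type. Here is a minimal Python sketch of that aggregation; the rows are invented for the example and the function name `bubble_chart_points` is hypothetical.

```python
from collections import defaultdict
from statistics import median

# (borough, room type, price per night); all values are made up.
listings = [
    ("Manhattan", "Private room", 100),
    ("Manhattan", "Private room", 120),
    ("Manhattan", "Private room", 200),
    ("Manhattan", "Shared room", 60),
    ("Brooklyn", "Private room", 70),
    ("Brooklyn", "Private room", 90),
]


def bubble_chart_points(rows):
    """Median price (y-axis) and listing count (bubble size) per (borough, room type)."""
    groups = defaultdict(list)
    for borough, room_type, price in rows:
        groups[(borough, room_type)].append(price)
    return {
        key: {"median_price": median(prices), "count": len(prices)}
        for key, prices in groups.items()
    }


points = bubble_chart_points(listings)
for (borough, room_type), values in sorted(points.items()):
    print(borough, room_type, values)
```

The median is a reasonable choice here because a handful of very expensive listings moves it far less than it would move the average.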
Using the same colors for room type, I decided to make my own interactive Airbnb project where users can choose the type of room, its location, and the price per night they prefer. Listings on the map are shown as little circles, and, even though they overlap each other, I did not change the opacity. 100% opacity makes the circles more visible when the user chooses one room type out of three. Lowering the opacity did not make the number of listings on the map any clearer, so my final decision was to leave it at 100%.
Initially, I chose two search parameters: room type and minimum price. For the “Choose room type” parameter, I created a calculated field called “CALC Dynamic room type” with the formula CONTAINS([Room Type],[Choose room type]). CONTAINS returns true whenever the room type contains the text the user typed, so a listing matches even if the user enters only one word out of two. For example, to search for “Entire room” the user can type in only the word “Entire”.
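The matching rule behind this calculated field is a plain substring test. As a rough Python analogue (the function name `matches_room_type` is hypothetical, and the case-sensitive behavior mirrors the note on my dashboard rather than any guarantee about the charting tool):

```python
def matches_room_type(room_type, typed_text):
    """Case-sensitive substring test, analogous to CONTAINS([Room Type], [Choose room type])."""
    return typed_text in room_type


print(matches_room_type("Entire room", "Entire"))   # partial text is enough
print(matches_room_type("Entire room", "entire"))   # different case, no match
print(matches_room_type("Private room", "Entire"))  # different room type, no match
```

One side effect of a substring test is that an empty search box matches every room type, and typing just "room" matches every type whose name contains that word.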
For the minimum price calculated field, I used the formula [Price]>= [Choose MIN and] to get any price equal to or higher than the one the user chooses. Feedback at the pin-up in class showed that viewers wanted to be able to choose a price range. After trying to make a range and a drop-down list, I decided that the best option for the price per night would be two separate search parameters, MIN and MAX price. Used together, these parameters show the listings priced between the two values chosen by the user, including both.
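Combining the two parameters amounts to an inclusive range check. A minimal sketch in Python, where `min_price` and `max_price` are hypothetical stand-ins for the MIN and MAX parameters, and the price list is made up except for the $10 and $5,100 extremes mentioned earlier:

```python
def in_price_range(price, min_price, max_price):
    """True when min_price <= price <= max_price (both ends included)."""
    return min_price <= price <= max_price


prices = [10, 75, 150, 300, 5100]
selected = [p for p in prices if in_price_range(p, 75, 300)]
print(selected)
```

Because both comparisons are inclusive, a listing priced exactly at the chosen minimum or maximum stays on the map.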
I also heard a suggestion to make a drop-down menu for the room type search. But the idea was to make this dashboard as interactive as possible, with the viewer typing the room type into the search parameter. So that users know exactly what to do, I put instructions above the search parameters and the map, with a note: *Case sensitive. I connected all four search parameters to the visualization of median price by borough, so the user can see how many rooms of the exact type and at the exact price are located in the borough he/she wants to stay in. The tooltip in the interactive map was designed to invite my viewer to book the chosen room: “<Host Name> has <Room Type> for you at $<MEDIAN(Price)> per night”.
All these visualizations needed an introduction. I wanted to show my viewer the overall picture of Airbnb in New York City first, for instance, how Airbnb has grown over the past few years. However, the initial data from 2011 to 2019 had already been filtered down to 2019 only. So, I decided to create a bar chart in Excel and to insert it into my first dashboard as a picture. The bar chart has the number of last recorded reviews per year on the y-axis and the years from 2011 to 2019 on the x-axis. I labeled every bar with the number of reviews, so the user can clearly see how popular Airbnb has become over the past years.
To show the variety of listings, average prices, and the relationship between them, I made a treemap and a map by neighborhood. The idea of the treemap came up after I asked myself a question: “Does the average price of a listing depend on the number of listings the host has?” To see the relationship between the number of listings per host and the average price, I encoded the number of units by color and the average price by the size of the square. The color palette goes from light beige for the smallest number of listings to dark brown for the hosts who have the most listings. This treemap showed that the average price per host does not depend on how many listings the host has. The highest average price of $3,613 belongs to the host named Sally, who has 12 units. The largest number of listings, 327, is hosted by Sonder (NYC), whose average price is only $270 per night.
There are thousands of hosts who have only one or a few listings, so for the treemap and the map of listings by neighborhood I decided to filter the data by the number of listings. The hosts in these two visualizations have 10 or more listings. On the map by neighborhood in this dashboard, the viewer can see the concentration of listings in certain neighborhoods, which shows him/her how many units a particular host has available in that area.
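The per-host numbers for the treemap (listing count for color, average price for size) and the 10-listing cutoff can be sketched in Python as follows. The host ids and prices are invented, and `host_summary` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean


def host_summary(rows, min_listings=10):
    """Listing count and average price per host, keeping hosts with enough listings."""
    by_host = defaultdict(list)
    for host_id, price in rows:
        by_host[host_id].append(price)
    return {
        host_id: {"listings": len(prices), "avg_price": mean(prices)}
        for host_id, prices in by_host.items()
        if len(prices) >= min_listings
    }


# Made-up data: host 101 has 12 listings, host 202 has only 2.
rows = [(101, 100)] * 12 + [(202, 50), (202, 70)]
summary = host_summary(rows)
print(summary)
```

Grouping by host id rather than host name matters here for the same reason as before: many hosts share a first name, and their listings would otherwise be merged into one square.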
I chose a beige-brown palette for a few reasons. First of all, brown is assigned to the largest number of listings because emotionally and visually this color is “heavy”. Second, this palette is made of “earthy” colors, which do not irritate the human eye and are very easy to perceive.
Every dashboard has a note in the bottom right corner consisting of three points:
*Data filtered by last review, 2019
*All prices are shown in American dollars
*Data source: Kaggle.com
Looking at the analysis, we can clearly see that the most popular room type for Airbnb in NYC is an entire room/apartment. The most expensive areas are Tribeca, Flatiron District, NoHo, and SoHo, all of them located in lower Manhattan. While Manhattan has always been the most expensive borough to live in, surprisingly, there are a few spots in Staten Island and Rockaway Beach that are very close to the average Manhattan price. This fact proves one more time that all of us are unique: so many men, so many opinions, to each his own.
I hope you enjoyed my project and found it useful. Good luck with the search for a perfect room!

