Housing

Assignment 3 : Housing

Preliminary Data

The following infomation was found prior to the start of the assignment in order to use it.

tax hotels = 7% of turnover + 3 euros per room tax airbnb = 10% = 3 euros number of visitors = 30000

Library imports and DataFrame generation

import pandas as pd
import geopandas as gpd

csv_data = list()
listings = open("./data/listings_full.csv",'r', encoding='latin1')
for line in listings:
    data = line.split(',')
    csv_data.append(data)

listings.close()

df = pd.read_csv("./data/listings_full.csv", encoding='latin1')

1. What Amsterdam will receive from tourist tax if the event lasts a week and you will have 30.000 visitors?

Pseudocode

  1. Discard listings with minimum stays of more than 7 nights
  2. Sum the total ammount of people that fit the remaining airbnbs accomodate (#19812)
  3. Calculate average price of accomodation remaining listings
  4. for the remaining visitors (11188) they will stay in hotels –> average of 2.2 beds per room
  5. Divide the number of rooms between the remaining tourists
  6. Multiply number of available beds to average price
  7. Sum airbnb + hotels taxes accordingly
#Gathering information shortcuts:
 
#ok_listings = pd.DataFrame(df.columns)
#print(df.columns)
#print(df.columns[34])
#print(df.maximum_nights)

Code:

#we discard listings with minimum stays of more than 7 nights
ok_listings = df[(df['minimum_nights'] < 8)].copy()

#we check that all numeric values in our DataFrame sum
test = ok_listings.sum(axis=0,numeric_only=True)

#we sum individually the column 'accommodates' so we can retrieve the number of people who can stay 
no_ppl = ok_listings.accommodates.sum()

print('Number of Airbnb listings with at most 7 days of minimal nights :', no_ppl)
#print(test) success

tourists = 30000

tourists_hotel = tourists-no_ppl

print('Number of people who needs to stay in hotel rooms',': ', tourists_hotel)
#print(ok_listings)

#Average cost of overnight accommodation in Amsterdam in the Netherlands from January 2019 to June 2023
##https://www.statista.com/statistics/614061/overnight-accommodation-costs-amsterdam-city/

hotel_price_per_night = 294

#https://www.cbs.nl/en-gb/figures/detail/84040ENG
#Hotels; capacity, type of accommodation, beds, star rating

hotel_rooms_ams = 41840
hotel_beds_ams = 90918

avg_beds_per_room = hotel_beds_ams/hotel_rooms_ams

#print(avg_beds_per_room)

#Number of hotel rooms used by tourists who don't fit in airbnbs
no_hotel_rooms_used = tourists_hotel/avg_beds_per_room

revenue_hotels = no_hotel_rooms_used*hotel_price_per_night

tax_hotels = revenue_hotels * 0.07 + no_hotel_rooms_used * 3

#getting rid of $ signs in column
new_price = df['price'].str.slice(1,-1)

#adding new_price as a column in our dataframe
ok_listings['new_price'] = new_price

ok_listings['new_price']=ok_listings['new_price'].str.replace(',','')

#converting new column values to float so we can math it up =P
ok_listings['new_price'] = ok_listings['new_price'].astype(float)

#calculating total revenue of airbnb by summing all location price per night and then times 7 (number of days in a week)
revenue_airbnb = ok_listings.new_price.sum()*7

tax_airbnb = float(revenue_airbnb)*.1 + float(len(new_price)*3)

print('Total tax revenue from Airbnb locations $', tax_airbnb)
print('Total tax revenue from hotel rooms $',tax_hotels)

total_revenue = tax_airbnb + tax_hotels

print('Total tax revenue gathered for the government', total_revenue)
Number of Airbnb listings with at most 7 days of minimal nights : 23407.0
Number of people who needs to stay in hotel rooms :  6593.0
Total tax revenue from Airbnb locations $ 1469366.9000000001
Total tax revenue from hotel rooms $ 71543.25226687785
Total tax revenue gathered for the government 1540910.152266878

2. Plot the amount of AirBnB locations per neighbourhood.

Pseudocode

  1. Filter from dataframe the locations of airbnb
  2. Gather the different categories inside the column neighourhood
  3. Sum each category inside neighbourhood
#It can't get mpre efficient than this
import plotly.express as px
#table = ok_listings.neighbourhood_cleansed.value_counts()
re_index = ok_listings['neighbourhood_cleansed'].value_counts().reset_index().rename(columns={"index": 'neighbourhood', 0 : 'count'})
print(re_index )
                    neighbourhood_cleansed  count
0                   De Baarsjes - Oud-West   1329
1                             Centrum-West    973
2                  De Pijp - Rivierenbuurt    898
3                             Centrum-Oost    766
4                                     Zuid    571
5                               Westerpark    561
6                                 Oud-Oost    475
7                            Bos en Lommer    399
8                                Oud-Noord    389
9   Oostelijk Havengebied - Indische Buurt    304
10                         Watergraafsmeer    249
11                              Noord-West    212
12                IJburg - Zeeburgereiland    173
13                             Slotervaart    156
14                              Noord-Oost    135
15                 Geuzenveld - Slotermeer    110
16                  Buitenveldert - Zuidas     96
17                  De Aker - Nieuw Sloten     67
18                         Bijlmer-Centrum     47
19                   Gaasperdam - Driemond     45
20                                  Osdorp     43
21                            Bijlmer-Oost     37
#graph with information
figure = px.bar(re_index, x='neighbourhood_cleansed', y='count', title =  'Number of airbnbs per neighbourhood in Amsterdam')
figure.show()
Unable to display output for mime type(s): application/vnd.plotly.v1+json

3: Which street in Amsterdam has the most AirBnB apartments?

Pseudocode

  1. Find the locations of airbnb through lat and long.
  2. Add the street category to the DataFrame.
  3. Gather the different categories inside the column Street.
  4. Sum each category per street filter.
#pip install geopy

import certifi
import ssl
import geopy.geocoders
from geopy.geocoders import Nominatim, Photon

# Workaround to fix SSL certififcate expired error on some laptops
# Solution taken from https://stackoverflow.com/a/50665487
# Error message:
#   GeocoderUnavailable: HTTPSConnectionPool(host='nominatim.openstreetmap.org', port=443): 
#   Max retries exceeded with url: /reverse?lat=52.40164&lon=4.95106&format=json&addressdetails=1 
#   (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] 
#   certificate verify failed: certificate has expired (_ssl.c:1006)')))
ctx = ssl.create_default_context(cafile=certifi.where())
geopy.geocoders.options.default_ssl_context = ctx

geolocator = Nominatim(user_agent='geoapiExercises')

#print(ok_listings.columns)

#New dataframe with only 3 parameters
geolocs_airbnb = ok_listings[['latitude', 'longitude']]

#we erase all typos from the columns and then add '.' accordingly to their index position as to logical lon/lat values
geolocs_airbnb['longitude']=geolocs_airbnb['longitude'].str.replace(',','')
geolocs_airbnb['longitude']=geolocs_airbnb['longitude'].str.replace('.','')
geolocs_airbnb.longitude = (geolocs_airbnb.longitude.str[:1] + '.'+ geolocs_airbnb.longitude.str[1:]).astype(float)

geolocs_airbnb['latitude']=geolocs_airbnb['latitude'].str.replace(',','')
geolocs_airbnb['latitude']=geolocs_airbnb['latitude'].str.replace('.','')
geolocs_airbnb.latitude = (geolocs_airbnb.latitude.str[:2] + '.' + geolocs_airbnb.latitude.str[2:]).astype(float)


for i in range(len(geolocs_airbnb)):
    latitude, longitude = geolocs_airbnb.iloc[i]['latitude'],geolocs_airbnb.iloc[i]['longitude']
    location = geolocator.reverse((latitude, longitude), timeout=None)
    street = location.raw['address']['road']
    geolocs_airbnb.loc[i, 'street'] = street
   
geolocs_airbnb.head()
print(geolocs_airbnb.head())
   latitude  longitude                        street
0  52.40164    4.95106              Jisperveldstraat
1  52.38855    4.88521                Zoutkeetsplein
2  52.37824    4.86826    Centrale Groothandelsmarkt
3  52.34091    4.84802                    IJsbaanpad
4  52.37641    4.88303  Derde Egelantiersdwarsstraat
#saving harvested data inside another file for next tries

import pickle

fileObj = open('data.obj', 'wb')
pickle.dump(geolocs_airbnb,fileObj)
fileObj.close()

pickle.load

Geoloc test

Test of 10 locations and number of listings per per street

#print(geolocs_airbnb.street)
print('Number of Airbnb listings per street:', geolocs_airbnb.street.value_counts())
Number of Airbnb listings per street: street
Jisperveldstraat                1
Zoutkeetsplein                  1
Centrale Groothandelsmarkt      1
IJsbaanpad                      1
Derde Egelantiersdwarsstraat    1
Maassluisstraat                 1
Zanddwarsstraat                 1
Amstelveld                      1
Soembawastraat                  1
Keizersgracht                   1
Name: count, dtype: int64
import osmnx as ox
import matplotlib.pyplot as plt
import plotly as py
from shapely.geometry import Point, Polygon
import numpy as np

#here we set up a geodata frame based on the airbnb data set and do so with lat and long as geo location
points_airbnb = gpd.GeoDataFrame(geolocs_airbnb, geometry=gpd.points_from_xy(geolocs_airbnb.longitude, geolocs_airbnb.latitude), crs = "WGS84")
print(points_airbnb.head())

# distribution of airbnbs in ams
ams = "Amsterdam"
graph = ox.graph_from_place(ams)

#map with airbnbs locations
fig, ax = plt.subplots(figsize=(150,150))
ax.set_aspect('equal') #same scaling as data available
ax.set_facecolor('black')

ox.plot_graph(graph, edge_color='#edf2f1',
                        node_size=0, edge_linewidth=5,
                        show=False, close=False, ax=ax,
                        )
points_airbnb.plot(ax=ax, color ="#81dec8", markersize = 150)


plt.savefig("Q2_Q3_airbnbsAMS.png")
   latitude  longitude                  geometry
0  52.40164    4.95106  POINT (4.95106 52.40164)
1  52.38855    4.88521  POINT (4.88521 52.38855)
2  52.37824    4.86826  POINT (4.86826 52.37824)
3  52.34091    4.84802  POINT (4.84802 52.34091)
4  52.37641    4.88303  POINT (4.88303 52.37641)

4. Try to cross reference the data from the AirBnB dataset with the BBGA. Can you figure out if all apartments of AirBnB are designated as housing? Which number of apartments are not rented out all the time but are also used as normal housing?

#usable data from bbga

bbga_df = pd.read_csv('./data/2023_BBGA_0614_cf8d825e89.csv') #./bbga_short.csv
#print(bbga_df)
bbga_df.replace({ "Oud West, De Baarsjes" :"De Baarsjes - Oud-West",
            "De Pijp, Rivierenbuurt":"De Pijp - Rivierenbuurt",
            "Indische Buurt, Oostelijk Havengebied" : "Oostelijk Havengebied - Indische Buurt",
            "IJburg, Zeeburgereiland" : "IJburg - Zeeburgereiland",
            "Geuzenveld, Slotermeer" : "Geuzenveld - Slotermeer",
            "Buitenveldert, Zuidas" : "Buitenveldert - Zuidas",
            "Oud-Zuid" : "Zuid" ,
            "De Aker, Sloten, Nieuw-Sloten" : "De Aker - Nieuw Sloten",
             "Gaasperdam" : "Gaasperdam - Driemond"}, inplace = True)


print(bbga_df)
bbga_df = pd.read_csv('./data/2023_BBGA_0614_cf8d825e89.csv')
bbga_ok = bbga_df.loc[3937:3963,['gebiednaam','BHVESTAIRBNB']]
#name of neighborhood paramenters equally (using airbnb file)
new_bbga = bbga_ok.replace({
            "Oud West, De Baarsjes" :"De Baarsjes - Oud-West", "De Pijp, Rivierenbuurt":"De Pijp - Rivierenbuurt",
            "Indische Buurt, Oostelijk Havengebied" : "Oostelijk Havengebied - Indische Buurt", "IJburg, Zeeburgereiland" : "IJburg - Zeeburgereiland",
            "Geuzenveld, Slotermeer" : "Geuzenveld - Slotermeer", "Buitenveldert, Zuidas" : "Buitenveldert - Zuidas",
            "Oud-Zuid" : "Zuid", "De Aker, Sloten, Nieuw-Sloten" : "De Aker - Nieuw Sloten", "Gaasperdam" : "Gaasperdam - Driemond" 
            })

#plot the amount of apartments from both datasets
fig = px.bar(re_index, x='neighbourhood_cleansed', y='count', title= 'Data comparison of airbnb units according to bbga and airbnb')
fig.add_bar(x=new_bbga['gebiednaam'], y=new_bbga['BHVESTAIRBNB'],name="BBGA")
#fig.add_bar(x=re_index.neighbourhood_cleansed.value_counts([]), y='count', name="abb")
fig.update_layout(barmode='group')
fig.show()
Unable to display output for mime type(s): application/vnd.plotly.v1+json

The results seen in the chart can be explained by the time period measured by both datasets. Still the proportion of rented units remains constant within both datasets with Haarlemmerbuurt as the only exception; reflecting an increased period of unavailability in the time registered by Airbnb.

5. How many hotel rooms should be built if Amsterdam wants to accommodate the same number of tourists?

#we sum individually the column 'accommodates' so we can retrieve the number of people who can stay 
no_ppl = ok_listings.accommodates.sum()

tourists = 30000

tourists_hotel = tourists-no_ppl

hotel_rooms_ams = 41840
hotel_beds_ams = 90918

avg_beds_per_room = hotel_beds_ams/hotel_rooms_ams

#print(avg_beds_per_room)

#Number of hotel rooms used by tourists who don't fit in airbnbs
no_hotel_rooms_used = tourists_hotel/avg_beds_per_room

print('Number of hotel rooms used by tourists who do not fit in airbnbs:', no_hotel_rooms_used)
Number of hotel rooms used by tourists who do not fit in airbnbs: 3034.0649816318

6. How many different licenses are issued?

#verifying license column
#print(ok_listings.columns[68])
licenses = (ok_listings.license.nunique())
license_values =(ok_listings.license.value_counts())
print(license_values)
exemptions = (ok_listings.license.value_counts()['Exempt'])

ulicenses = licenses-exemptions

print('Number of unique Airbnb licenses in Amsterdam (not counting the exempt values):',ulicenses)
license
Exempt                      600
0363 78AD 8875 790E 3C05     14
ABCD 1234 AB12 89EF A0F9      9
036341086EC3C2FF2493          7
0363 A250 F710 53C5 1273      6
                           ... 
0363 D239 E048 910F 216B      1
0363 CB7E 060D 07E0 A4DE      1
0363 45B2 6D92 AA8B 1326      1
0363 F6A5 F87A D89D 8FC1      1
0363 2F20 4F53 FB57 2D7F      1
Name: count, Length: 6972, dtype: int64
Number of unique Airbnb licenses in Amsterdam (not counting the exempt values): 6372

We have more listings than licenses and several ‘exempt’ values. This is because the listings already have another type of license since they either work as hotels or hostels. https://www.airbnb.com/help/article/860