Data Processing
One of the biggest problems facing the Korean lodging industry is the scarcity of historical data. Skepticism about past data is in fact spreading across industries, as market environments change faster and unexpected events occur more often. In the lodging industry, however, the importance of historical data is growing.
1. Significance of Time-Series Data
Historical data is useful only when it helps address uncertainty about the future. The nature of the lodging industry has not changed over its long history, which reaches back to the BC era: it rents a place by the day, in return for payment, to people who need somewhere to stay away from home. What has changed is the size of the industry, which is now much larger.
Because its nature has remained the same for so long, the lodging industry has accumulated vast amounts of data from vast samples, and these data reveal two important facts that remain valid today.
  1. The volatility of cash flows has repeated in a certain pattern, although the period and amplitude of that volatility vary by market. In other words, the causes of volatility are relatively clear, but the influence of each cause differs from market to market. Historical data can therefore provide implications for future patterns of volatility.
  2. There has been a consistent way of surviving dynamic change. The secret lies in quickly absorbing the paradigm that dominated each era and applying it to the industry. One example is the development of a structured value chain drawn from the manufacturing industry’s mass production paradigm.
2. Lodging Industry Statistics in Korea
The Korean lodging industry today is based on the Western-style 'hotels' introduced during the Japanese occupation. Korea began building hotels on its own in the 1960s, and the sector began to take shape as an 'industry' in the 1970s along with privatization. With such a short history, statistical data has been difficult to accumulate, and the database of the Korean lodging industry is still thin relative to the size of the industry.
The basis for lodging industry statistics began to be established in Korea in the 1990s. The "Domestic Travelers Survey" and the "International Travelers Survey" have been published annually since 1992 and 1993, respectively. The "Hotel Operating Statistics" has been compiled annually since 1997, and the "Economic Census" has been published every five years since 2011. Together, these make up the modern lodging industry statistics.
For hotels, supply, demand, and revenue data have been accumulated annually since 1997 and are widely used in practice. However, hotels account for less than 4% of the overall supply of the lodging market in Korea by number of establishments, and 15% by number of rooms. In other words, hotel data, although in relatively good shape, does not represent the trend of the overall lodging market.
3. Principles in Data Processing
Lobin aims to gain comprehensive visibility across the lodging market, so that the dynamic competitive environment can be judged more accurately and effective strategies can be derived. This involves three main dimensions.
  1. Build a database that encompasses not only hotels but also the entire lodging industry. The boundaries between types of lodging establishments are diminishing, and various types such as hotels, motels, and pensions are competing with each other for the same demand, while visibility into this situation is extremely limited.
  2. Build a database that encompasses the entire value chain, including supply, demand, revenues, and expenses. Currently, supply and demand statistics cannot be used together because they are aggregated under different standards, and data such as revenues and expenses do not exist in most cases. Yet it is necessary both to identify the causes of changes in the competitive environment and to measure their impact.
  3. Build a database that encompasses not only recent trends but also long-term trends. The lodging industry has been dealing with standardized products for a long time, and the impact of demand volatility has demonstrated a certain pattern. In other words, history is still the surest starting point in resolving current and future uncertainties.
Lobin's comprehensive database contains both actual data and algorithmic estimations. They are selected and utilized based on the following principles.
  1. Data is processed at the lodging establishment level, in order to capture potential differences in volatility patterns between establishments even when they are in the same region and of the same type. Establishment-level data is therefore used first where it is available; where it is not, statistical data aggregated by region, type, demand group, etc. are distributed over establishments based on the competitive index.
  2. Actual data is prioritized over estimations where available. However, actual data samples at the establishment level are limited, and survey-based statistics frequently contain errors. Suspected statistical errors are identified by comparison with the data of similar establishments according to the competitive index, and corrected if confirmed as errors. Where no actual data exists at the establishment level, values are estimated from the actual data of similar establishments according to the competitive index.
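The two principles above can be illustrated with a small allocation routine. This is only a sketch under stated assumptions, not Lobin's implementation: the source does not define how the competitive index is computed, so here it is assumed to be a positive weight per establishment, and the function name `distribute_aggregate` and the sample figures are hypothetical. Establishments with actual data keep their reported values, and the rest of the regional aggregate is spread over the remaining establishments in proportion to their weights.

```python
from typing import Dict, Optional


def distribute_aggregate(
    regional_total: float,
    competitive_index: Dict[str, float],
    actuals: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Allocate a regional aggregate to individual establishments.

    Assumption: `competitive_index` maps each establishment to a positive
    weight. Establishments listed in `actuals` keep their reported value;
    the remainder is split among the others in proportion to their weight.
    """
    actuals = actuals or {}
    # Principle 2: actual establishment-level data takes priority.
    known = {k: v for k, v in actuals.items() if k in competitive_index}
    # Whatever the actual data does not explain is left to be distributed.
    remainder = max(regional_total - sum(known.values()), 0.0)
    unknown = {k: w for k, w in competitive_index.items() if k not in known}
    weight_sum = sum(unknown.values())

    result = dict(known)
    for name, weight in unknown.items():
        # Principle 1: aggregated statistics are distributed by index weight.
        result[name] = remainder * weight / weight_sum if weight_sum else 0.0
    return result


# Hypothetical example: a regional total of 1,000 room nights, where only
# establishment "A" reports actual data.
allocation = distribute_aggregate(
    regional_total=1000.0,
    competitive_index={"A": 2.0, "B": 1.0, "C": 1.0},
    actuals={"A": 400.0},
)
# allocation == {"A": 400.0, "B": 300.0, "C": 300.0}
```

Proportional allocation keeps the establishment-level figures consistent with the published aggregate, which is one simple way to combine actual and estimated values in a single table.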
4. Procedures for Data Processing
Specifically, the collected original data is processed and used as follows.
  1. Supply Data: For existing establishments, the “Lodging Business Ledger”, “Rural Minbak Ledger” and “Urban Minbak Ledger” from MOIS and the “Tourist Lodging Business Ledger” from MCST are collected, corrected for errors and processed at the establishment level. For establishments in the pipeline, “Building Permit” data from MOLIT are collected and filtered by the property type ‘lodging’; planned new supply of minbaks is not collected, because their property type falls under ‘residential’ rather than ‘commercial’. Based on the list of establishments collected and processed in this way, a supply database is built by matching ‘building ledger’, ‘property deed’ and ‘land assessment’ data to each establishment.
  2. Demand Data: The “Domestic Travelers Survey” from MCST for domestic demand, the “International Travelers Survey” from MCST and “Immigration Statistics” from MOJ for international demand, and “Hotel Operating Statistics” from KHA for hotel demand are collected by year, region and type. Because supply and demand data are compiled under different standards, the supply and demand statistics are first processed to match each other according to Lobin’s lodging establishment classification system. The processed demand data are then distributed over establishments to build the demand database.
  3. Financial Data: “Hotel Operating Statistics” from KHA for hotels, and the “Service Sector Census” and “Economic Census” from KOSIS for other establishments, are collected and corrected, as they are survey-based statistics. In addition, the “Service Sector Census” and “Economic Census” exist only every five years, so data for the years in between are estimated through self-developed algorithms. The cyclicality functions of the lodging market by region and type are first extracted from demand data and “Hotel Operating Statistics” data, and the results are then distributed over establishments according to the competitive index.
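The gap-filling step for five-yearly census data can be sketched as follows. The source does not describe the actual algorithm, so this example substitutes a generic benchmark-and-indicator interpolation: the census values are treated as fixed benchmarks, and an annual cyclicality index (assumed to be already extracted for the region and type) shapes the path between them. The function name `fill_between_benchmarks`, the years, and all figures are hypothetical.

```python
from typing import Dict


def fill_between_benchmarks(
    benchmarks: Dict[int, float], cycle_index: Dict[int, float]
) -> Dict[int, float]:
    """Estimate annual values between sparse benchmark years.

    Assumption: `cycle_index` holds a positive annual indicator for every
    year from the first to the last benchmark. The benchmark-to-indicator
    ratio is interpolated linearly, so estimates match each benchmark exactly
    and follow the indicator's ups and downs in the years between.
    """
    years = sorted(benchmarks)
    estimates: Dict[int, float] = {}
    for start, end in zip(years, years[1:]):
        ratio_start = benchmarks[start] / cycle_index[start]
        ratio_end = benchmarks[end] / cycle_index[end]
        for year in range(start, end + 1):
            share = (year - start) / (end - start)
            ratio = ratio_start + share * (ratio_end - ratio_start)
            estimates[year] = ratio * cycle_index[year]
    return estimates


# Hypothetical revenue benchmarks five years apart and an annual index.
census_revenue = {2010: 100.0, 2015: 150.0}
annual_cycle = {2010: 1.0, 2011: 1.1, 2012: 0.9, 2013: 1.0, 2014: 1.2, 2015: 1.2}
annual_revenue = fill_between_benchmarks(census_revenue, annual_cycle)
```

In this sketch the 2012 estimate dips below the 2010 benchmark because the index dips in that year, which a straight-line interpolation between the two census values would miss.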
5. Self-Developed Algorithms
Various algorithms are used in processing data. They fall broadly into collection, correction and estimation algorithms, each with the following characteristics.
  1. Collection Algorithm: This is an existing automation algorithm improved to suit large-volume tasks. Data on about 90,000 establishments, including suspended and closed ones, are collected from the “Lodging Business Ledger”, “Rural Minbak Ledger” and “Urban Minbak Ledger” from MOIS. Free data such as ‘building ledger’, ‘building permit’ and ‘land assessment’ are automatically updated every three months, and paid data such as ‘property deed’ every 12 months.
  2. Correction Algorithm: Although relatively simple, this algorithm applies different operations depending on the type of error. For address errors, the search range in the ‘building ledger’ is expanded and the listed address is replaced with one that matches the establishment details. The block address is used here instead of the street address, because some street addresses have not yet been assigned and the building name attached to a street address often differs from the business name. For financial data, the most common errors are missing amounts and incorrect units. A missing amount is back-calculated from related values such as revenues; an amount in incorrect units is converted into the minimum sellable unit and corrected by validating it against similar samples according to the competitive index.
  3. Estimation Algorithm: This is an existing AI algorithm improved to suit the characteristics of the lodging industry. Most AI models widely used for financial prediction are based on regression analysis, especially linear regression. However, linear regression models are limited in predicting the cash flows of lodging markets, where irregular cyclicality exists. Above all, the backward-looking nature of their predictions makes practical utility difficult to secure. Therefore, we have developed our own algorithms based on ‘Fourier Series’ and ‘Multiple Regression’ that can capture the cyclicality more accurately.
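To show how a Fourier series and multiple regression can work together, the sketch below fits a linear trend plus sine and cosine terms to a series by ordinary least squares. It is a textbook illustration under stated assumptions, not Lobin's proprietary algorithm: the cycle length `period` is assumed to be known in advance, the number of harmonics is chosen by hand, and the function names and the synthetic monthly series are hypothetical.

```python
import math
from typing import List, Sequence


def design_row(t: float, period: float, harmonics: int) -> List[float]:
    """Regressors for time t: intercept, trend, and Fourier terms."""
    row = [1.0, t]
    for k in range(1, harmonics + 1):
        angle = 2.0 * math.pi * k * t / period
        row += [math.cos(angle), math.sin(angle)]
    return row


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(ts: Sequence[float], ys: Sequence[float],
        period: float, harmonics: int) -> List[float]:
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    rows = [design_row(t, period, harmonics) for t in ts]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef: Sequence[float], t: float,
            period: float, harmonics: int) -> float:
    """Evaluate the fitted trend-plus-cycle model at time t."""
    return sum(c * x for c, x in zip(coef, design_row(t, period, harmonics)))


def true_signal(t: float) -> float:
    """Synthetic cash flow: level, upward trend, and a 12-month cycle."""
    angle = 2.0 * math.pi * t / 12.0
    return 100.0 + 0.5 * t + 10.0 * math.cos(angle) + 4.0 * math.sin(angle)


# Hypothetical example: 48 months of history, forecast the next month.
months = list(range(48))
history = [true_signal(t) for t in months]
coef = fit(months, history, period=12.0, harmonics=2)
forecast = predict(coef, 48, period=12.0, harmonics=2)
```

Because the sine and cosine terms enter as ordinary regressors, the cyclical component is estimated jointly with the trend, whereas a single straight-line fit would average the cycle away. With real data the period would itself have to be estimated, and irregular cycles would need more than this fixed-period form.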

Data Source

  • GDP: GDP, Current $US (World Bank Open Data)
  • Establishments: Compendium of Tourism Statistics (UNWTO), Lodging Business Ledger (MOIS)
  • Rooms: Compendium of Tourism Statistics (UNWTO), Lodging Business Ledger (MOIS)
  • Lodging GDP: Value Added by Industry (BEA), National Accounts (Cabinet Office), GDP of Indonesia (BPS), GDP & GNI by Sector (BOK), Economic Census (KOSIS)
  • Period: 2017-2021

※ In Korea, general & residential accommodations are included while rural & urban minbaks are excluded. Comparable countries are selected based upon availability of lodging GDP statistics for all types of accommodations.

Data Source

  • Population: Population, Total (World Bank Open Data)
  • GDP: GDP, Current $US (World Bank Open Data)
  • Rooms: Compendium of Tourism Statistics (UNWTO), Lodging Business Ledger (MOIS)
  • Lodging GDP: Value Added by Industry (BEA), National Accounts (Cabinet Office), GDP of Indonesia (BPS), GDP & GNI by Sector (BOK), Economic Census (KOSIS)
  • Period: 2017-2021

※ In Korea, general & residential accommodations are included while rural & urban minbaks are excluded. Comparable countries are selected based upon availability of lodging GDP statistics for all types of accommodations.

Data Source

  • Korea: Lodging Business Ledger (MOIS), Tourist Accommodation Ledger (MCST)
  • USA: Census Database (STR)
  • Period: As at the end of 2021

※ General & residential accommodations other than rural and urban minbaks are included for Korea. Life cycle was calculated as of December 31, 2021 or the actual closure date. If there was a discrepancy between data sources for an establishment, it was resolved through an algorithm before use.

Data Source

  • Korea: Lodging Business Ledger (MOIS), Tourist Accommodation Ledger (MCST), Economic Census (KOSIS), Hotel Operating Statistics (KHA), DART (FSS), Trends Report (STR)
  • USA: Compendium of Tourism Statistics (UNWTO), Census Database (STR), Trends Report (STR)
  • Period: 2005-2021

※ General & residential accommodations other than rural and urban minbaks are included for Korea. Visibility was calculated as the number of establishments for which revenue data is available divided by the total number of establishments. If there was a discrepancy between data sources for an establishment, it was resolved through an algorithm before use.

Data Source

  • Guests (Korea): Domestic Traveler Survey (MCST), International Traveler Survey (MCST), Hotel Operating Statistics (KHA)
  • Rooms (Korea): Lodging Business Ledger (MOIS), Tourist Accommodation Ledger (MCST)
  • Guests (USA): Compendium of Tourism Statistics (UNWTO), Trends Report (STR)
  • Rooms (USA): Compendium of Tourism Statistics (UNWTO), Census Database (STR)
  • Period: 2005-2020

※ General & residential accommodations other than rural and urban minbaks are included for Korea. If there was a discrepancy between data sources for an establishment, it was resolved through an algorithm before use.