Data Processing
1. Value of Data
As AI advances, market changes are accelerating, and the boundaries of competitive environments are becoming increasingly blurred. The industrial paradigm is shifting from economies of scale to economies of speed.
Data is the starting point for quickly detecting changes in the market and competitive environment and for responding to them effectively. It must now also provide a comprehensive and accurate view of the market as a whole.

2. Collecting Data
① Census Data vs Sample Data
Statistical data related to the lodging industry can be categorized as census data or sample data, depending on the coverage. Data collected from the entire population is census data, while data collected from a subset is sample data.
In terms of offering a comprehensive view of the entire market and competitive landscape, census data is clearly more valuable. However, because it requires significant time and cost, most available statistics are limited to sample data.
For sample data to be practically useful, the sample must accurately reflect the characteristics of the population. In Korea’s lodging industry, however, regulatory oversight and data coverage differ by lodging establishment type and governing body, making it difficult to ensure consistency.
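To illustrate why representativeness matters, the sketch below compares a naive sample mean with a mean reweighted to known population shares per establishment type (post-stratification, a standard survey technique used here as a stand-in). All names and figures are invented for illustration and are not Lobin Co. data or methods.

```python
# Hypothetical figures for illustration only: population counts per
# establishment type (as a census would give them) and a small sample of
# occupancy rates that over-represents hotels.
population_counts = {"hotel": 200, "motel": 600, "minbak": 1200}
sample = {
    "hotel": [0.70, 0.74, 0.72, 0.68],
    "motel": [0.55, 0.57],
    "minbak": [0.40, 0.44],
}

def naive_mean(sample):
    """Average every sampled value, ignoring how the sample was drawn."""
    values = [v for group in sample.values() for v in group]
    return sum(values) / len(values)

def poststratified_mean(sample, population_counts):
    """Weight each type's sample mean by that type's share of the population."""
    total = sum(population_counts.values())
    return sum(
        (population_counts[t] / total) * (sum(vals) / len(vals))
        for t, vals in sample.items()
    )

naive = naive_mean(sample)
weighted = poststratified_mean(sample, population_counts)
print(f"naive mean:    {naive:.3f}")
print(f"weighted mean: {weighted:.3f}")
```

Because hotels make up half of this hypothetical sample but only a tenth of the population, the naive mean overstates occupancy. Reweighting helps only when the population shares themselves are known, which is where inconsistent coverage across establishment types becomes a problem.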
② Survey Data vs Record Data
Statistical data can also be classified by collection method, as record data or survey data. Record data is obtained by reviewing documented records of each unit, whereas survey data is based on respondents' answers.
From the standpoint of accuracy and market-wide insight, record data is significantly more reliable. However, only some supply-related administrative data is available in record form, and most usable statistics are still based on survey data.
To be practically useful, survey data must be accurate. However, errors caused by misunderstanding or intentional misreporting are difficult to avoid, and there are limits to how far such errors can be detected or corrected.
At Lobin Co., the principle is to use census record data for supply and sample survey data for demand and financials. This reflects the scope of available statistics: supply data is collected at the lodging establishment level, while demand and financial data are collected at the metro market level.
| Field | Category | Source |
| --- | --- | --- |
| Supply | Property | Building Ledger, Building Permits Ledger, Property Records |
| Supply | Enterprise | Business Registration |
| Supply | Establishment | Lodging Business Ledger, Tourist Lodging Business Ledger, Rural Minbak Ledger, Urban Minbak Ledger |
| Demand | Domestic | Domestic Travelers Survey (2005-)*, Hotel Operating Performance (2005-)* |
| Demand | International | International Travelers Survey (2005-)*, Hotel Operating Performance (2005-)* |
| Financial | Revenue | Economy Census (MDIS, 2010, 2015)*, Service Industry Survey (MDIS, 2005)*, Hotel Operating Performance (2005-)* |
| Financial | Profitability | Economy Census (MDIS, 2010, 2015)*, Service Industry Survey (MDIS, 2005)*, Financial Statement Analysis (2005-) |
| Financial | Others | Service Industry Survey (MDIS, 2005)*, Financial Statement Analysis (2005-), Tourism Business Financial Statistics (2005-2009) |
* The complete enumeration of the survey data was used, not the summary statistics in the report.
3. Processing Data
Collected data must go through a transformation process to become suitable for extracting population-level insights. There are two main approaches to data processing: inductive and deductive, each with distinct characteristics.
① Inductive Processing
Inductive processing involves aggregating individual data values to extract features that represent the population. Since missing items are excluded, the process is simpler and faster. However, if the sample size is small, the results may not accurately reflect the population.
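As a minimal sketch of this idea, the snippet below aggregates only the records that report a given item and also returns the share of records that contributed. The record layout and field names (`rooms`, `room_nights_sold`) are hypothetical and chosen purely for illustration.

```python
# Hypothetical establishment records; None marks an item that was not reported.
records = [
    {"type": "hotel", "rooms": 120, "room_nights_sold": 30500},
    {"type": "hotel", "rooms": 80, "room_nights_sold": None},
    {"type": "motel", "rooms": 30, "room_nights_sold": 6200},
    {"type": "motel", "rooms": 25, "room_nights_sold": None},
    {"type": "minbak", "rooms": 5, "room_nights_sold": None},
]

def inductive_summary(records, field):
    """Aggregate only the records that report `field`; the rest are dropped."""
    observed = [r[field] for r in records if r[field] is not None]
    return {
        "total": sum(observed),
        "mean": sum(observed) / len(observed) if observed else None,
        "coverage": len(observed) / len(records),
    }

summary = inductive_summary(records, "room_nights_sold")
print(summary)
```

The computation is a single pass, but here only two of the five records contribute and the minbak type drops out entirely, so the total understates the population.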
In Korea’s lodging statistics, due to wide variation in coverage and methods across lodging establishment types, inductive processing often excludes many items, making it difficult to fully capture the characteristics of the entire population.
For instance, in datasets such as credit card, mobile, or POS data, foreign guest information is often missing. In reservation channel data, the mix of lodging establishment types may be biased depending on the platform. These issues limit the ability to provide a full view of the market and competitive landscape.
② Deductive Processing
Deductive processing infers population-level totals based on individual data points, enabling the extraction of broader insights. Since it involves estimating missing items, the process is more complex and requires advanced verification. However, if inference performance is reliable, the results can be highly representative of the full population.
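The sketch below shows the general shape of this approach with a deliberately simple stand-in for the inference step: missing values are estimated from a per-type ratio (room nights sold per room), falling back to the overall ratio when a type has no reporting records. The ratio estimator, the record layout, and the field names are assumptions for illustration, not the inference algorithm described in this document.

```python
from collections import defaultdict

# Hypothetical establishment records; None marks an item that was not reported.
records = [
    {"type": "hotel", "rooms": 120, "room_nights_sold": 30500},
    {"type": "hotel", "rooms": 80, "room_nights_sold": None},
    {"type": "motel", "rooms": 30, "room_nights_sold": 6200},
    {"type": "motel", "rooms": 25, "room_nights_sold": None},
    {"type": "minbak", "rooms": 5, "room_nights_sold": None},
]

def deductive_total(records, field, size_field="rooms"):
    """Estimate missing values from per-type ratios, then sum all records."""
    reported = defaultdict(float)  # sum of `field` per type, reporting records
    capacity = defaultdict(float)  # sum of `size_field` per type, same records
    for r in records:
        if r[field] is not None:
            reported[r["type"]] += r[field]
            capacity[r["type"]] += r[size_field]
    # Fallback ratio for types with no reporting record at all.
    overall_ratio = sum(reported.values()) / sum(capacity.values())

    total, estimated_count = 0.0, 0
    for r in records:
        if r[field] is not None:
            total += r[field]
            continue
        type_capacity = capacity[r["type"]]
        ratio = reported[r["type"]] / type_capacity if type_capacity else overall_ratio
        total += ratio * r[size_field]
        estimated_count += 1
    return total, estimated_count

total, estimated_count = deductive_total(records, "room_nights_sold")
print(f"estimated total: {total:.1f} ({estimated_count} values estimated)")
```

Every record now contributes to the total, but three of the five values are estimates, so the result is only as good as the ratio assumption behind them.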
Until recently, the high computational cost and algorithmic complexity of deductive processing limited its use. But with the rise of AI, those technical barriers have largely been overcome. Still, the reliability of deductive processing depends heavily on the performance of the inference algorithm.
Such performance improves when the collected data has a large sample size, diverse variables, and long time spans. However, when data is limited, the best way to compensate is by applying a theoretical framework based on experience and knowledge.
At Lobin Co., the principle is to use inductive processing for supply data and deductive processing for demand and financial data. Inference, in the form of correction and estimation, is performed by applying a proprietary theoretical framework built on global hotel industry expertise to external large language models (LLMs).
| Field | Item | Details |
| --- | --- | --- |
| Correction | Subject | Demand and financial data with different values for the same item |
| Correction | Method | 1) Identify independent variables and derive relevant functions; 2) Independent variable error: replace with the value in confirmed statistics; 3) Relevant function error: correct through history & benchmarking analyses* |
| Correction | Standard | Correct sellable unit values by establishment and reflect sales volume |
| Correction | Validation | Compare with the sum in the confirmed statistics (same sample, 95% confidence level) |
| Estimation | Subject | Detailed items of demand and financial data with missing values |
| Estimation | Method | 1) Identify independent variables for the data item; 2) Derive functions for cyclicality and seasonality by region and type; 3) Estimate missing value through history & benchmarking analyses* |
| Estimation | Standard | Correct sellable unit values by establishment and reflect sales volume |
| Estimation | Validation | Compare with the sum in the confirmed statistics (same sample, 95% confidence level) |
* The history analysis refers to a comparative analysis against previous indicators of the establishment itself, and the benchmarking analysis refers to a comparative analysis against recent indicators of competitive establishments.
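The estimation and validation steps above can be sketched roughly as follows. This is an illustrative simplification, not the proprietary framework: the history analysis is reduced to the establishment's own value from the same period a year earlier (which implicitly carries seasonality), the benchmarking analysis to the median recent growth rate of competitors, and the 95% check to a normal-approximation interval built from hold-out errors. All function names, parameters, and figures are hypothetical.

```python
import math
import statistics

def estimate_missing(own_prior_value, peer_growth_rates):
    """History x benchmarking: scale the establishment's own prior-period
    value by the median recent growth of its competitors."""
    return own_prior_value * (1 + statistics.median(peer_growth_rates))

def validate_total(estimates, holdout_errors, confirmed_total, z=1.96):
    """Return (ok, (low, high)): whether the confirmed total falls inside an
    approximate 95% interval around the estimated total.

    Assumes independent, roughly normal, zero-mean errors (a simplification).
    """
    estimated_total = sum(estimates)
    standard_error = statistics.stdev(holdout_errors) * math.sqrt(len(estimates))
    low = estimated_total - z * standard_error
    high = estimated_total + z * standard_error
    return low <= confirmed_total <= high, (low, high)

# Hypothetical inputs: same-period values one year earlier for three
# establishments, competitor growth rates, errors observed on records whose
# true values are known, and the total published in confirmed statistics.
prior_values = [1000, 800, 1200]
peer_growth = [0.02, 0.05, 0.03]
holdout_errors = [-20, 15, 5, -10, 10]
confirmed_total = 3120

estimates = [estimate_missing(v, peer_growth) for v in prior_values]
ok, (low, high) = validate_total(estimates, holdout_errors, confirmed_total)
print(f"estimated total: {sum(estimates):.0f}, "
      f"interval: ({low:.0f}, {high:.0f}), within interval: {ok}")
```

If the confirmed total fell outside the interval, that would suggest revisiting the growth or seasonality assumptions before the estimates are used.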