  • Data Descriptor
  • Open access
  • Published: 06 April 2024

From home energy management systems to energy communities: methods and data

  • Antonio Ruano   ORCID: orcid.org/0000-0002-6308-8666 1 &
  • Maria da Graça Ruano 2  

Scientific Data volume 11, Article number: 346 (2024)

  • Energy grids and networks
  • Photovoltaics

This paper introduces the HEMStoEC database, which contains data recorded over more than three years in the course of two research projects, NILMforIHEM and HEMS2IEA. To be manageable, the dataset is divided into months, from January 2020 until February 2023. It consists of: (a) electricity consumption data for four houses in a neighbourhood situated in the south of Portugal, (b) weather data for that location, (c) photovoltaic and battery data, (d) inside climate data, and (e) operation of several electric devices in one of the four houses. Raw data, sampled at 1 second and 1 minute, are available from the different sensing devices, as well as synchronous data with a common sampling interval of 5 minutes. Gaps existing within the data, as well as periods where interpolation was used, are identified for each month of data.

Background & Summary

Over the last two decades the global electricity consumption market has been growing at an average reported yearly rate of 3.1%. One of the largest consuming sectors is buildings, and in particular the residential sector. Managing efficiently the flow of electricity in a house is important, not only from the point of view of the owner’s electricity bill, but also from the point of view of global consumption, as well as from the point of view of the electrical grids. In fact, traditional grids find it difficult to cope with this increasing demand, exacerbated by the integration of extensive variable energy resources, such as renewable energy systems.

The present dataset is the result of two projects, NILMforIHEM and HEMS2IEA. The aims of the first project were to improve the performance of existing non-intrusive load monitoring algorithms and the efficiency of energy systems in homes. The second project, using the results of the former, aimed to propose new energy management techniques for local energy communities, managed by an aggregator. It was considered that the aggregator would interface with each residential management system and with the electricity grid, allowing electricity to be managed in accordance with different community contracts. The dataset enables the research community to investigate several different topics related to the efficient use of energy in households and communities. A brief review of these topics follows.

Home energy management systems

The goal of a Home Energy Management System (HEMS) is to manage efficiently the flow of electricity in the house, so that the electricity bill is reduced or eliminated while maintaining the comfort of its occupants. Despite the large interest of the research community, the complexity and diversity of the systems, together with the use of suboptimal control strategies, mean that energy consumption is still higher than necessary and that users are unable to achieve full comfort in their homes. Excellent reviews detailing HEMS developments in recent years are available; please consult the reviews of Beaudin and Zareipour 1, Leitão and co-workers 2, Mahapatra and Nayyar 3 or Gomes et al. 4. According to this last reference, HEMS can be broadly divided into four classes: traditional techniques; model predictive control, also known as model-based predictive control (MBPC); heuristics and metaheuristics; and other techniques. The first class comprises methods based on traditional optimization techniques, typically using commercial solvers. Perhaps the most important sub-class within traditional methods is Mixed-Integer Linear Programming (MILP), which refers to optimization techniques where the objective function is linear and subject to linear constraints, but where the decision variables are a mix of continuous and discrete variables. Examples of household energy management based on MILP are the works of:

Lu et al. 5, where the results of the proposed HEMS are compared with other energy management systems, showing the effectiveness of the proposed model through case studies in which energy costs are reduced in both summer and winter;

Baek et al. 6, where results with and without demand response are compared, demonstrating that the strategy with demand response is superior;

Lyu et al. 7, where the proposed methodology reduces household costs by 53% and the Peak-to-Average Ratio (PAR) by around 70%.
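For reference, the MILP structure described above can be written in a generic form. The symbols below are illustrative assumptions and are not taken from any of the cited works: x collects the continuous decisions (for example, battery charging power), y the discrete ones (for example, on/off states of schedulable appliances), and c, d, A, B and b are problem data.

```latex
\begin{aligned}
\min_{x,\,y}\quad & c^{\top} x + d^{\top} y \\
\text{subject to}\quad & A x + B y \le b, \\
& x \in \mathbb{R}^{n}, \quad y \in \mathbb{Z}^{m}.
\end{aligned}
```

Both the cost and the constraints are linear; the difficulty comes only from the integrality requirement on y.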

Model-based predictive control is an advanced control technique based on a receding horizon principle, aimed at determining the best sequence of actions while meeting the requirements. The application of MBPC in HEMS has increased significantly in recent years. For instance, Mirakhorli et al. 8 propose a HEMS for a residential building with a Photovoltaic (PV) system, an Electric Storage System (ESS), thermal and electric loads, and Electric Vehicles (EV). The MBPC problem considered a prediction horizon of four hours with a five-minute time step. Rao and co-workers 9 propose a HEMS for a smart home focusing on the energy balance between the three phases to control both active and reactive power. Several case studies are considered, assuming a prediction horizon of twenty-four hours, a control horizon of twenty-four hours, and a simulation horizon of forty-eight hours. A comprehensive approach of a mixed-integer quadratic-programming MPC scheme based on the thermal building model and the building energy management system is employed by Killian and co-workers 10.
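The receding horizon principle can be illustrated with a minimal sketch. It is not the method of any of the works cited above: the first-order room model, its coefficients, and the exhaustive enumeration of on/off sequences are all assumptions chosen to keep the example short. At each step the cheapest heater sequence that keeps the temperature above a lower bound over the horizon is found, only its first action is applied, and the procedure is repeated one step later.

```python
from itertools import product


def simulate(temp, action, outside, a=0.9, gain=2.0):
    """One step of a hypothetical first-order room model (coefficients are assumptions)."""
    return a * temp + (1 - a) * outside + gain * action


def best_sequence(temp, outside_forecast, prices, t_min):
    """Enumerate all on/off sequences over the horizon; return the cheapest feasible one."""
    best, best_cost = None, float("inf")
    for seq in product((0, 1), repeat=len(prices)):
        t, cost, feasible = temp, 0.0, True
        for u, out, price in zip(seq, outside_forecast, prices):
            t = simulate(t, u, out)
            cost += price * u
            if t < t_min:
                feasible = False
                break
        if feasible and cost < best_cost:
            best, best_cost = seq, cost
    return best


def receding_horizon(temp, outside, prices, horizon, t_min):
    """Apply only the first action of each optimized sequence, then shift the window."""
    applied = []
    for k in range(len(prices) - horizon + 1):
        seq = best_sequence(temp, outside[k:k + horizon], prices[k:k + horizon], t_min)
        u = seq[0] if seq is not None else 1  # fall back to heating if nothing is feasible
        temp = simulate(temp, u, outside[k])
        applied.append(u)
    return applied


print(receding_horizon(18.0, [0.0] * 6, [1.0] * 6, horizon=3, t_min=18.0))
```

Exhaustive enumeration grows as 2^horizon, which is why practical MBPC schemes rely on solvers such as MILP or Branch-and-Bound rather than brute force.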

Heating, ventilation and air conditioning systems

It is recognized that nearly 40% of the energy consumed in buildings is due to the operation of Heating, Ventilation and Air Conditioning (HVAC) systems (see Pérez-Lombard and co-workers 11). For this reason, special care should be devoted to this specific equipment. MBPC is perhaps the most frequently proposed technique for HVAC control, since it offers an enormous potential for energy savings. Typically, what is sought is the minimization of the energy spent, or of the electricity bill incurred in the HVAC operation, while simultaneously maintaining the room(s) under thermal comfort. Thermal comfort can be assessed in different ways, the most used being temperature regulation. In some cases, the relative humidity is also maintained within user-defined bounds. In recent years, the Predicted Mean Vote (PMV) has been increasingly used. The PMV index is based on human thermal sensation, which is strongly related to the energy balance of the body when the human body is considered in a heat balance situation, i.e., when the heat produced by metabolism equals the net loss of heat. The classical way of obtaining the PMV index was presented by Fanger 12 and depends on six variables: metabolic rate, clothing insulation, air temperature, relative humidity and velocity, and mean radiant temperature.

For HVAC control, MBPC can be applied in several different ways. Donaisky and co-workers 13 minimized the PMV index, generating a nonlinear PMV model having a Wiener structure. Ma et al. 14 employ a simple thermal mass model to minimize a cost function employing economic costs. Castilla et al. 15 minimize the PMV index, using a PMMPC model. In Chen’s work 16 the energy is (indirectly) minimized, using constraints on the thermal sensation scale, where the use of the PMV index is compared with an Actual Mean Vote index. A simple thermal model is used in this approach. In Huang et al. 17 a neural network is used to optimize a start-stop strategy for temperature-regulated control. Li et al. 18 minimize the energy spent and violations of bounds on air temperature, using a state-space formulation for the prediction of these variables.

Non-intrusive load monitoring

Energy monitoring is a key point of a HEMS; it can be done by installing measuring devices at every load of interest or by using Non-Intrusive Load Monitoring (NILM) methods, which disaggregate the overall usage from a measurement of the load at the utility service entry. Research, however, is still needed in this field, especially in terms of simple algorithms that require neither special-purpose hardware nor high-sampling-rate power data.

Excellent reviews on NILM algorithms can be found in the works of Georgios Angelis et al. 19 and Ruano and co-workers 20, 21.

The main stages in a NILM application are 21:

(a) Data collection: electrical data, including current, voltage, and power data, are obtained from smart meters, acquisition boards or by using specific hardware;

(b) Event detection: an event is any change in the steady state of an appliance over time. An event implies variations in power and current, which can be detected in the electrical data previously collected by means of thresholds;

(c) Feature extraction: appliances provide load signature information or features that can be used to distinguish one appliance from another;

(d) Load identification: using the features previously identified, a classification procedure takes place to determine which appliances are operating at a specified time or period, and/or their states.
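The threshold-based event detection mentioned in the second stage can be sketched as follows. This is only an illustration: the 30 W threshold and the sample values are assumptions, not values used in the cited works or in the dataset.

```python
def detect_events(power, threshold=30.0):
    """Return (index, delta) pairs where consecutive samples differ by more than `threshold` watts."""
    events = []
    for i in range(1, len(power)):
        delta = power[i] - power[i - 1]
        if abs(delta) > threshold:
            events.append((i, delta))
    return events


# Made-up aggregate power trace (W): an appliance switches on and later off.
aggregate = [100, 102, 101, 1600, 1605, 1598, 99, 100]
print(detect_events(aggregate))
```

A positive delta suggests a switch-on and a negative one a switch-off; small fluctuations below the threshold are ignored.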

Regarding step (a), the most important point to consider is the sampling interval applied to the electrical signals. Sampling rates can broadly be classified into very low (slower than one sample per minute), low (between one sample per minute and one sample per second), medium (between one and fifty/sixty Hz), high (from fifty/sixty Hz to two kHz), very high (between two and forty kHz) and extremely high (greater than forty kHz). Another point to take into consideration is the hardware used to acquire the data. Commercial devices typically only achieve very low and low frequencies; higher sampling frequencies need specialized hardware. Related to that are the data storage and processing requirements, which obviously increase with the sampling frequency employed.
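The classification above can be expressed as a small helper. How the boundary values are assigned (for instance, exactly one sample per minute counted as "low") and the default mains frequency of 50 Hz are assumptions made for this sketch.

```python
def sampling_class(interval_s, mains_hz=50.0):
    """Map a sampling interval in seconds to the rate classes listed in the text."""
    if interval_s <= 0:
        raise ValueError("sampling interval must be positive")
    if interval_s > 60:
        return "very low"
    if interval_s >= 1:
        return "low"
    freq = 1.0 / interval_s
    if freq <= mains_hz:
        return "medium"
    if freq <= 2_000:
        return "high"
    if freq <= 40_000:
        return "very high"
    return "extremely high"


for interval in (300, 60, 1):
    print(interval, sampling_class(interval))
```

Under these assumptions, one-second and one-minute data both fall in the low class, while five-minute data are very low.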

Focusing now on step (b), according to the work of Anderson et al. 22, event detectors typically use three different approaches: expert heuristics, probabilistic models and matched filters. The first consists of the creation of a set of rules for each appliance; initial NILM works used this approach. Probabilistic models provide a probability, used to make a decision about the occurrence of events. A particularly well-known case is the Generalized Likelihood Ratio (GLR) method (please see Anderson’s work 22). Finally, matched filters are characterized by extracting the signal waveforms and correlating them with known patterns.

The features that can be used to identify an appliance are obviously related to the sampling rate employed. For very low and low frequencies, active, apparent and reactive powers are often used, together with Root-Mean-Square values of the current or voltage. Medium rate acquisition allows the use of transient features of the electrical signals. High sampling rates make it possible to employ spectral features such as harmonics (see Meehan et al. 23), the Discrete Wavelet Transform (Chang and co-workers 24), and so on. Very high rate data provide much more detail about each appliance’s waveform, either from the higher harmonics or from the shape of the raw current and voltage waveforms themselves. Two-dimensional voltage-current (V-I) trajectories were used in the investigation of Hassan and co-workers 25.

Using the features described above, computed from the aggregate load, the objective in step (d) is to identify the appliances that are operating at a given time. This can be formulated as an optimization or a classification problem. Four appliance types are usually considered:

Type I—On/off devices: most appliances in households, such as bulbs and toasters;

Type II—Finite-State-Machines (FSM): the appliances in this category present states, typically in a periodical fashion. Examples are washer/dryers, refrigerators, and so on;

Type III—Continuously Varying Devices: the power of these appliances varies over time, but not in a periodic fashion. Examples are dimmers and tools.

Type IV—Permanent Consumer Devices: these are devices with constant power but that operate 24 h, such as alarms and external power supplies.

This way, for type II appliances, identification means determining not only which appliances are active, but also their states.

A very large number of techniques have been proposed for this step. They can be very broadly classified as optimization methods and machine learning (supervised and unsupervised) techniques. Optimization approaches use different methods to perform a combinatorial search. Examples are hybrid programming (Kong et al. 26), genetic algorithms (Egarter, Sobe & Elmenreich 27) and others. Supervised techniques use offline training to build a database of information used to design the classifier(s). These are the most employed class of methods; the works of Chang et al. 24, Kelly & Knottenbelt 28 and Wu and Wang 29 belong to this class. Unsupervised methods do not require any training prior to classification, which is an important advantage. Feature clustering, followed by the labelling of each cluster with meaningful appliance names, has been applied by Yang and co-workers 30. The most recent unsupervised techniques applied to NILM belong to a family of methods that assume that the electrical signal is the output of a stochastic system, maintaining a representation of the whole system state instead of dealing with individual events. Examples are Hidden Markov Models (HMM) and variants (please see the works of Cutsem et al. 31 and Kong et al. 32).
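To make the idea of a combinatorial search concrete, the sketch below exhaustively tests every subset of a few on/off (Type I) appliances and keeps the one whose summed power is closest to the aggregate reading. It is not the method of any cited work, and the appliance names and rated powers are hypothetical.

```python
from itertools import combinations


def disaggregate(total_w, appliances):
    """Return the set of appliances whose rated powers best explain `total_w`, and the residual."""
    names = list(appliances)
    best, best_err = (), abs(total_w)  # start from "nothing is on"
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            err = abs(total_w - sum(appliances[n] for n in combo))
            if err < best_err:
                best, best_err = combo, err
    return set(best), best_err


# Hypothetical rated powers in watts.
rated = {"kettle": 2000, "fridge": 120, "tv": 90, "heater": 1500}
print(disaggregate(1625, rated))
```

The number of subsets doubles with each added appliance, which is why the literature resorts to hybrid programming, genetic algorithms and similar search strategies instead of plain enumeration.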

Forecasting

Another important point for HEMS is the ability to forecast the values of variables that are important for energy management. Several forecasts are necessary, such as the home load demand (either global or appliance-based), the electricity produced by renewable energy sources (if available), weather variables, occupancy and inside climate. The better the quality of the estimation, the better the electricity management that can be achieved.

Forecasting techniques can be viewed from several perspectives, such as: (a) the time-scales involved; (b) the exogenous variables used in the model; and (c) the methods applied. Regarding the first, time-scales can vary from horizons of a few seconds or minutes (intra-hour or very short forecasts, for control and adjustment actions), through a few hours (intra-day or short/medium, for energy resource planning and scheduling, as well as for the electricity market), to a few days ahead (intra-week or long, for unit commitment and maintenance schedules). Whether exogenous variables are employed and, if so, which ones, depends essentially on the model application. Finally, looking at the methods, in the general case they can be broadly divided into statistical and machine learning methods (obviously, forecasting of specific variables may employ other classes of methods). Statistical models are typically linear models such as persistence forecasts, Auto-Regressive (AR), Auto-Regressive Moving-Average (ARMA), and Auto-Regressive Integrated Moving Average (ARIMA) models. Machine learning methods are the most used nowadays and typically comprise several different shallow and deep neural networks, whether isolated or fusing different models.
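The two simplest statistical models named above can be sketched in a few lines. The persistence forecast repeats the last observed value, and the AR(1) model y[t] = c + phi * y[t-1] is fitted here by ordinary least squares. The series is made up for illustration, and a real application would use a higher model order and proper validation.

```python
def persistence_forecast(series, steps):
    """Repeat the last observation for every future step."""
    return [series[-1]] * steps


def fit_ar1(series):
    """Least-squares estimate of (c, phi) in y[t] = c + phi * y[t-1]."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    return mean_y - phi * mean_x, phi


def ar1_forecast(series, steps):
    """Iterate the fitted AR(1) model `steps` times from the last observation."""
    c, phi = fit_ar1(series)
    forecasts, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        forecasts.append(last)
    return forecasts


load = [10.0, 6.0, 4.0, 3.0, 2.5, 2.25]  # made-up series, for illustration only
print(persistence_forecast(load, 2))
print(ar1_forecast(load, 2))
```

Persistence is often used as the baseline against which more elaborate models are judged.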

Regarding PV power forecasting, several reviews exist on the topic. The interested reader can consult, for instance, the works of Alcañiz et al. 33 or Pandžić and Capuder 34, and the references within. Forecasting PV power also requires forecasting atmospheric variables, such as solar irradiation (please see El-Amarty et al. 35), air temperature (Tran et al. 36), and possibly others. As examples, Yang and co-workers 37 proposed a hybrid scheme, involving classification, training, and forecasting stages. This scheme is used for one-day ahead hourly forecasting of PV output. Fonseca and co-workers 38 compare the suitability of a non-parametric distribution and three parametric distributions in characterizing prediction intervals for photovoltaic energy forecasts with high levels of confidence. Mei et al. 39 propose an LSTM-Quantile Regression Averaging-based nonparametric probabilistic forecasting model for PV output power.

Household load demand forecasting is an active area of research as, on the one hand, it allows the occupants to be aware of the energy consumption of their own house and, consequently, to take measures to reduce this consumption and the energy bill, and, on the other hand, it enables a more efficient operation of the HEMS. In recent years, computational intelligence techniques have to some extent replaced physics-based methods, as the former do not require knowledge of the building geometry and physical phenomena to obtain an accurate prediction model. Several reviews exist on this topic, such as Foucquier’s 40, Wei et al. 41, Ahmad et al. 42 and Wen et al. 43. As in the case of PV forecasting, different exogenous variables can be applied to the prediction models, such as atmospheric air temperature, number of occupants, and encodings that distinguish between weekdays, weekends and holidays, to name but a few. Different computational methods can also be applied. For instance, Mynhoff et al. 44 compared different prediction models: Artificial Neural Networks-Nonlinear Auto-Regressive (ANN-NAR), HMMs, Support Vector Machines (SVM), MultiLayer Perceptrons (MLP) and Deep Belief Networks (DBN) for one-step daily and weekly forecasts. Yildiz and co-workers 45 compared the forecasting performance of ANNs, SVMs and Least-Squares SVMs, with different data resolutions and forecasting horizons, with several models, each applied to a different load profile, obtained by clustering the load profiles.

Forecasts can also be applied to energy markets. In recent years, in many countries, electricity has been bought and sold in energy markets (please see Yildiz and co-workers 46). Accurate forecasts of electricity demand and price are therefore needed by the participants in these markets. In particular, the one-day ahead hourly forecast, considered a short-term forecast, has received increasing attention from the research community. Comprehensive reviews on load and price forecasting are available in Suganthi & Samuel 47 and Weron’s work 48, respectively.

Finally, according to Zhang, He & Yang 49, existing load and generation forecasting algorithms can be classified into two classes: point forecasts and probabilistic forecasts. The former provide single estimates for the future values of the corresponding variable, and so cannot properly quantify the uncertainty attached to the variable under consideration. The latter are increasingly attracting the attention of the research community due to their enhanced capacity to capture future uncertainty, describing it in three ways: prediction intervals, quantiles, and probability density functions (PDF) (please see Bracale and co-workers 50).
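One simple way to turn a point forecast into a prediction interval is to add empirical quantiles of past forecast errors to it. The sketch below illustrates that idea only; it is not the approach of the cited works, and the error sample and the 90% coverage level are assumptions.

```python
import statistics


def prediction_interval(point_forecast, past_errors, coverage=0.9):
    """Interval built from empirical percentiles of past errors (actual minus forecast)."""
    # quantiles(..., n=100) returns the 99 percentile cut points; cuts[p - 1] is percentile p.
    cuts = statistics.quantiles(past_errors, n=100, method="inclusive")
    lower_pct = round((1 - coverage) / 2 * 100)
    upper_pct = round((1 + coverage) / 2 * 100)
    return point_forecast + cuts[lower_pct - 1], point_forecast + cuts[upper_pct - 1]


errors = list(range(-50, 51))  # made-up, symmetric error sample (W)
print(prediction_interval(200.0, errors))
```

The point forecast alone would report 200 W; the interval additionally conveys how far the actual value has typically deviated from such forecasts.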

Communities of energy

Better and more efficient solutions, not only from each householder’s point of view but also from the community consumption perspective, can be obtained by extending the tools described above to groups of households that share among themselves the energy produced or stored, in the form of communities of energy. In this context the local HEMS can be hierarchically controlled by an aggregator, which supervises not only the management of energy in each local prosumer (producer/consumer), but also the flow of energy between the members of the community as a whole, as well as the exchanges between the community and the grid.

It is within this context that this dataset is introduced. It spans more than three years of data, covering different types of variables of high importance to the field of electrical energy and thermal comfort of either isolated or community-based households. More specifically, for a single prosumer, it makes it possible to:

(a) Test and validate different control strategies for home energy management systems, as done by us in 51, 52. The first reference compares MBPC control implemented with the Branch-and-Bound technique for HVAC control with the house proprietary system. The second reference employs a MILP method in a MBPC framework, controlling not only the inverter, but also appropriately scheduling loads. Both approaches achieve important savings in the electricity bill.

(b) Design forecasting energy consumption models, as discussed in 53, 54, 55. The first reference employs a Multi-Objective-Genetic-Algorithm (MOGA) 56 design framework available in our lab, which performs feature selection, topology determination and parameter estimation, to forecast load demand forty-eight steps ahead, with a time-step of fifteen minutes. The second one extends the previous approach to an ensemble of MOGA designed models. The third one proposes a hybrid forecasting mechanism to use with 52.

(c) Design forecasting PV energy generation models 57. The approach described above is applied to PV power generation, with great success.

(d) Move from deterministic forecasting to probabilistic forecasting, for both load demand and PV power generation 58.

(e) Test and validate different non-intrusive load monitoring (NILM) algorithms, as performed in 59, 60. The first reference applies ApproxHull 61, a data selection tool available in our lab, to deep learning models. The second one uses ApproxHull and MOGA to design shallow models to detect appliance operation and estimate energy.

(f) Design forecasting thermal comfort models, as well as test and validate control strategies for Heating, Ventilation and Air Conditioning (HVAC) systems, as in 62. Very basically, HVAC is controlled so that it guarantees PMV thermal comfort within user-predefined schedules, while minimizing the energy consumed, making use of forecasting models of solar radiation, atmospheric air temperature and relative humidity, inside air temperature, relative humidity and mean radiant temperature, as well as room occupancy.

Additionally, for a community of four houses, it makes it possible to:

(g) Test and validate different control strategies for the community energy management system, which can be found in 63, where the MILP-MBPC strategy described above is extended for a community of houses. Different ways to share the produced and stored energy are compared.

(h) Design day-ahead point and probabilistic net load forecasts for use with energy markets, as in 64.

(i) Test and validate transfer learning strategies for NILM, as discussed in D’Incecco’s work 65.

All the above topics are important, on their own, for future research. What is perhaps most important, and should be stressed, is that significant improvements in the general field of energy efficiency in buildings and energy communities require joint research on all these topics, to which others can obviously be added. This is an added value of this dataset in comparison with existing ones, as it includes all the data needed to address all the topics considered, which is not the case for existing datasets.

As the households that were employed in this research are typical Mediterranean detached family houses, the data available in this dataset can be used as representative of that segment of buildings and of that climate. By this we mean that methods and techniques applied to the nine classes of problems identified above, using this dataset, can be expected to produce similar results for other households or communities in regions with a similar climatic type.

As both raw data, typically sampled at one second or at one minute (please see below), and curated data, synchronized with a five-minute sampling interval, are available, different sampling intervals can be used for the different methods. The dataset can be found at 66.

Data was collected from four residential houses situated in Gambelas, Faro, in the south of Portugal. All four are detached family houses with two floors and a garden. Two of the houses have triphasic meters, while the others are monophasic. The former are denoted TH1 and TH2, while the latter are denoted MH1 and MH2. TH1 has a PV system and an energy storage system, MH1 has a photovoltaic system, and the others do not have any renewable energy source.

TH1 was used in the NILMforIHEM project, which started in 2019. For this reason, and because it was used for objectives a) to f) above, it has much more data, covering a much longer period of time. This house and the three additional houses were employed in the HEMS2IEA project, which started in 2021. Only electric consumption data was recorded for the three additional houses. Recorded data for the four houses spans from November 2021 until July 2022. After this date, as one of the houses underwent major works, data is available for only three houses.

TH1 has twenty different spaces (including garden, halls, and so on). The floor plans are shown in Fig.  1 .

figure 1

Floor plans of TH1. Top: 1st floor; bottom: ground floor.

A photovoltaic system was installed, composed of 20 Sharp NU-AK panels 67, each with a maximum power of 300 W (please see Figs. 2 and 3). The inverter is a Kostal Plenticore Plus converter (KI) 68 (Fig. 4), which also controls a BYD Battery Box HV H11.5 (Fig. 5), with a storage capacity of 11.5 kWh 69.

figure 2

Photovoltaic panels.

figure 3

Battery Box.

The house electric panel consists of sixteen monophasic circuit breakers, plus a triphasic one. Several electric variables are measured in every circuit breaker, providing approximate ground truth for the NILM identification. Circutor Wibeees (WB) 70 are used as the measurement devices. They are plug-and-play wireless devices that use Hall effect technology for the measurement; because of that, calibration is required for correct measurements. Voltage, current, frequency, active, reactive and apparent power, power factor, and active, inductive reactive and capacitive reactive energy are measured every second for each monophasic circuit breaker and for each phase of the triphasic one, together with totalized values. In total, 198 variables are sampled by the WBs every second.

Total consumption data is supplied by a Carlo Gavazzi EM340 three-phase energy meter 71. This meter is a class X certified device, and the electrical measurements are read over a two-wire Modbus RTU connection. The EM340 supplies 37 different electric variables, sampled at 1 Hz.

Measurements of the energy produced by the PV, stored in the battery and injected into the grid are obtained either from the inverter (KI) or from a Kostal smart energy meter (KEM) 72. Home electrical consumption variables are also available in the inverter. In total, 78 variables are obtained from the KEM and KI, at a sampling interval of one minute (Fig. 9).

figure 6

Weather Station.

For on/off control, Smart Plugs (SPs) 73 are used (Fig. 7). They are also used to enable sockets belonging to the same circuit breaker (CB) to be measured individually. They are read and controlled directly using an internal web service. The number of SPs changed over time; six variables are measured every second for each plug. In a similar way to the SPs, the Air Conditioner in Room B14 in Fig. 1 can be measured and actuated.

figure 7

Smart Plug.

A Weather station (please see Mestre et al . 74 ) measures the air temperature and relative humidity, and global solar radiation, at one second intervals (Fig. 6 ).

figure 8

Self-Powered Wireless Sensor.

Self-Powered Wireless Sensors (please see Ruano et al. 75) are used for measuring room climate data, such as air temperature and relative humidity, status (open/closed) of doors and windows, wall temperature, light and room movement (Fig. 8). They are Ultra-Low-Power devices and communicate via the ISM radio band, working at 2.4 GHz or 868 MHz.

figure 9

Schneider panel.

Data transmission from/to the measurement devices is available through Gateways and a Technical Network. A technical IP-cabled and a wireless network have been created using a network router, separating the home network from the technical network.

Finally, an IoT platform was created to interact with the data acquisition system. For more information on the acquisition system and the IoT platform, please see Ruano et al. 76.

In the three additional houses, only electric consumption is measured. For this reason, in TH2, a Carlo Gavazzi EM340 meter was installed. In MH1 and MH2, Carlo Gavazzi EM112 (one-phase) meters were installed, providing a subset of variables acquired by the EM340.

Data Records

The data records are available in Zenodo 66. The datasets are divided into months, starting in January 2020 and ending in February 2023, therefore spanning more than three years. They are Matlab data files in the ‘v7’ format, which can be loaded using the usual ‘load’ Matlab command. Notice that the use of this format enables the data to be read directly by other languages, such as Python, using the function loadmat in scipy.io.
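A minimal sketch of reading one monthly file from Python follows. It assumes SciPy is installed; the file path is a placeholder, since the actual file names are not given here, and the helper names are hypothetical. loadmat returns a dictionary that also contains metadata entries whose keys start with a double underscore, which the helper removes.

```python
def user_variables(mat):
    """Drop the metadata entries that scipy.io.loadmat adds (keys starting with '__')."""
    return {name: value for name, value in mat.items() if not name.startswith("__")}


def load_month(path):
    """Load one monthly .mat file; `path` is whatever file was downloaded from the repository."""
    from scipy.io import loadmat  # imported lazily so the helper above works without SciPy

    return user_variables(loadmat(path))


# Example with a dictionary shaped like loadmat output (values are placeholders).
example = {"__header__": b"", "__version__": "1.0", "__globals__": [], "PFvec": [[0.9]]}
print(sorted(user_variables(example)))
```

How date variables appear after loading depends on how they were stored in Matlab, so it is worth inspecting their type before using them as a time axis.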

The sensing devices are grouped into eight categories, and within each category there might be different appliances.

The variables measured by the Wibeees are shown in Table  2 .

There are sixteen monophasic WBs and one triphasic. The monophasic WBs are numbered from one to fifteen, and nineteen. The triphasic one is numbered from sixteen to eighteen, corresponding to each of its three phases. The most important electric appliances in TH1 are shown in Table 3.

The data acquisition of the wibeees is asynchronous, which means that there is a separate time basis for each device. The different time bases are stored in the matrix dtvec, and the number of samples for each device is stored in the vector ndtvec. Therefore, if you want to plot the evolution of the power factor of, let us say, wibeee 6, you should use the Matlab command:

plot(dtvec(1:ndtvec(6),6),PFvec(1:ndtvec(6),6))

There are several variables associated with the inverter/battery. These variables are sampled at a 1 minute rate. They are detailed in Tables  4 – 8 .

There are several variables associated with the EM112 and EM340 meters. These variables are sampled at a one second rate. They are detailed in Table  9 for the monophasic meters, and in Table  10 , for the triphasic ones. The variables might be vectors (if only one house is measured in the corresponding period) or matrices (if there are measurements available for the two houses).

The maximum number of Smart Plugs in TH1 was 4. Data was sampled every second. The measured variables are listed in Table 11.

The Intelligent Weather Station measures data minute by minute. The variables are shown in Table  12 .

The Self-Powered Wireless Sensors measure variables in 4 compartments of TH1: in the first floor, the Hall, Bedrooms 1_2 and 1_4, and in the ground floor, the Lounge (please see Fig.  1 ). Data is sampled at 1 minute intervals. The measured data is shown in Tables  13 – 16 .

Finally, Table  17 illustrates the variables measured by the Air Conditioner at bedroom 1_4. Data is measured at one minute intervals.

Technical Validation

Until now, we have mentioned variables named ‘***vec’. They are a raw version of the variables ***, with possibly interpolated data (please see below). As already specified, each of the 34 devices has its own time basis, stored in the corresponding dt***vec variable.

Because processing requires a single time basis, all variables have been down-sampled to a 5-minute sampling interval, where each sample is the mean of the corresponding variable over the corresponding five-minute interval. Energy variables have been down-sampled to a one-hour interval.
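The averaging step described above can be sketched in a few lines. This is a Python illustration only (the dataset's own tooling is Matlab); the function name `downsample_mean`, the alignment of the bins to midnight of the first sample, and the labelling of each bin by its start time are assumptions made for the example, not details taken from the authors' code.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def downsample_mean(times, values, step=timedelta(minutes=5)):
    """Average irregularly sampled values into fixed-width bins.

    Bins are aligned to midnight of the first sample's day (an assumption)
    and each bin is labelled with its start time. Empty bins are omitted.
    """
    origin = times[0].replace(hour=0, minute=0, second=0, microsecond=0)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, v in zip(times, values):
        k = (t - origin) // step  # integer index of the bin holding t
        sums[k] += v
        counts[k] += 1
    return [(origin + k * step, sums[k] / counts[k]) for k in sorted(sums)]


if __name__ == "__main__":
    start = datetime(2020, 8, 1, 0, 0, 36)
    times = [start + timedelta(seconds=s) for s in (0, 60, 120, 300)]
    values = [1.0, 2.0, 3.0, 10.0]
    for label, mean in downsample_mean(times, values):
        print(label, mean)
```

With a 5-minute step, a 31-day month yields 31 × 24 × 12 = 8,928 bins per device, which matches the number of rows quoted for the August 2020 averaged variables.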

Consider, as an example, the month of August 2020. There, PFvec (Phase Factor of the 19 wibeees) has a size of 2,675,237*19, while the averaged version, PFveccon , has a size of 8,928*19. The common power time basis is available in the date variable dtveccon , and for energy values the common time basis is in dteneveccon .

This way, if you want to plot the evolution of the phase factor of, let us say, wibeee 6, you should use (please see Fig. 10 ):

plot(dtvec(1:ndtvec(6),6),PFvec(1:ndtvec(6),6))

while if you are happy with only the averaged values (please see Fig. 11 ), you would use:

plot(dtveccon,PFveccon(:,6)) .

With real-time measured data, there is always the possibility of missing or invalid data. All measured data is pre-processed to check for possible gaps. If the number of consecutive missing values is less than seven, the values are interpolated with a moving median scheme; otherwise they are left as 0 and the period with no data is marked.
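A minimal sketch of this rule is given below, in Python for illustration (the authors' implementation is the Matlab function Validate_Quadro_4.m, which is not reproduced here). Missing samples are represented by `None`; the window half-width `half_window`, the function name `fill_gaps`, and the 0-based inclusive gap indices are assumptions of the example.

```python
from statistics import median


def fill_gaps(values, max_run=6, half_window=3):
    """Fill short runs of missing samples; zero and mark long runs.

    Runs of at most `max_run` consecutive None values (i.e. fewer than
    seven) are replaced by the median of the valid samples found within
    `half_window` positions on either side (hypothetical window size).
    Longer runs are set to 0.0 and returned as (first, last) index pairs.
    """
    filled = list(values)
    marked = []
    n = len(values)
    i = 0
    while i < n:
        if values[i] is not None:
            i += 1
            continue
        j = i
        while j < n and values[j] is None:
            j += 1  # the missing run is values[i:j]
        if j - i <= max_run:
            for k in range(i, j):
                window = values[max(0, k - half_window):k + half_window + 1]
                valid = [v for v in window if v is not None]
                # fall back to 0.0 if no valid neighbour is in the window
                filled[k] = median(valid) if valid else 0.0
        else:
            for k in range(i, j):
                filled[k] = 0.0
            marked.append((i, j - 1))
        i = j
    return filled, marked
```

The window always reads from the original series, so an interpolated value never feeds into the interpolation of its neighbour.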

Data are also validated. At present only the ranges of temperature, humidity and solar radiation are verified. Valid ranges are:

Smart Plugs: Current [0 inf]

WS: AT [−10 50]; RH [0 120]; RAD [0 1500]

SPWS Hall: AT [−10 50]; RH [0 120]

SPWS Bed 1_2: AT [−10 50]; RH [0 120]

SPWS Bed 1_4: AT [−10 50]; RH [0 120]; M [0 100]

SPWS L: AT [−10 50]; RH [0 120]; M [0 100]

AC: AC_RT [−10 50]; AC_IT [−10 50]; AC_OT [−10 50]
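The range check can be illustrated with a short Python sketch (not the authors' Matlab code). The dictionary below copies three of the ranges listed above; the function name `find_faults`, the 0-based indexing, and the convention that a fault spans the first to the last out-of-range sample are assumptions of the example.

```python
# Valid ranges copied from the list above (weather station variables).
VALID_RANGES = {
    "AT": (-10.0, 50.0),
    "RH": (0.0, 120.0),
    "RAD": (0.0, 1500.0),
}


def find_faults(samples, low, high):
    """Return inclusive (kbeg, kend) index pairs of out-of-range runs."""
    faults = []
    start = None
    for k, v in enumerate(samples):
        bad = not (low <= v <= high)
        if bad and start is None:
            start = k  # a new fault begins here
        elif not bad and start is not None:
            faults.append((start, k - 1))
            start = None
    if start is not None:  # fault still open at the end of the series
        faults.append((start, len(samples) - 1))
    return faults


if __name__ == "__main__":
    temperature = [20.0, 21.0, 55.0, 60.0, 22.0, -11.0]
    print(find_faults(temperature, *VALID_RANGES["AT"]))
```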

The information about interpolated data, gaps and faults can be found in the data file with the extension _stat.mat . This information is stored in the following matrices (notice that the categories and device numbers in Table 1 are used here):

STEM , ENDEM – matrices with the number of rows equal to the number of appliances, recording the start and the end of periods without data

For instance, for the same August 2020 month, appliance 29 (the SPWS for the lounge) does not have data between 01-Aug-2020 20:52:54 and 01-Aug-2020 23:42:35, among other gaps.

STON , ENDON - start and end samples of the periods with data

For the same appliance, the first period when there are valid data is between 01-Aug-2020 00:00:36 and 01-Aug-2020 20:52:54

nEM / nON - number of periods without data/with data

For the same appliance, there are 71/72 periods without data/with data

inicio/fim - beginning/end of the data acquisition for each appliance

ttotal - total number of seconds of the specified period of analysis

Each gap can be inspected with:

gaps - array of records with all the gaps. Their structure is:

devices (category of the appliance)

num (appliance number)

k – sample index for the start of the gap

tbeg/tend - time of the start/end of the gap

tgap - total duration (in secs) of the gaps for each appliance

For appliance 29 and for the same period, no data were acquired for 77 hours, 57 minutes and 50 sec. This device and device # 27 (SPWS B_12) have a significant percentage of missing data. This does not happen with the other variables: for August 2020 the mean of missing data for the other variables is 16,288 sec and the median 15,411 sec (that is, around 0.6% of the total data).
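The percentages behind these figures can be checked with a few lines of arithmetic, shown here in Python. The sketch assumes that the period of analysis ( ttotal ) is the whole of August 2020, i.e. 31 days; the helper name `percent_missing` is hypothetical.

```python
from datetime import timedelta

# Assumption: the analysis period is the full month of August (31 days).
ttotal = 31 * 24 * 3600  # 2,678,400 seconds

# Total gap duration reported for appliance 29.
tgap_29 = timedelta(hours=77, minutes=57, seconds=50).total_seconds()


def percent_missing(gap_seconds, total_seconds=ttotal):
    """Share of the analysis period without data, in percent."""
    return 100.0 * gap_seconds / total_seconds


if __name__ == "__main__":
    print(f"appliance 29: {percent_missing(tgap_29):.2f}%")
    print(f"mean of other variables: {percent_missing(16288):.2f}%")
    print(f"median of other variables: {percent_missing(15411):.2f}%")
```

Under that assumption appliance 29 is missing roughly a tenth of the month, while the mean and median for the other variables both round to 0.6%.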

Faults can be inspected with:

tfault - information about the total duration of the faults: array with 7 records for each device group. Each record has the following fields:

num: number of variables checked for the category

dev: array of records with the number of appliances in the group which are checked for validity: each record has:

nvars (number of variables checked)

var (variable names)

t (total faulty time for the specified variable).

faulttot – array with records for each fault. It has the following fields:

devices – category of the appliance

num – appliance number

var – variable inspected

kbeg / kend – sample numbers where the fault started/ended

tbeg / tend - time the fault started/ended

For instance, in August 2020 eight faults were recorded. The first was detected for appliance 28, belonging to category 6 (the SPWS for Bedroom 1_4). The fault affected the temperature, started in sample 632 and ended in sample 633, that is, from 03-Aug-2020 13:49:11 to 03-Aug-2020 13:50:18.

nsamplesint / nsamples - number of interpolated samples/total number of samples per appliance

For instance, for wibeee 2, the total number of samples was 2,643,997. Among them 161 were interpolated (less than 0.01%).

As explained before, the Wibeees needed to be calibrated before being useful. This was done, for each Wibeee, using an external instrument measuring electric power, and comparing this value with the value available through the acquisition system. This gave initial factor values, which were subsequently fine-tuned by a phase-by-phase optimization procedure, making use of the Carlo Gavazzi measured data. These multiplying factors, which are used by the Matlab file extract_quadro_10.m , are available in the Matlab data file Factor.mat (please see the Code availability section). It should be noted that this optimization procedure was executed on a monthly basis, to verify whether further calibrations were needed. The factor values remained, however, constant throughout the project.
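The text does not detail the optimization procedure, so the following Python sketch should be read only as one simple way a multiplying factor could be estimated, not as the authors' method. It assumes paired readings from a device under calibration and from a reference meter, and fits a single scale factor by least squares; the function names are hypothetical.

```python
def fit_scale_factor(measured, reference):
    """Least-squares factor f that minimises sum((f * m - r) ** 2)."""
    num = sum(m * r for m, r in zip(measured, reference))
    den = sum(m * m for m in measured)
    if den == 0:
        raise ValueError("measured values are all zero")
    return num / den


def apply_factor(measured, factor):
    """Scale raw readings by the calibration factor."""
    return [factor * m for m in measured]


if __name__ == "__main__":
    raw = [100.0, 200.0, 300.0]   # readings from the device being calibrated
    ref = [110.0, 220.0, 330.0]   # readings from the reference meter
    f = fit_scale_factor(raw, ref)
    print(f, apply_factor(raw, f))
```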

Fig. 10: Phase Factor of Wibeee 6, with its own time basis (one second).

Fig. 11: Phase Factor of Wibeee 6, with the common time basis (five minutes).

Apart from small communication problems, no anomalies were found for the Carlo Gavazzi meters, nor for the KI and KEM meters. As mentioned before, these problems were solved by interpolation, where possible, or identified by the detection of gaps.

Code availability

All code for the generation of the dataset was written in Matlab R2022 and can be found at https://github.com/aebruano/HEMStoEC . Daily information is received by the data acquisition system in a zipped file, which should be placed in the same directory as the function files (denoted as the root directory). A sample can be found in 2023_06_11_00_00_00.zip. The README and the VARS files provide information about the format of the files enclosed in the zip file. Matlab data is extracted from the unzipped file using the Matlab function extract_quadro_10.m. The command extract_quadro_10('2023_06_11_00_00_00') creates a Matlab data file 2023_06_11_00_00_00.mat inside the 2023_06_11_00_00_00 directory. Gaps are identified and data is interpolated using the function Validate_Quadro_4.m.

A data file 2023_06_11_00_00_00_cor.mat is created, again inside the 2023_06_11_00_00_00 directory, by the command Validate_Quadro_4('2023_06_11_00_00_00','2023_06_11_23_23_59'). Data with a common time basis is obtained using the Matlab function convert_quadro_10_cor.m. The command convert_quadro_10_cor('2023_06_11_00_00_00','2023_06_11_23_23_59','',minutes(15),hours(1)) creates the data file 2023_06_11_00_00_00 to 2023_06_11_23_23_59 excl pst 15 min est 1 hr_cor.mat, this time in the root directory. A Matlab file, Factor.mat , needs to be placed in the root directory.


Acknowledgements

The authors would like to acknowledge the support of Operational Program Portugal 2020 and Operational Program CRESC Algarve 2020, grant number 72581/2020. A. Ruano also acknowledges Fundação para a Ciência e a Tecnologia (FCT) for its financial support via the project LAETA Base Funding (DOI: 10.54499/UIDB/50022/2020). M.G. Ruano also acknowledges the support of Foundation for Science and Technology, I.P./MCTES through national funds (PIDDAC), within the scope of CISUC R&D Unit - UIDB/00326/2020 or project code UIDP/00326/2020.

Author information

Authors and affiliations.

IDMEC, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, and Faculty of Science & Technology, University of Algarve, Faro, Portugal

Antonio Ruano

CISUC, University of Coimbra, Coimbra, and Faculty of Science & Technology, University of Algarve, Faro, Portugal

Maria da Graça Ruano


Contributions

Both authors contributed equally to this paper.

Corresponding author

Correspondence to Antonio Ruano .

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Ruano, A., Ruano, M.d.G. From home energy management systems to energy communities: methods and data. Sci Data 11 , 346 (2024). https://doi.org/10.1038/s41597-024-03184-5


Received : 06 July 2023

Accepted : 25 March 2024

Published : 06 April 2024

DOI : https://doi.org/10.1038/s41597-024-03184-5



10 Current Database Research Topic Ideas in 2024


As we head towards the second half of 2024, technology continues to evolve at a rapid pace. With the rise of AI and blockchain, the demand for data, for its management and for its security is growing quickly. As a consequence, database security and DBMS research have become pressing areas of study.

With new technologies and techniques emerging day-by-day, staying up-to-date with the latest trends in database research topics is crucial. Whether you are a student, researcher, or industry professional, we recommend taking our Database Certification courses to stay current with the latest research topics in DBMS.

In this blog post, we will introduce you to 10 current database research topic ideas that are likely to be at the forefront of the field in 2024. From blockchain-based database systems to real-time data processing with in-memory databases, these topics offer a glimpse into the exciting future of database research.

So, get ready to dive into the exciting world of databases and discover the latest developments in database research topics of 2024!

Blurring the Lines between Blockchains and Database Systems 

The intersection of blockchain technology and database systems offers fertile new grounds to anyone interested in database research.

As blockchain gains popularity, many thesis topics in DBMS[1] are exploring ways to integrate both fields. This research will yield innovative solutions for data management. Here are 3 ways in which these two technologies are being combined to create powerful new solutions:

Immutable Databases: By leveraging blockchain technology, it’s possible to create immutable databases. Once data has been added to such a database, it cannot be modified or deleted. This is particularly useful in situations where data integrity is critical, such as in financial transactions or supply chain management.

Decentralized Databases: Blockchain technology enables the creation of decentralized databases. Here data is stored on a distributed network of computers rather than in a central location. This can help to improve data security and reduce the risk of data loss or corruption.

Smart Contracts: Smart contracts are self-executing contracts with the terms of the agreement between buyer and seller being directly written into lines of code. By leveraging blockchain technology, it is possible to create smart contracts that are stored and executed on a decentralized database, making it possible to automate a wide range of business processes.
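The immutability idea in the list above can be illustrated with a hash-chained, append-only log. This is a minimal single-process Python sketch, assuming SHA-256 and JSON-serializable records; the class name `AppendOnlyLedger` is hypothetical, and the example is tamper-evident storage only, without the distribution or consensus that a real blockchain adds.

```python
import hashlib
import json


class AppendOnlyLedger:
    """Append-only log in which each entry's hash covers the previous hash."""

    GENESIS = "0" * 64  # placeholder hash preceding the first entry

    def __init__(self):
        self._entries = []

    def append(self, record):
        """Add a JSON-serializable record and return its chained hash."""
        prev = self._entries[-1]["hash"] if self._entries else self.GENESIS
        payload = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()
        self._entries.append({"prev": prev, "payload": payload, "hash": digest})
        return digest

    def verify(self):
        """Recompute every hash; any edited entry breaks the chain."""
        prev = self.GENESIS
        for entry in self._entries:
            expected = hashlib.sha256(
                (prev + entry["payload"]).encode("utf-8")
            ).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True


if __name__ == "__main__":
    ledger = AppendOnlyLedger()
    ledger.append({"from": "alice", "to": "bob", "amount": 10})
    ledger.append({"from": "bob", "to": "carol", "amount": 4})
    print(ledger.verify())
```

Because each hash depends on all earlier entries, changing any stored payload makes `verify()` fail from that point onward.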

Childhood Obesity: Data Management 

Childhood obesity is a growing public health concern, with rates of obesity among children and adolescents rising around the world. To address this issue, it’s crucial to have access to comprehensive data on childhood obesity. Analyzing information on prevalence, risk factors, and interventions is a popular research topic in DBMS these days.

Effective data management is essential for ensuring that this information is collected, stored, and analyzed in a way that is useful and actionable. This is one of the hottest DBMS research paper topics. In this section, we will explore the topic of childhood obesity data management.

A key challenge to childhood obesity data management is ensuring data consistency. This is difficult as various organizations have varied methods for measuring and defining obesity. For example:

Some may use body mass index (BMI) as a measure of obesity.

Others may use waist circumference or skinfold thickness.

Another challenge is ensuring data security and preventing unauthorized access. To protect the privacy and confidentiality of individuals, it is important to ensure appropriate safeguards are in place. This calls for database security research and appropriate application.

Application of Computer Database Technology in Marketing

Leveraging data and analytics allows businesses to gain a competitive advantage in today's digitized world. With the rising demand for data, the use of computer databases in marketing has gained prominence.

The application of database capabilities in marketing has really come into its own as one of the most popular and latest research topics in DBMS[2]. In this section, we will explore how computer database technology is being applied in marketing, and the benefits this research can offer.

Customer Segmentation: Storage and analysis of customer data makes it possible to gain valuable insights. It allows businesses to identify trends in customer behavior, preferences and demographics. This information can be utilized to create highly targeted customer segments. This is how businesses can tailor their marketing efforts to specific groups of customers.

Personalization: Computer databases can be used to store and analyze customer data in real-time. In this way, businesses can personalize their marketing and offers based on individual customer preferences. This can help increase engagement and loyalty among customers, thereby driving greater revenue for businesses.

Predictive Analytics: Advanced analytics techniques such as machine learning and predictive modeling can throw light on patterns in customer behavior. This can even be used to predict their future actions. This information can be used to create more targeted marketing campaigns, and to identify opportunities for cross-selling and upselling.
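As a small illustration of the segmentation idea above, the following Python sketch groups orders by customer in an in-memory SQLite database and labels each customer by total spend. The table layout, the threshold of 100, and the segment names are assumptions made for the example.

```python
import sqlite3


def segment_customers(rows, threshold=100.0):
    """Label each customer 'high' or 'standard' by total order amount.

    `rows` is an iterable of (customer, amount) pairs. The schema and the
    spending threshold are illustrative, not taken from any real system.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    query = """
        SELECT customer,
               SUM(amount) AS total,
               CASE WHEN SUM(amount) >= ? THEN 'high' ELSE 'standard' END
        FROM orders
        GROUP BY customer
        ORDER BY customer
    """
    result = conn.execute(query, (threshold,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    orders = [("ana", 80.0), ("ana", 40.0), ("bo", 30.0)]
    for row in segment_customers(orders):
        print(row)
```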

Database Technology in Sports Competition Information Management

Database technology has revolutionized the way in which sports competition information is managed and analyzed. With the increasing popularity of sports around the world, there is a growing need for effective data management systems that can collect, store, and analyze large volumes of relevant data. Thus, researching database technologies[3] is vital to streamlining operations, improving decision-making, and enhancing the overall quality of events.

Sports organizations can use database technology to collect and manage a wide range of competition-related data such as: 

Athlete and team information,

competition schedules and results,

performance metrics, and

spectator feedback.

Collating this data in a distributed database lets sports organizations easily analyze and derive valuable insights. This is emerging as a key DBMS research paper topic.

Database Technology for the Analysis of Spatio-temporal Data

Spatio-temporal data refers to data which has a geographic as well as a temporal component. Meteorological readings, GPS data, and social media content are prime examples of this diverse field. This data can provide valuable insights into patterns and trends across space and time. However, its multidimensional nature makes analysis challenging. It’s no surprise that this has become a hot topic for distributed database research[4].

In this section, we will explore how database technology is being used to analyze spatio-temporal data, and the benefits this research offers.

Data Storage and Retrieval: Spatio-temporal data tends to be very high-volume. Advances in database technology are needed to make storage, retrieval and consumption of such information more efficient. A solution to this problem will make such data more available. It will then be easily retrievable and usable by a variety of data analytics tools.

Spatial Indexing: Database technology can create spatial indexes to enable faster queries on spatio-temporal data. This allows analysts to quickly retrieve data for specific geographic locations or areas of interest, and to analyze trends across these areas.

Temporal Querying: Distributed database research can also enable analysts to analyze data over specific time periods. This facilitates the identification of patterns over time. Ultimately, this enhances our understanding of how these patterns evolve over various seasons.
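To make the spatial and temporal querying ideas above more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample readings are all hypothetical. It also uses an ordinary composite B-tree index with a bounding-box filter as a stand-in; a production system would more likely rely on a dedicated spatial index such as an R-tree.

```python
import sqlite3

# In-memory database with a hypothetical table of weather readings.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (station TEXT, lat REAL, lon REAL, ts TEXT, temp_c REAL)"
)
# Ordinary composite index; it stands in for a true spatial index in this sketch.
conn.execute("CREATE INDEX idx_readings_ts_lat_lon ON readings (ts, lat, lon)")

rows = [
    ("A", 37.02, -7.93, "2023-01-10T12:00:00", 15.5),
    ("A", 37.02, -7.93, "2023-07-10T12:00:00", 29.0),
    ("B", 41.15, -8.61, "2023-07-10T12:00:00", 24.0),
    ("C", 37.10, -8.00, "2023-07-11T12:00:00", 31.0),
]
conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?, ?)", rows)


def query_box(conn, lat_min, lat_max, lon_min, lon_max, t_start, t_end):
    """Return readings inside a bounding box and a half-open time window."""
    sql = """
        SELECT station, ts, temp_c
        FROM readings
        WHERE ts >= ? AND ts < ?
          AND lat BETWEEN ? AND ?
          AND lon BETWEEN ? AND ?
        ORDER BY ts, station
    """
    params = (t_start, t_end, lat_min, lat_max, lon_min, lon_max)
    return conn.execute(sql, params).fetchall()


# ISO-8601 timestamps sort correctly as text, so string comparison is enough here.
summer_south = query_box(
    conn, 36.5, 37.5, -8.5, -7.5, "2023-07-01T00:00:00", "2023-08-01T00:00:00"
)
print(summer_south)
```

Only the two July readings inside the box come back; the January reading and the station outside the box are filtered out by the time and location conditions respectively.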

Artificial Intelligence and Database Technology

Artificial intelligence (AI) is another sphere of technology that’s just waiting to be explored. It hints at a wealth of breakthroughs which can change the entire world. It’s unsurprising that the combination of AI with database technology is such a hot topic for database research papers[5] in modern times. 

By using AI to analyze data, organizations can identify patterns and relationships that might not be apparent through traditional data analysis methods. In this section, we will explore some of the ways in which AI and database technology are being used together. We’ll also discuss the benefits that this amalgamation can offer.

Predictive Analytics: By analyzing large volumes of organizational and business data, AI can generate predictive models to forecast outcomes. For example, AI can go through customer data stored in a database and predict who is most likely to make a purchase in the near future.

Natural Language Processing: All businesses have huge, untapped wells of valuable information in the form of customer feedback and social media posts. These types of data sources are unstructured, meaning they don’t follow rigid parameters. By using natural language processing (NLP) techniques, AI can extract insights from this data. This helps organizations understand customer sentiment, preferences and needs.

Anomaly Detection: AI can be used to analyze large volumes of data to identify anomalies and outliers. Then, a second round of analysis can be done to pinpoint potential problems or opportunities. For example, AI can analyze sensor data from manufacturing equipment and detect when equipment is operating outside of normal parameters.
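As a rough illustration of the anomaly detection idea, the sketch below flags sensor readings that sit far from the average. It uses a simple z-score rule rather than a trained AI model, and the function name, threshold, and vibration values are assumptions made for the example.

```python
from statistics import mean, stdev


def find_anomalies(readings, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds the threshold."""
    mu = mean(readings)
    sigma = stdev(readings)
    if sigma == 0:
        # All readings are identical, so nothing can be an outlier.
        return []
    return [
        (i, x) for i, x in enumerate(readings) if abs(x - mu) / sigma > threshold
    ]


# Hypothetical vibration readings (mm/s) from one piece of equipment.
vibration_mm_s = [2.1, 2.0, 2.2, 1.9, 2.1, 2.0, 9.5, 2.1, 2.0, 2.2]
anomalies = find_anomalies(vibration_mm_s)
print(anomalies)
```

A large outlier inflates the standard deviation it is measured against, so a real pipeline would probably prefer a more robust statistic or a learned model. The second round of analysis mentioned above would then start from the flagged readings.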

Data Collection and Management Techniques of a Qualitative Research Plan

Any qualitative research calls for the collection and management of empirical data. A crucial part of the research process, this step benefits from good database management techniques. Let’s explore some thesis topics in database management systems[6] to ensure the success of a qualitative research plan.

Interviews: This is one of the most common methods of data collection in qualitative research. Interviews can be conducted in person, over the phone, or through video conferencing. A standardized interview guide ensures the data collected is reliable and accurate. Relational databases, with their inherent structure, aid in this process. They are a way to enforce structure onto the interviews’ answers.

Focus Groups: Focus groups involve gathering a small group of people to discuss a particular topic. These generate rich data by allowing participants to share their views in a group setting. It is important to select participants who have knowledge or experience related to the research topic.

Observations: Observations involve observing and recording events in a given setting. These can be conducted openly or covertly, depending on the research objective and setting. To ensure that the data collected is accurate, it is important to develop a detailed observation protocol that outlines what behaviors or events to observe, how to record data, and how to handle ethical issues.

Database Technology in Video Surveillance System 

Video surveillance systems are used to monitor and secure public spaces, workplaces, and even homes. With the increasing demand for such systems, it’s important to have an efficient and reliable way to store, manage and analyze the data generated. This is where database research topics[7] come in.

By using database technology in video surveillance systems, it is possible to store and manage large amounts of video data efficiently. Database management systems (DBMS) can be used to organize video data in a way that is easily searchable and retrievable. This is particularly important in cases where video footage is needed as evidence in criminal investigations or court cases.

In addition to storage and management, database technology can also be used to analyze video data. For example, machine learning algorithms can be applied to video data to identify patterns and anomalies that may indicate suspicious activity. This can help law enforcement agencies and security personnel to identify and respond to potential threats more quickly and effectively.

Application of Java Technology in Dynamic Web Database Technology 

Java technology has proven its flexibility, scalability, and ease of use over the decades. This makes it widely used in the development of dynamic web database applications. In this section, we will explore research topics in DBMS[8] which seek to apply Java technology in databases.

Java Server Pages (JSP): JSP is a Java technology that is used to create dynamic web pages that can interact with databases. It allows developers to embed Java code within HTML pages, thereby enabling dynamic web pages. These can interact with databases in real-time, and aid in data collection and maintenance.

Java Servlets: Java Servlets are Java classes used to extend the functionality of web servers. They provide a way to handle incoming requests from web browsers and generate dynamic content that can interact with databases.

Java Database Connectivity (JDBC): JDBC is a Java API that provides a standard interface for accessing databases. It allows Java applications to connect to databases and execute SQL queries that read, modify, or manage the backend database. This enables developers to create dynamic web applications.

Online Multi Module Educational Administration System Based on Time Difference Database Technology 

With the widespread adoption of remote learning post-COVID, online educational systems are gaining popularity at a rapid pace. A ubiquitous challenge these systems face is managing multiple modules across different time zones. This is one of the latest research topics in database management systems[9].

Time difference database technology is designed to handle time zone differences in online systems. By leveraging this, it’s possible to create a multi-module educational administration system that can handle users from different parts of the world, with different time zones.

This type of system can be especially useful for online universities or other educational institutions that have a global reach:

It makes it possible to schedule classes, assignments and other activities based on the user's time zone, ensuring that everyone can participate in real-time.

In addition to managing time zones, a time difference database system can also help manage student data, course materials, grades, and other important information.
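The article does not spell out how time difference database technology works internally. One common convention, assumed here, is to store each event once in UTC and convert it to the viewer's local time when it is displayed. The sketch below shows that idea in Python; the user names and offsets are hypothetical, and fixed UTC offsets are used to keep the example dependency-free, which means daylight saving time is ignored.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical users with fixed UTC offsets (no daylight saving handling).
USER_OFFSETS = {
    "student_lisbon": timezone(timedelta(hours=0)),
    "student_karachi": timezone(timedelta(hours=5)),
    "student_new_york": timezone(timedelta(hours=-5)),
}

# The class start time as it would be stored in the database: a single UTC value.
class_start_utc = datetime(2024, 1, 15, 14, 0, tzinfo=timezone.utc)


def local_start(user):
    """Convert the stored UTC start time into the given user's local time."""
    return class_start_utc.astimezone(USER_OFFSETS[user])


schedule = {
    user: local_start(user).strftime("%Y-%m-%d %H:%M") for user in USER_OFFSETS
}
for user, start in schedule.items():
    print(user, start)
```

Because every user sees a conversion of the same stored instant, all three students join the same live session even though their clocks show different times.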

Why is it Important to Study Databases?

Databases are the backbone of many modern technologies and applications, making it essential for professionals in various fields to understand how they work. Whether you're a software developer, data analyst or a business owner, understanding databases is critical to success in today's world. Here are a few reasons why it is important to study databases, and why more research on database topics should be published:

Efficient Data Management

Databases enable the efficient storage, organization, and retrieval of data. By studying databases, you can learn how to design and implement effective data management systems that can help organizations store, analyze, and use data efficiently.

Improved Decision-Making

Data is essential for making informed decisions, and databases provide a reliable source of data for analysis. By understanding databases, you can learn how to retrieve and analyze data to inform business decisions, identify trends, and gain insights.

Career Opportunities

In today's digital age, many career paths require knowledge of databases. By studying databases, you can open up new career opportunities in software development, data analysis, database administration and related fields.

Needless to say, studying databases is essential for anyone who deals with data. Whether you're looking to start a new career or enhance your existing skills, studying databases is a critical step towards success in today's data-driven world.

Final Takeaways

In conclusion, as you are interested in database technology, we hope this blog has given you some insights into the latest research topics in the field. From blockchain to AI, from sports to marketing, there are a plethora of exciting database topics for research papers that will shape the future of database technology.

As technology continues to evolve, it is essential to stay up-to-date with the latest trends in the field of databases. Our curated KnowledgeHut Database Certification Courses will help you stay ahead of the curve and develop new skills.

We hope this blog has inspired you to explore the exciting world of database research in 2024. Stay curious and keep learning!

Frequently Asked Questions (FAQs)

There are several examples of databases, with the five most common ones being:

MySQL : An open-source RDBMS used commonly in web applications.

Microsoft SQL Server : A popular RDBMS used in enterprise environments.

Oracle : A trusted commercial RDBMS famous for its high scalability and security.

MongoDB : A NoSQL document-oriented database optimized for storing large amounts of unstructured data.

PostgreSQL : An open-source RDBMS offering advanced features like high concurrency and support for multiple data types.

Structured Query Language (SQL) is a high-level language designed to communicate with relational databases. It’s not a database in and of itself. Rather, it’s a language used to create, modify, and retrieve data from relational databases such as MySQL and Oracle.

A primary key is a column (or a set of columns) that uniquely identifies each row in a table. In technical terms, the primary key is a unique identifier of records. It’s used as a reference to establish relationships between various tables.
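To see a primary key in action, here is a small sketch using Python's built-in sqlite3 module. The customers and orders tables are hypothetical examples. It shows the two roles described above: the key rejects duplicate rows, and another table refers to it to form a relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """CREATE TABLE orders (
           order_id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
           total REAL NOT NULL
       )"""
)
conn.execute("INSERT INTO customers VALUES (1, 'Ana')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.0)")

# A second row with the same primary key value is rejected.
try:
    conn.execute("INSERT INTO customers VALUES (1, 'Bruno')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# An order pointing at a customer that does not exist is rejected too.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The shared key lets the two tables be joined.
joined = conn.execute(
    """SELECT c.name, o.order_id, o.total
       FROM orders AS o
       JOIN customers AS c ON c.customer_id = o.customer_id"""
).fetchall()
print(duplicate_rejected, orphan_rejected, joined)
```

The SQL statements here are the same kind you would send to MySQL or Oracle, although the details of types and constraint enforcement differ between products.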

Profile

Spandita Hati

Spandita is a dynamic content writer who holds a master's degree in Forensics but loves to play with words and dabble in digital marketing. Being an avid travel blogger, she values engaging content that attracts, educates and inspires. With extensive experience in SEO tools and technologies, her writing interests are as varied as the articles themselves. In her leisure, she consumes web content and books in equal measure.

67 Data Management Essay Topics & Database Research Topics

  • Data Assets Management of LuLu Hypermarkets System
  • Database Management Systems’ Major Capabilities
  • Relational Database Management Systems in Business
  • Big Data Opportunities in Green Supply Chain Management
  • Object-Oriented and Database Management Systems Tradeoffs
  • Data Storage Management Solutions: Losses of Personal Data
  • Deli Depot Case Study: Data Analysis Management Reporting
  • Childhood Obesity: Data Management
    The use of electronic health records (EHR) is regarded as one of the effective ways to treat obesity in the population.
  • Technology-Assisted Reviews of Data in a Document Management System
    The TAR that is used in DMS falls into two major categories. These are automatic TAR and semi-automatic TAR, where the last implies the intervention of a human reviewer.
  • Why Open-Source Software Will (Or Will Not) Soon Dominate the Field of Database Management Tools
    The study aims at establishing whether open-source software will dominate the database field because there has been a changing trend in the business market.
  • Big Data Management Research
    This paper will present a literature review of three articles that examine text mining methods for quantitative analysis.
  • Health Data Management: Sharing and Saving Patient Data
    One of the ways to facilitate achieving the idealized environment of data sharing is developing the methods of accessing health-related information.
  • Electronic Health Record Database and Data Management
    Progress in modern medicine has resulted in the amount of information related to the health of patients to grow exponentially.
  • Data Management and Financial Strategies
    By adopting comprehensive supply chain management, businesses can maximize the three main streams in the supply chain: information flow, product flow, and money flow.
  • Policy on People Data Management
    Law No. (13) of 2016 is a data protection legislation that applies to all public institutions and private organizations across Qatar.
  • The Choice of a Medical Data Management System
    The choice of a medical data management system is critically important for any organization providing healthcare services.
  • Data Analytics and Its Application to Management
    The role of the collection of data and its subsequent analysis in the industry is as big as ever. Specifically, it pertains to the managerial field.
  • Modern Data Management and Organization Strategies
    Today, with a shrinking focus on data and analytics, a proper data management strategy is imperative to meeting business goals.
  • Data Collection and Management Techniques for a Qualitative Research Plan
    To conduct complete qualitative research and present a cohesive qualitative research plan, it is necessary to match the structure and topic of the study.
  • Database Management and Machine Learning
    Machine learning is used in science, business, industry, healthcare, education, etc. The possibilities of using machine learning technologies are constantly expanding.
  • Data Management in a Medium-Sized Business
    This paper will use a medium-sized business data management offering highly specialized, high-quality business development education services as an example.
  • Data Collection and Management Techniques of a Qualitative Research Plan
    This research paper recommends interview method in the collection of data and the application of NVivo statistical software in the management of data.
  • Big Data Fraud Management
    The growth of eCommerce systems has led to an increase in online transactions using credit cards and other methods of payment services.
  • Information Technology-Based Data Management in Retail
    The following paper discusses the specificities of data management and identifies the most apparent ethical considerations using retail as an example.
  • Data Management, Networking and Enterprise Software
    Enterprise software is often created “in-house” and thus has a far higher cost as compared to simply buying the software solution from another company.
  • EHR Database Management: Diabetes Prevention
    The data needed to prevent diabetes is usually collected throughout regular screenings conducted whenever a patient refers to a hospital, as well as by using various lab tests.
  • Big Data Usage in Supply Chain Management
    This paper gives a summary of the research that was conducted to understand the unique issues surrounding the use of big data in the supply chain.
  • Adopting Electronic Data Management in the Health Care Industry
  • Distributed Operating System and Infrastructure for Scientific Data Management
  • Advanced Drill Data Management Solutions Market: Growth and Forecast
  • The Changing Role of Data Management in Clinical Trials
  • Business Rules and Their Relationship to Effective Data Management
  • Class Enterprise Data Management and Administration
  • Developing Highly Scalable and Autonomic Data Management
  • Cloud Computing: Installation and Maintenance of Energy Efficient Data Management
  • Exploring, Mapping, and Data Management Integration of Habitable Environments in Astrobiology
  • Data Management: Data Warehousing and Data Mining
  • Efficient Algorithmic Techniques for Several Multidimensional Geometric Data Management and Analysis Problems
  • Data Management for Photovoltaic Power Plants Operation and Maintenance
  • Elderly Patients and Falls: Adverse Trends and Data Management
  • Data Management for Pre- and Post-Release Workforce Services
  • Epidemiological Data Management During an Outbreak of Ebola Virus Disease
  • Dealing With Identifier Variables in Data Management and Analysis
  • How Data Mining, Data Warehousing, and On-Line Transactional Databases Are Helping Solve the Data Management Predicament
  • Improving the New Data Management Technologies and Leverage
  • Integrated Process and Data Management for Healthcare Applications
  • Making Data Management Manageable: A Risk Assessment Activity for Managing Research Data
  • The Use of Temporal Database in the Data Management System
  • Multi-Cloud Data Management Using Shamir’s Secret Sharing and Quantum Byzantine Agreement Schemes
  • Data Management Is More Than Just Managing Data
  • Is Effective Data Management a Key Driver of Business Success?
  • National Data Centre and Financial Statistics Office: A Conceptual Design for Public Data Management
  • Big Data Management and Relevance of Big Data to E-Business
  • Redefining the Data Management Strategy: A Way to Leverage the Huge Chunk of Data
  • Structured Data Management Software Market in Taiwan
  • Towards Effective GML Data Management: Framework and Prototype
  • Data Management in Cloud Environments
  • Digital Communication: Enterprise Data Management
  • The Impact of Big Data on Data Management Functions
  • Analysis of Data Management Strategies at Tesco
  • The Best Data Management Tools Overview
  • What Is Data Management and Why Is It Important
  • Data Management and Use: Governance in the 21st Century
  • What Is Data Management and How Do Businesses Use It?
  • The Difference Between Data Management and Data Governance
  • Types of Data Management Systems for Data-First Marketing Strategies and Success
  • Reasons Why Data Management Leads to Business Success

Cite this post

StudyCorgi. (2022, June 5). 67 Data Management Essay Topics & Database Research Topics. https://studycorgi.com/ideas/data-management-essay-topics/

These essay examples and topics on Data Management were carefully selected by the StudyCorgi editorial team. They meet our highest standards in terms of grammar, punctuation, style, and fact accuracy. Please ensure you properly reference the materials if you’re using them to write your assignment.

This essay topic collection was updated on December 27, 2023 .

Database Management Systems (DBMS)

Database group website: db.cs.berkeley.edu

Declarative languages and runtime systems

Design and implementation of declarative programming languages with applications to distributed systems, networking, machine learning, metadata management, and interactive visualization; design of query interface for applications.

Scalable data analysis and query processing

Scalable data processing in new settings, including interactive exploration, metadata management, cloud and serverless environments, and machine learning; query processing on compressed, semi-structured, and streaming data; query processing with additional constraints, including fairness, resource utilization, and cost.

Consistency, concurrency, coordination and reliability

Coordination avoidance, consistency and monotonicity analysis; transaction isolation levels and protocols; distributed analytics and data management, geo-replication; fault tolerance and fault injection.

Data storage and physical design

Hot and cold storage; immutable data structures; indexing and data skipping; versioning; new data types; implications of hardware evolution.

Metadata management

Data lineage and versioning; usage tracking and collective intelligence; scalability of metadata management services; metadata representations; reproducibility and debugging of data pipelines.

Systems for machine learning and model management

Distributed machine learning and graph analytics; physical and logical optimization of machine learning pipelines; online model management and maintenance; prediction serving; real-time personalization; latency-accuracy tradeoffs and edge computing for large-scale models; machine learning lifecycle management.

Data cleaning, data transformation, and crowdsourcing

Human-data interaction including interactive transformation, query authoring, and crowdsourcing; machine learning for data cleaning; statistical properties of data cleaning pipelines; end-to-end systems for crowdsourcing.

Interactive data exploration and visualization

Interactive querying and direct manipulation; scalable spreadsheets and data visualization; languages and interfaces for interactive exploration; progressive query visualization; predictive interaction.

Secure data processing

Data processing under homomorphic encryption; data compression and encryption; differential privacy; oblivious data processing; databases in secure hardware enclaves.

Foundations of data management

Optimal trade-offs between storage, quality, latency, and cost, with applications to crowdsourcing, distributed data management, stream data processing, version management; expressiveness, complexity, and completeness of data representations, query languages, and query processing; query processing with fairness constraints.

Research Centers

  • EPIC Data lab
  • Sky Computing Lab
  • Alvin Cheung
  • Natacha Crooks
  • Joseph Gonzalez
  • Joseph M. Hellerstein (coordinator)
  • Jiantao Jiao
  • Aditya Parameswaran
  • Matei Zaharia
  • Eric Brewer
  • Michael Lustig
  • Jelani Nelson

Faculty Awards

  • ACM Prize in Computing: Eric Brewer, 2009.
  • National Academy of Engineering (NAE) Member: Ion Stoica, 2024. Eric Brewer, 2007.
  • American Academy of Arts and Sciences Member: Eric Brewer, 2018.
  • Sloan Research Fellow: Aditya Parameswaran, 2020. Alvin Cheung, 2019. Jelani Nelson, 2017. Michael Lustig, 2013. Ion Stoica, 2003. Joseph M. Hellerstein, 1998. Eric Brewer, 1997.

Related Courses

  • CS 186. Introduction to Database Systems
  • CS 262A. Advanced Topics in Computer Systems

Advances in database systems education: Methods, tools, curricula, and way forward

Muhammad Ishaq

1 Department of Computer Science, National University of Computer and Emerging Sciences, Lahore, Pakistan

2 Department of Computer Science, Virtual University of Pakistan, Lahore, Pakistan

3 Department of Computer Science, University of Management and Technology, Lahore, Pakistan

Muhammad Shoaib Farooq

Muhammad Faraz Manzoor

4 Department of Computer Science, Lahore Garrison University, Lahore, Pakistan

Uzma Farooq

Kamran Abid

5 Department of Electrical Engineering, University of the Punjab, Lahore, Pakistan

Mamoun Abu Helou

6 Faculty of Information Technology, Al Istiqlal University, Jericho, Palestine

Associated Data

Not Applicable.

Fundamentals of Database Systems is a core course in computing disciplines, as almost all small, medium, large, or enterprise systems require a data storage component. Database System Education (DSE) provides the foundation as well as advanced concepts in the area of data modeling and its implementation. The first course in DSE holds a pivotal role in developing students’ interest in this area. Over the years, researchers have devised several different tools and methods to teach this course effectively, and have also been revisiting the curricula for database systems education. In this study, a Systematic Literature Review (SLR) is presented that distills the existing literature pertaining to DSE to discuss these three perspectives for the first course in database systems. This SLR also discusses how the developed teaching and learning assistant tools, teaching and assessment methods, and database curricula have evolved over the years due to rapid change in database technology. To this end, more than 65 articles related to DSE published between 1995 and 2022 have been shortlisted through a structured mechanism and reviewed to address the aforementioned objectives. The article also provides useful guidelines to instructors, and discusses ideas to extend this research from several perspectives. To the best of our knowledge, this is the first research work that presents a broader review of the research conducted in the area of DSE.

Introduction

Database systems play a pivotal role in the successful implementation of information systems to ensure the smooth running of many different organizations and companies (Etemad & Küpçü, 2018; Morien, 2006). Therefore, at least one course about the fundamentals of database systems is taught in every computing and information systems degree (Nagataki et al., 2013). Database System Education (DSE) is concerned with different aspects of data management while developing software (Park et al., 2017). The IEEE/ACM computing curricula guidelines endorse 30–50 dedicated hours for teaching fundamentals of design and implementation of database systems so as to build a very strong theoretical and practical understanding of the DSE topics (Cvetanovic et al., 2010).

In practice, most universities offer one user-oriented course at the undergraduate level that covers topics related to data modeling and design, querying, and a limited number of hours on theory (Conklin & Heinrichs, 2005; Robbert & Ricardo, 2003), where it is often debatable whether to utilize a design-first or query-first approach. Furthermore, in order to update the course contents, some recent trends, including big data and the notion of NoSQL, should also be introduced in this basic course (Dietrich et al., 2008; Garcia-Molina, 2008). In contrast, the graduate course is more theoretical and includes topics related to DB architecture, transactions, concurrency, reliability, distribution, parallelism, replication, and query optimization, along with some specialized classes.

Researchers have designed a variety of tools for making different concepts of the introductory database course more interesting and easier to teach and learn interactively (Brusilovsky et al., 2010), either using visual support (Nagataki et al., 2013) or with the help of gamification (Fisher & Khine, 2006). Similarly, instructors have been improvising different methods to teach (Abid et al., 2015; Domínguez & Jaime, 2010) and evaluate (Kawash et al., 2020) this theoretical and practical course. Also, emerging topics such as cloud computing and big data have created the need to revise the curriculum and methods to teach DSE (Manzoor et al., 2020).

The research in database systems education has evolved over the years with respect to modern contents influenced by technological advancements, supportive tools to engage the learners for better learning, and improvisations in teaching and assessment methods. In particular, recent years have seen a shift from self-describing data-driven systems to a problem-driven paradigm, a bottom-up approach in which data exists before being designed. This mainly relies on scientific, quantitative, and empirical methods for building models, while pushing the boundaries of typical data management by involving mathematics, statistics, data mining, and machine learning, thus opening a multidisciplinary perspective. Hence, it is important to devote a few lectures to introducing the relevance of such advanced topics.

Researchers have provided useful review articles on other areas including Introductory Programming Language (Mehmood et al., 2020), use of gamification (Obaid et al., 2020), research trends in the use of enterprise service bus (Aziz et al., 2020), and the role of IoT in agriculture (Farooq et al., 2019, 2020). However, to the best of our knowledge, no such study was found in the area of database systems education. Therefore, this study discusses research work published in different areas of database systems education involving curricula, tools, and approaches that have been proposed to teach an introductory course on database systems in an effective manner. The rest of the article has been structured in the following manner: Sect. 2 presents related work and provides a comparison of the related surveys with this study. Section 3 presents the research methodology for this study. Section 4 analyses the major findings of the literature reviewed in this research and categorizes it into different important aspects. Section 5 presents advice for instructors and future directions. Lastly, Sect. 6 concludes the article.

Related work

Systematic Literature Reviews have been found to be a very useful artifact for covering and understanding a domain. A number of interesting review studies have been found in different fields (Farooq et al., 2021; Ishaq et al., 2021). Review articles are generally categorized into narrative or traditional reviews (Abid et al., 2016; Ramzan et al., 2019), systematic literature review (Naeem et al., 2020) and meta reviews or mapping study (Aria & Cuccurullo, 2017; Cobo et al., 2012; Tehseen et al., 2020). This study presents a systematic literature review on database system education.

Database systems education (DSE) has been discussed from many different perspectives, including teaching and learning methods, curriculum development, and the facilitation of instructors and students through different tools. For instance, a number of research articles have focused on developing tools for teaching database systems courses (Abut & Ozturk, 1997; Connolly et al., 2005; Pahl et al., 2004). Furthermore, a few authors have evaluated DSE tools by conducting surveys and performing empirical experiments to gauge the effectiveness of these tools and their degree of acceptance among the important stakeholders, teachers and students (Brusilovsky et al., 2010; Nelson & Fatimazahra, 2010). On the other hand, some case studies have also been presented to evaluate the effectiveness of the improved approaches and developed tools. For example, Regueras et al. (2007) presented a case study using the QUEST system, in which e-learning strategies are used to teach the database course at undergraduate level, while Myers and Skinner (1997) identified the conflicts that arise when textbook theories about database development do not work for specific applications.

Another important facet of DSE research focuses on curriculum design and evolution for database systems. Several authors (Alrumaih, 2016; Bhogal et al., 2012; Cvetanovic et al., 2010; Sahami et al., 2011) have proposed improvements to the database curriculum to give students a better understanding of DSE while keeping evolving technology in perspective. Similarly, Mingyu et al. (2017) have shared their experience in reforming the DSE curriculum by adding topics related to Big Data. A few authors have also developed and evaluated different tools to help instructors teach DSE.

Further studies focus on different aspects, including specialized tools for specific topics in DSE (Mcintyre et al., 1995; Nelson & Fatimazahra, 2010). For instance, Mcintyre et al. (1995) conducted a survey about using state-of-the-art software tools to teach advanced relational database design courses at Cleveland State University. However, the authors did not discuss DSE curricula and pedagogy in their study. Similarly, Nelson and Fatimazahra (2010) conducted a review to highlight that a basic knowledge of databases is important for students of computer science as well as those from other domains. They highlighted the issues encountered while teaching the database course in universities and suggested that instructors investigate these difficulties to make the course more effective for students. Although authors have discussed and analyzed tools for teaching databases, the tools are yet to be categorized according to different methods and research types within DSE. There is also an interesting systematic mapping study by Taipalus and Seppänen (2020) that focuses on teaching SQL, which is a specific topic of DSE. They categorized the selected primary studies into six categories based on their research types, and used directed content analysis to identify topics such as student errors in query formulation, characteristics and presentation of the exercise database, specific or non-specific teaching approach suggestions, patterns and visualization, and easing teacher workload.

Another relevant study, which focuses on collaborative learning techniques for teaching the database course, was conducted by Martin et al. (2013). This research discusses collaborative learning techniques and adapts them for the introductory database course at the Barcelona School of Informatics. The authors' motive was to introduce active learning methods to improve learning and encourage the acquisition of competence. However, the study focused on only a few methods for teaching the database systems course, while other important perspectives, including database curricula and tools for teaching DSE, were not discussed.

The above discussion shows that a considerable amount of research work has been conducted in the field of DSE to propose various teaching methods; develop and test different supportive tools, techniques, and strategies; and improve the curricula for DSE. However, to the best of our knowledge, there is no study that puts all these relevant and pertinent aspects together while also classifying and discussing the supporting methods and techniques. This review is considerably different from previous studies. Table 1 highlights the differences between this study and other relevant studies in the field of DSE, using the ✓ and – symbols to reflect "included" and "not included" respectively. Therefore, this study aims to conduct a systematic mapping study on DSE that focuses on compiling, classifying, and discussing the existing work related to pedagogy, supporting tools, and curricula.

Comparison with other related research articles

Research methodology

To serve the principal aim of this study, which is to review the research conducted in the area of database systems education, guidance was drawn from existing methods described in various studies (Elberzhager et al., 2012; Keele et al., 2007; Mushtaq et al., 2017) to search for the relevant papers. Research objectives were formulated first, and appropriate research questions and a search strategy were then derived from them, as shown in Fig. 1.

Research objectives

The following are the research objectives of this study:

  • i. To find high quality research work in DSE.
  • ii. To categorize different aspects of DSE covered by other researchers in the field.
  • iii. To provide a thorough discussion of the existing work, offering useful information in the form of evolution, teaching guidelines for instructors, and future research directions.

Research questions

In order to fulfill the research objectives, some relevant research questions have been formulated. These questions, along with their motivations, are presented in Table 2.

Study selection results

Search strategy

The following search string was used to find relevant articles for this study: “Database” AND (“System” OR “Management”) AND (“Education*” OR “Train*” OR “Tech*” OR “Learn*” OR “Guide*” OR “Curricul*”).
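The boolean structure of the search string can be illustrated with a small sketch. It is not the tooling used in this study: the function names are hypothetical, the trailing `*` is assumed to act as a prefix wildcard, and the tolerance for a plural "s" on exact terms is an added assumption, since each portal interprets queries in its own way.

```python
import re

# Term groups transcribed from the search string in the text.
# Groups are combined with AND; terms inside a group are combined with OR.
REQUIRED_GROUPS = [
    ["Database"],
    ["System", "Management"],
    ["Education*", "Train*", "Tech*", "Learn*", "Guide*", "Curricul*"],
]


def term_to_regex(term):
    """Turn one search term into a case-insensitive, word-level regex."""
    if term.endswith("*"):
        # Assumed wildcard semantics: "Learn*" matches "learn", "learning", ...
        return re.compile(r"\b" + re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    # Assumption: allow an optional plural "s" so "System" matches "Systems".
    return re.compile(r"\b" + re.escape(term) + r"s?\b", re.IGNORECASE)


def matches_search_string(text):
    """Return True if every AND-group has at least one OR-term in the text."""
    return all(
        any(term_to_regex(term).search(text) for term in group)
        for group in REQUIRED_GROUPS
    )


if __name__ == "__main__":
    titles = [
        "Teaching database systems: a learning-centred curriculum",
        "Database management in hospitals",
    ]
    for title in titles:
        print(title, "->", matches_search_string(title))
```

Because "Tech*" also matches words such as "technology" or "technique", a query of this shape is deliberately broad, which is consistent with the large number of initial hits that the later screening stages have to filter.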

Articles were retrieved from several sources, namely IEEE, Springer, ACM, and Science Direct, along with other well-known journal and conference sources such as Wiley Online Library, PLOS, and ArXiv. Planning the search for primary studies in the field of DSE is a vital task.

Study selection

A total of 29,370 initial studies were found. These articles went through a selection process, and two authors were designated to shortlist the articles based on the defined inclusion criteria, as shown in Fig. 2. Their conflicts were resolved by involving a third author, and the inclusion/exclusion criteria were refined after resolving the conflicts, as shown in Table 3. A Cohen’s Kappa coefficient of 0.89 was observed between the two authors who selected the articles, which reflects almost perfect agreement between them (Landis & Koch, 1977). The number of papers at the different stages of the selection process for all involved portals is presented in Table 4.
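For readers unfamiliar with the agreement statistic, the following is a minimal sketch of how Cohen's Kappa is computed for two raters and how a value is mapped to the Landis and Koch (1977) bands. The include/exclude decisions in the usage example are invented for illustration; they are not the study's screening data, and the function names are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: share of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label proportions.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch (1977) agreement bands."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Invented decisions for four papers, purely for illustration.
    author_1 = ["include", "include", "exclude", "exclude"]
    author_2 = ["include", "exclude", "exclude", "exclude"]
    kappa = cohen_kappa(author_1, author_2)
    print(round(kappa, 2), landis_koch_label(kappa))
    print(landis_koch_label(0.89))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters in screening tasks where most papers are excluded and two raters would agree often even if they decided at random.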

Selection criteria

Title based search: Papers that are irrelevant based on their title are manually excluded in the first stage. At this stage, there was a large portion of irrelevant papers. Only 609 papers remained after this stage.

Abstract based search: At this stage, abstracts of the selected papers in the previous stage are studied and the papers are categorized for the analysis along with research approach. After this stage only 152 papers were left.

Full text based analysis: The empirical quality of the articles selected in the previous stage is evaluated at this stage through an analysis of the full text of each article. A total of 70 papers were extracted from the 152 papers as primary studies. The quality assessment criteria below were defined for the final data extraction.
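The screening funnel described above can be summarized with a short calculation. The stage counts are taken from the text; the stage labels and the function name are illustrative choices, not terminology from the study.

```python
# Paper counts reported in the text for each stage of the selection process.
STAGES = [
    ("initial search", 29370),
    ("title based search", 609),
    ("abstract based search", 152),
    ("full text based analysis", 70),
]


def retention_rates(stages):
    """Return (stage name, fraction of the previous stage's papers retained)."""
    rates = []
    for (_, before), (name, after) in zip(stages, stages[1:]):
        rates.append((name, after / before))
    return rates


if __name__ == "__main__":
    for name, rate in retention_rates(STAGES):
        print(f"{name}: {rate:.2%} of the previous stage retained")
    overall = STAGES[-1][1] / STAGES[0][1]
    print(f"overall: {overall:.2%} of the initial results became primary studies")
```

The steepest reduction occurs at the title stage, which matches the remark that a large portion of the initial results were irrelevant.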

Quality assessment criteria

The following criteria were used to assess the quality of the selected primary studies. This quality assessment was conducted by two authors, as explained above.

  • The study focuses on curricula, tools, approach, or assessments in DSE, the possible answers were Yes (1), No (0)
  • The study presents a solution to the problem in DSE, the possible answers to this question were Yes (1), Partially (0.5), No (0)
  • The study focuses on empirical results, Yes (1), No (0)
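A minimal sketch of how the three criteria listed above could be turned into a numeric score follows. It covers only these three criteria; the score ranges reported in this study extend above 3, so the full scheme presumably includes further criteria that are not listed here. The dictionary keys and the function name are hypothetical.

```python
# Answer scales for the three criteria listed in the text.
# Keys ("focus", "solution", "empirical") are illustrative shorthand.
CRITERIA = {
    "focus": {"yes": 1.0, "no": 0.0},
    "solution": {"yes": 1.0, "partially": 0.5, "no": 0.0},
    "empirical": {"yes": 1.0, "no": 0.0},
}


def quality_score(answers):
    """Sum the points for one study's answers to the three listed criteria."""
    total = 0.0
    for criterion, scale in CRITERIA.items():
        answer = answers[criterion].lower()
        if answer not in scale:
            raise ValueError(f"{answer!r} is not a valid answer for {criterion!r}")
        total += scale[answer]
    return total


if __name__ == "__main__":
    example = {"focus": "Yes", "solution": "Partially", "empirical": "No"}
    print(quality_score(example))
```

Only the second criterion admits a "Partially" answer, so the sketch rejects that answer for the other two rather than silently scoring it.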

Score pattern of publication channels

Almost 50.00% of the papers scored above average, and 33.33% scored within the average range, i.e., 2.50–3.50. Some articles with a score below 2.50 have also been included in this study, as they present useful information and were published in education-based journals. These studies also discuss important demography- and technology-based aspects that are directly related to DSE.

Threats to validity

The validity of this study could be influenced by the following factors during the literature review.

Construct validity

In this study, construct validity concerns the identification of the primary studies (Elberzhager et al., 2012). To ensure that as many primary studies as possible were included, two authors proposed candidate search keywords over multiple iterations. The search string comprises different terms related to DS and education. The list might nevertheless be incomplete, and alternative terms could change the number of papers found (Ampatzoglou et al., 2013). The search was carried out mainly in the IEEE digital library, Science Direct, the ACM digital library, Wiley Online Library, PLOS, ArXiv, and Google Scholar. Based on the statistics of literature search engines, we believe that most of the research can be found in these digital libraries (Garousi et al., 2013). We also searched for related papers in the main DS research venues (VLDB, ICDM, EDBT) in order to minimize the risk of missing important publications.

Including papers that do not belong to top journals or conferences may reduce the quality of the primary studies in this research, but it improves their representativeness. Certain papers that were not from the top publication sources were therefore included because of their relevance to the literature, even though they reduce the average score of the primary studies. Careful handling of duplicate papers also reduces the possibility of altered results: some cases of duplication were found and later inspected to determine whether they were the same study. The two authors who conducted the search took the final decision on the selection of papers; where they disagreed, they discussed the paper until an agreement was reached.

Internal validity

This validity deals with data extraction and analysis (Elberzhager et al., 2012). Two authors carried out the data extraction and the classification of the primary studies, and conflicts between them were resolved by involving a third author. The Kappa coefficient was 0.89; according to Landis and Koch (1977), this value indicates an almost perfect level of agreement between the authors, which reduces this threat significantly.

Conclusion validity

This threat concerns improper results that may lead to improper conclusions. In this case, it covers factors such as missing studies and incorrect data extraction (Ampatzoglou et al., 2013). The objective is to limit these factors so that other authors can repeat the study and reach proper conclusions (Elberzhager et al., 2012).

The interpretation of results might be affected by the selection, classification, and analysis of the primary studies. The previous section has described each step performed in the primary study selection and data extraction activity in order to minimize this threat. Traceability between the results and the extracted data is supported through the different charts. In our view, slight differences in publication selection or misclassification would not alter the main results.

External validity

This threat deals with the generalization of this research (Mateo et al., 2012). Only results related to the DSE field were considered, and the validity of the conclusions drawn from this study concerns the DSE context only. The representativeness of the selected studies was not affected, because no time restriction was placed on the published research. Therefore, this external validity threat does not apply in the context of this research. DS researchers can take the search string and the paper classification scheme presented in this study as a starting point, and more papers can be searched and categorized according to this scheme.

Analysis of compiled research articles

This section presents the analysis of the compiled research articles carefully selected for this study. It presents the findings with respect to the research questions described in Table 2.

Selection results

A total of 70 papers were identified and analyzed to answer the RQs described above. Table 6 lists the selected papers with details of their classification results and their quality assessment scores.

Classification and quality assessment of selected articles

RQ1. Categorization of research work in DSE field

The analysis in this study reveals that the literature can be categorized as: Tools: any additional application that helps instructors in teaching and students in learning. Methods: any improvisation aimed at improving pedagogy or cognition. Curriculum: refers to the course content domains and their relative importance in a degree program, as shown in Fig.  3 .

Taxonomy of DSE study types

Most of the articles provide a solution by gathering data and demonstrate the novelty of their research through results; these papers are categorized as experiments with respect to their research type. Some are case studies, which are used to generate an in-depth, multifaceted understanding of a complex issue in its real-life context, while a few others are review studies analyzing previously used approaches. A majority of the included articles evaluated their results with the help of experiments, while others conducted reviews to establish an opinion, as shown in Fig. 4.

Cross Mapping of DSE study type and research Types

Educational tools, especially those related to technology, are finding their place in the market faster than ever before (Calderon et al., 2011). The transition to active learning approaches, with the learner more engaged in the process rather than passively taking in information, necessitates a variety of tools to help ensure success. As with most educational initiatives, time should be taken to consider the goals of the activity, the type of learners, and the tools needed to meet the goals. Constant reassessment of tools is important to discover innovations and reforms that improve teaching and learning (Irby & Wilkerson, 2003). For this purpose, various types of educational tools, such as interactive, web-based, and game-based tools, have been introduced to help instructors explain topics more effectively.

The inclusion of technology into the classroom may help learners to compete in the competitive market when approaching the start of their career. It is important for the instructors to acknowledge that the students are more interested in using technology to learn database course instead of merely being taught traditional theory, project, and practice-based methods of teaching (Adams et al., 2004 ). Keeping these aspects in view many authors have done significant research which includes web-based and interactive tools to help the learners gain better understanding of basic database concepts.

Considerable research has focused on student learning. In this study, we discuss tools that support student learning under two kinds of findings: tools that prove more helpful than other tools, and proposed tools that achieve the same outcome as the traditional classroom environment. For example, Abut and Ozturk (1997) proposed an interactive classroom environment for conducting database classes. Online tools such as an electronic “Whiteboard”, electronic textbooks, advanced telecommunication networks, and a few other resources such as Matlab and the World Wide Web were the main highlights of their proposed smart classroom. Also, Pahl et al. (2004) presented an interactive multimedia-based system for the knowledge- and skill-oriented Web-based education of database course students. The authors differentiated their proposed classroom environment from the traditional classroom-based approach by using tool-mediated independent learning and training in an authentic setting. On the other hand, some authors have evaluated educational tools based on their usage and impact on students’ learning. For example, Brusilovsky et al. (2010) evaluated the technical and conceptual difficulties of using several interactive educational tools in the context of a single course. They presented a combined Exploratorium for database courses and an experimental platform, which delivers modified access to numerous types of interactive learning activities.

Also, Taipalus and Perälä (2019) investigated the types of errors that persist when students write SQL. The authors also examined the errors while mapping them onto different query concepts. Moreover, Abelló Gamazo et al. (2016) presented a software tool named LearnSQL for the e-assessment of relational database skills. The proposed software allows automatic and efficient e-learning and e-assessment of relational database skills. Apart from these, Yue (2013) proposed the Sakila database as a unified platform to support instruction and multiple assignments of a graduate database course over five semesters. According to this study, students find it more useful and interesting than the highly simplified databases developed by the instructor or obtained from textbooks. On the other hand, some authors have proposed tools whose main objective is to strengthen students’ grasp of the topic by addressing the pedagogical problems in using educational tools. Connolly et al. (2005) discussed some of the pedagogical problems underlying the development of a constructive learning environment that uses problem-based learning, a simulation game, and interactive visualizations to help teach database analysis and design. Also, Yau and Karim (2003) proposed a smart classroom with pervasive computing technology to facilitate collaborative learning among the learners. The major aim of this smart classroom is to improve the quality of interaction between instructors and students during lectures.

Student satisfaction is also an important factor in making educational tools more effective. While a tool supports the students’ learning process, it should also be flexible enough to win students’ confidence by adapting to their needs (Brusilovsky et al., 2010; Connolly et al., 2005; Pahl et al., 2004). Also, Cvetanovic et al. (2010) proposed a web-based educational system named ADVICE, which helps students reduce the gap between DBMS theory and its practice. On the other hand, some authors have enhanced the existing educational tools of the traditional classroom environment to address students’ concerns (Nelson & Fatimazahra, 2010; Regueras et al., 2007) (Table 7).

Tools: Adopted in DSE and their impacts

Hands-on database development is the main concern in most institutes as well as in industry. However, tools that assist students in database development and query writing remain a major concern, especially for SQL (Brusilovsky et al., 2010; Nagataki et al., 2013).

Students’ grades reflect their conceptual clarity and database development skills. Grades are also important for securing jobs and scholarships after graduation, which is why educational tools that help students perform well in exams are important (Cvetanovic et al., 2010; Taipalus et al., 2018). Wang et al. (2010) proposed Metube, which is a variation of YouTube. Existing educational tools therefore need to be upgraded or replaced by more suitable assessment-oriented interactive tools to meet students’ challenging needs (Pahl et al., 2004; Yuelan et al., 2011).

Another objective of developing educational tools is to increase the interaction between students and instructors. In the modern era, almost every institute follows student-centered learning (SCL). In SCL, the interaction between students and the instructor increases, with most of the interaction initiated by the students. To support SCL, interactive and web-based educational tools need to assign more roles to students than to instructors (Abbasi et al., 2016; Taipalus & Perälä, 2019; Yau & Karim, 2003).

Theory versus practice is still one of the main issues in DSE teaching methods. The traditional teaching method covers theory first, after which the concepts learned in the theoretical lectures are implemented in the lab. Others think it is better to start by teaching how to write queries, followed by the design principles for databases, while a limited number of credit hours is also allocated to general database theory topics. This part of the article discusses different trends in teaching and learning styles, along with the curriculum and assessment methods discussed in the DSE literature.

A variety of teaching methods have been designed, tested, and evaluated by different researchers (Yuelan et al., 2011; Chen et al., 2012; Connolly & Begg, 2006). Some authors have reformed teaching methods based on the requirements of modern lecture delivery. For example, Yuelan et al. (2011) reformed the teaching method using several approaches: a) Modern ways of education: these include multimedia sound, animation, and simulation of the processes and workings of database systems to motivate and inspire the students. b) Project-driven approach: aims to familiarize the students with system operations by implementing a project. c) Strengthening the experimental aspects: helps the students get a strong grip on the basic knowledge of databases and develop a self-learning ability. d) Improving the traditional assessment method: the students should turn in their research and development work as the content of the exam, so that they can solve their problems on their own.

The main aim of any teaching method is to help students learn the subject effectively. Students must show interest in order to gain something from the lectures delivered by the instructors. For this, teaching methods should be interactive and interesting enough to develop the students’ interest in the subject. Students can show interest in the subject by asking more relevant questions or by completing home tasks and assignments on time. Authors have proposed a few teaching methods to make the topic more interesting. For example, Chen et al. (2012) proposed a scaffold concept mapping strategy, which considers a student’s prior knowledge and provides flexible learning aids (scaffolding and fading) for reading and drawing concept maps. Also, Connolly and Begg (2006) examined different problems in the teaching of database analysis and design, and proposed a teaching approach driven by principles found in constructivist epistemology to overcome these problems. This constructivist approach is based on the cognitive apprenticeship model and project-based learning. Similarly, Domínguez and Jaime (2010) proposed an active method for database design through the development of practical tasks in a face-to-face course. They analyzed the results of five academic years using a quasi-experimental design. In the first three years, a traditional strategy was followed and a course management system was used as a material repository. On the other hand, Dietrich and Urban (1996) described the use of cooperative group learning concepts in support of an undergraduate database management course. They designed the project deliverables in such a way that students develop skills for database implementation. Similarly, Zhang et al. (2018) discussed several effective classroom teaching measures covering the innovation of teaching content, teaching methods, teaching evaluation, and assessment methods. They put these teaching measures into practice in the database technologies and applications course at Qinghai University. Moreover, Hou and Chen (2010) proposed a new teaching method based on blended learning theory, which merges traditional and constructivist methods. They applied the method to the teaching of an Access Database programming course.

Problem-solving skills are a key aspect of any type of learning at any age. Students must possess these skills to tackle hurdles in their institute and in industry. Creative and innovative students find varied and unique ways to solve daily tasks, which is why they are more likely to secure good grades and jobs. Authors have been working to introduce teaching methods that develop problem-solving skills in students (Al-Shuaily, 2012; Cai & Gao, 2019; Martinez-González & Duffing, 2007; Gudivada et al., 2007). For instance, Al-Shuaily (2012) explored four cognitive factors that might influence SQL teaching and learning methods and approaches: i) novices’ ability to understand, ii) novices’ ability to translate, iii) novices’ ability to write, and iv) novices’ skills. Also, Cai and Gao (2019) reformed the teaching method in the database course of two higher education institutes in China. Skills and knowledge, innovation ability, and data abstraction were the main objectives of their study. Similarly, Martinez-González and Duffing (2007) analyzed the impact of European Union (EU) convergence on different universities across Europe. According to their study, these institutes need to restructure their degree programs and teaching methodologies. Moreover, Gudivada et al. (2007) proposed a learning method in which students work with large datasets. They used the Amazon Web Services API and a .NET/C# application to extract a subset of the product database to enhance student learning in a relational database course.

On the other hand, authors have also evaluated traditional teaching methods with a view to enhancing students’ problem-solving skills (Eaglestone & Nunes, 2004; Wang & Chen, 2014; Efendiouglu & Yelken, 2010). For example, Eaglestone and Nunes (2004) shared their experiences of delivering a database design course at Sheffield University and discussed some of the issues they faced regarding teaching, learning, and assessment. Likewise, Wang and Chen (2014) summarized the main problems in the teaching of traditional database theory and application. According to the authors, the teaching method is outdated and does not focus on the important combination of theory and practice. Moreover, Efendiouglu and Yelken (2010) investigated the effects of two different methods, Programmed Instruction (PI) and Meaningful Learning (ML), on primary school teacher candidates’ academic achievements and attitudes toward computer-based education, and sought to define their views on these methods. The results show that PI is not favoured for teaching applications because of its behavioural structure (Table 8).

Methods: Teaching approaches adopted in DSE

Students become creative and innovative when they try to study on their own and from different resources rather than from curriculum books only. In the modern era, various resources are available on both online and offline platforms. Modern teaching methods must emphasize making students independent of curriculum books and teach them to learn independently (Amadio et al., 2003; Cai & Gao, 2019; Martin et al., 2013). Also, Kawash et al. (2020) proposed a group study-based learning approach called Graded Group Activities (GGAs), in which students team up in order to take the exam as a group. On the other hand, a few studies have emphasized course content to prepare students for the final exams. For example, Zheng and Dong (2011) discussed the issues of computer science teaching with a particular focus on database systems, presenting different characteristics of the course, its teaching content, and suggestions for teaching it effectively.

As technology is evolving rapidly, students need practical experience from the start. Basic theoretical database concepts are important, but they are of no use without implementation in real-world projects. Most students study with the sole aim of passing exams using theoretical knowledge, and very few want practical experience (Wang & Chen, 2014; Zheng & Dong, 2011). To reduce the gap between theory and its implementation, authors have proposed teaching methods that develop students’ interest in real-world projects (Naik & Gajjar, 2021; Svahnberg et al., 2008; Taipalus et al., 2018). Moreover, Juxiang and Zhihong (2012) proposed that teaching should start from application scenarios and associate theoretical database knowledge with the process from analysis and modeling to establishing a database application. Also, Svahnberg et al. (2008) explained that, under particular conditions, it is possible to use students as subjects for experimental studies in DSE and to influence them by providing responses that are in line with industrial practice.

On the other hand, Nelson et al. (2003) evaluated the different teaching methods used to teach different database modules in the School of Computing and Technology at the University of Sunderland. They outlined suggestions for changes to the database curriculum to further integrate research and state-of-the-art systems in databases.

  • III. Curriculum

The database curriculum has been revisited many times in the form of guidelines that not only present the contents but also suggest the approximate time needed to cover different topics. According to the ACM curriculum guidelines (Lunt et al., 2008) for undergraduate programs in computer science, the overall coverage time for this course is 46.50 h, of which 11 h are allocated to the core topics: Information Models (4 core hours), Database Systems (3 core hours), and Data Modeling (4 core hours). The remaining hours are allocated to elective topics such as Indexing, Relational Databases, Query Languages, Relational Database Design, Transaction Processing, Distributed Databases, Physical Database Design, Data Mining, Information Storage and Retrieval, Hypermedia, Multimedia Systems, and Digital Libraries (Marshall, 2012). According to the ACM curriculum guidelines (2013) for undergraduate programs in computer science, this course should be completed in 15 weeks, with a two-and-a-half-hour lecture per week and a lab session of four hours per week on average (Brady et al., 2004). Thus, the revised version emphasizes practice-based learning with the help of a lab component. Numerous organizations have made efforts in this field to classify DSE (Dietrich et al., 2008). DSE model curricula, bodies of knowledge (BOKs), and some standardization aspects in this field are discussed below:
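The hour allocations quoted above can be checked with a few lines of arithmetic. This is only a worked example under stated assumptions: Data Modeling's 4 hours are treated as core hours, which is what the stated 11 h core total implies, and the weekly lecture and lab figures are multiplied out over the 15 weeks. The variable and function names are illustrative.

```python
# Figures quoted in the text for the 2008 guidelines.
TOTAL_COVERAGE_HOURS = 46.5
CORE_TOPIC_HOURS = {
    "Information Models": 4,
    "Database Systems": 3,
    "Data Modeling": 4,  # assumed to be core hours, consistent with the 11 h total
}


def elective_hours(total_hours, core_hours):
    """Hours left for elective topics after the core topics are covered."""
    return total_hours - sum(core_hours.values())


def contact_hours(weeks, lecture_hours_per_week, lab_hours_per_week):
    """Total lecture and lab hours over a course of the given length."""
    return weeks * lecture_hours_per_week, weeks * lab_hours_per_week


if __name__ == "__main__":
    print("core hours:", sum(CORE_TOPIC_HOURS.values()))
    print("elective hours:", elective_hours(TOTAL_COVERAGE_HOURS, CORE_TOPIC_HOURS))
    # Figures quoted in the text for the 2013 guidelines.
    lecture, lab = contact_hours(15, 2.5, 4.0)
    print("lecture hours:", lecture, "lab hours:", lab)
```

Under these figures, lab time exceeds lecture time in the 2013 scheme, which is consistent with the observation that the revised guidelines emphasize practice-based learning.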

Model curricula

Standards bodies set the curriculum guidelines for teaching undergraduate degree programs in computing disciplines. The curricula that include guidelines for teaching databases are: Computer Engineering Curricula (CEC) (Meier et al., 2008), Information Technology Curricula (ITC) (Alrumaih, 2016), Computing Curriculum Software Engineering (CCSE) (Meyer, 2001), and Cyber Security Curricula (CSC) (Brady et al., 2004; Bishop et al., 2017).

Bodies of knowledge (BOK)

A BOK comprises the set of thoughts and activities related to a professional area, whereas a model curriculum provides a set of guidelines to address education issues (Sahami et al., 2011). The database body of knowledge comprises (a) the Data Management Body of Knowledge (DMBOK), (b) Software Engineering Education Knowledge (SEEK) (Sobel, 2003), and (c) the SE body of knowledge (SWEBOK) (Swebok Evolution: IEEE Computer Society n.d.).

Apart from the model curricula and bodies of knowledge, there are also standards related to the database and its different modules: ISO/IEC 9075-1:2016 (Computing Curricula, 1991) and ISO/IEC 10026-1:1998 (Suryn, 2003).

We also draw on advice from some studies (Elberzhager et al., 2012; Keele et al., 2007) to search for relevant papers. To conduct this systematic study, it is essential to formulate the primary research questions (Mushtaq et al., 2017). Since data management techniques and software are evolving rapidly, the database curriculum should be updated accordingly to meet these new requirements. Some authors have described ways of updating course content to keep pace with specific developments in the field, and others have developed new database curricula to keep up with new data management techniques.

Furthermore, some authors have suggested updates to the database curriculum in response to continuously evolving technology and the introduction of big data. For instance, Bhogal et al. (2012) showed that database curricula need to be updated and modernized, which can be achieved by extending current database concepts to cover strategies for handling ever-changing user requirements and the ways database technology has evolved to meet them. Likewise, Picciano (2012) examined the evolving world of big data and analytics in American higher education. According to the author, the “data driven” decision-making method should be used to help institutes evaluate strategies that can improve retention and to update the curriculum with basic big data concepts and applications, since data-driven decision making has already entered the era of big data and learning analytics. Furthermore, Marshall (2011) presented the challenges faced when developing a curriculum for a Computer Science degree program in the South African context that is earmarked for international recognition. According to the author, the curriculum needs to adhere to both the policy and content requirements in order to be rated as being of a particular quality.

Similarly, some studies (Abourezq & Idrissi, 2016; Mingyu et al., 2017) described the influence of big data from a social perspective, examined the gaps in the computer science database curriculum in the big data era, and identified improvements in practical and theoretical teaching modes, teaching content, and the teaching practice platform. Silva et al. (2016) also proposed teaching SQL as a general language that can be used in a wide range of database systems, from traditional relational database management systems to big data systems.

On the other hand, different authors have developed database curricula tailored to students' academic backgrounds. For example, Dean and Milani (1995) recommended changes in computer science curricula based on practice at the United States Military Academy (USMA). They placed great emphasis on practical demonstration of a topic rather than theoretical explanation, especially for students not majoring in computer science. Furthermore, Urban and Dietrich (2001) described the development of a second course on database systems for undergraduates, preparing students for the advanced database concepts that they will exercise in industry. They also shared their experience of teaching the course, elaborating on the topics and assignments. Also, Andersson et al. (2019) proposed variations in the core topics of a database management course for students with an engineering background. Moreover, Dietrich et al. (2014) described two animations, developed with images and color, that visually and dynamically introduce fundamental relational database concepts and querying to students of many majors. The goal is for educators in diverse academic disciplines to be able to incorporate these animations into their existing courses to meet their pedagogical needs.

Information systems have evolved into large-scale distributed systems that store huge amounts of data across different servers and process them using different distributed data processing frameworks. This evolution has given birth to new paradigms in the database systems domain, termed NoSQL and Big Data systems, which deviate significantly from conventional relational and distributed database management systems. To offer a sustainable and practical CS education, these new paradigms and methodologies, shown in Fig. 5, should be included in database education (Kleiner, 2015). Tables 9 and 10 summarize the findings of the reviewed curriculum-based studies. This section also proposes appropriate textbooks for the theory-, project-, and practice-based teaching methodologies, as shown in Table 9. The proposed books were selected purely on the basis of their usage in top universities around the world, such as Massachusetts Institute of Technology, Stanford University, Harvard University, University of Oxford, University of Cambridge, and University of Singapore, and of their coverage of the core topics mentioned in the database curriculum.

Fig. 5 Concepts in Database Systems Education (Kleiner, 2015)

Table 9 Recommended textbooks for DSE

Table 10 Curriculum: Findings of Reviewed Literature

RQ.2 Evolution of DSE research

This section discusses the evolution of databases, with a focus on DSE over the past 25 years, as shown in Fig. 6.

Fig. 6 Evolution of DSE studies

This study shows a significant increase in DSE research after 2004: 78% of the selected papers were published after 2004. The main reason for this outcome is that some of the papers were published in well-recognized channels such as IEEE Transactions on Education, ACM Transactions on Computing Education, the International Conference on Computer Science and Education (ICCSE), and the Teaching, Learning and Assessment of Database (TLAD) workshop. It is also evident that several of these papers were published before 2004 and that only a few articles were published during the late 1990s. This is because DSE started to gain interest after the introduction of the Body of Knowledge and DSE standards. Data-intensive scientific discovery has been discussed as the fourth paradigm (Hey et al., 2009): the first involves empirical science and observations; the second, theoretical science and mathematically driven insights; the third, computational science and simulation-driven insights; and the fourth, the data-driven insights of modern scientific research.

Over the past few decades, students have gone from attending one-room classes to having the world at their fingertips, and it is a great challenge for instructors to develop students' interest in learning databases. This challenge has led to the development of different types of interactive tools that help instructors teach DSE in this technology-oriented era. Given the importance of interactive tools in DSE, various authors have proposed such tools over the years. During 1995–2003, some studies (Abut & Ozturk, 1997; Mcintyre et al., 1995) introduced state-of-the-art interactive tools to teach and to enhance collaborative learning among students. Similarly, during 2004–2005 more interactive tools were proposed in the field of DSE: Pahl et al. (2004) and Connolly et al. (2005) introduced a multimedia-system-based interactive model and a game-based collaborative learning environment, respectively.

The Internet became more common in the first decade of the twenty-first century, and its positive impact on the education sector was undeniable. Cost-effectiveness, student-teacher and peer interaction, and access to the latest information were the main reasons instructors employed web-based tools to teach databases. Owing to this spike in demand for web-based tools, authors started to introduce new instruments to assist with teaching databases. Regueras et al. (2007) proposed an e-learning tool named QUEST with a feedback module that helps students learn from their mistakes. Similarly, in 2010, multiple authors proposed and evaluated various web-based tools. Cvetanovic et al. (2010) proposed ADVICE, which can monitor students' progress, while Wang et al. (2010) proposed Metube, a variation of YouTube. Furthermore, Nelson and Fatimazahra (2010) evaluated different web-based tools to highlight the complexities of using these instruments.

Technology has changed the teaching methods in the education sector but technology cannot replace teachers, and despite the amount of time most students spend online, virtual learning will never recreate the teacher-student bond. In the modern era, innovation in technology used in educational sectors is not meant to replace the instructors or teaching methods.

During the 1990s, some studies (Dietrich & Urban, 1996; Urban & Dietrich, 1997) proposed learning and teaching methods, respectively, with the evolving technology in view. The highlight of their work was project deliverables and assignments in which students advanced step by step, starting from a tutorial exercise and then attempting more difficult extensions of the assignment.

During 2002–2007, various authors discussed teaching and learning methods for keeping pace with ever-changing database technology. For example, Connolly and Begg (2006) proposed a constructive approach to teaching database analysis and design. Similarly, Prince and Felder (2006) reviewed the effectiveness of inquiry learning, problem-based learning, project-based learning, case-based teaching, discovery learning, and just-in-time teaching. Also, McIntyre et al. (1995) brought to light the impact of the convergence of the European Union (EU) on different universities across Europe. They suggested a reconstruction of teaching and learning methodologies in order to teach databases effectively.

During 2008–2013, more work was done on methods of teaching and learning in the field of DSE. Dominguez and Jaime (2010) proposed an active learning approach; the focus of their study was to develop students' interest in designing and developing databases. Zheng and Dong (2011) highlighted various characteristics of the database course and its teaching content. Similarly, Yuelan et al. (2011) reformed database teaching methods, focusing on modern ways of education, a project-driven approach, strengthening the experimental aspects, and improving the traditional assessment method. Likewise, Al-Shuaily (2012) explored four cognitive factors that can affect the process of learning databases, with the main aim of helping students learn SQL. Subsequently, Chen et al. (2012) proposed a scaffolding-based concept mapping strategy that helps students better understand database management courses. Correspondingly, Martin et al. (2013) discussed various collaborative learning techniques in the field of DSE while treating databases as an introductory course.

Between 2014 and 2021, research in the field of DSE increased, which is the main reason that most of the teaching, learning, and assessment methods were proposed and discussed during this period. Rashid and Al-Radhy (2014) discussed the issues with traditional methods of teaching, learning, and assessing database courses at different universities in Kurdistan; the main focus of their study was reformation issues, such as the absence of teaching determination and the contradiction between content and theory. Similarly, Wang and Chen (2014) summarized the main problems in teaching traditional database theory and its application, with the curriculum assessment mode as their main focus. Eaglestone and Nunes (2004) shared their experiences of delivering a database design course at Sheffield University, where the focus was teaching the database design module to a diverse group of students from different backgrounds. Rashid (2015) discussed some important features of database courses, focusing on reforming the conventional teaching, learning, and assessment strategies of database courses at universities. Kui et al. (2018) reformed the teaching mode of database courses based on the flipped classroom, with initiative learning as their main focus. Similarly, Zhang et al. (2018) discussed several effective classroom teaching measures, focusing on teaching content, teaching methods, and teaching evaluation and assessment methods. Cai and Gao (2019) carried out teaching reforms in a database course for the liberal arts, focusing on diversified teaching modes such as the flipped classroom, case-oriented teaching, and task-oriented teaching. Kawash et al. (2020) proposed a learning approach called Graded Group Activities (GGAs), with the main focus on reforming learning and assessment methods.

A database course covers several topics that range from data modeling to data implementation and examination. Over the years, various authors have suggested updates to these topics in the database curriculum to meet the requirements of modern technologies. Authors have also proposed new curricula for students of different academic backgrounds and different areas. These curriculum reforms helped students prepare, both practically and theoretically, and enabled them to compete in the job market after graduation.

Between 2003 and 2006, authors proposed various suggestions to update and develop computer science curricula across different universities. Robbert and Ricardo (2003) evaluated three reviews from 1999 to 2002 that were given to groups of educators; the focus of their study was to highlight the trends that occurred in the database curriculum. Also, Calero et al. (2003) proposed a first draft of a Database Body of Knowledge (DBBOK), focusing on Database (DB), Database Design (DBD), Database Administration (DBAd), Database Application (DBAp), and Advance Databases (ADVDB). Furthermore, Conklin and Heinrichs (2005) compared the content of 13 database textbooks, with the IS 2002, CC2001, and CC2004 model curricula as the main focus of their study.

From 2007 to 2011, authors developed various database curricula. Luo et al. (2008) developed curricula at Zhejiang University City College, with the aim of nurturing students to be qualified computer scientists. Likewise, Dietrich et al. (2008) proposed techniques to assess the development of an advanced database course. The purpose of adding an advanced database course at the undergraduate level was to prepare students to respond to industrial requirements. Also, Marshall (2011) developed a new database curriculum for a Computer Science degree program in the South African context.

From 2012 to 2021, various authors suggested updates to the database curriculum. Bhogal et al. (2012) suggested updating and modernizing the database curriculum, with a focus on data management and data analytics. Similarly, Picciano (2012) examined the curriculum in American higher education, with a focus on big data and analytics. Also, Zhanquan et al. (2016) proposed a design for course content and classroom teaching methods, with a focus on Massive Open Online Courses (MOOCs). Likewise, Mingyu et al. (2017) suggested updating the database curriculum in view of new database technology, with a focus on big data.

The above discussion clearly shows that SQL is the most discussed topic in the literature: more than 25% of the studies discussed it in the previous decade, as shown in Fig. 7. Other SQL databases, such as Oracle and MS Access, are discussed under the SQL banner (Chen et al., 2012; Hou & Chen, 2010; Wang & Chen, 2014). SQL's prominence is mainly due to its ability to handle data in a relational database management system and its direct implementation of theoretical database concepts. Other database topics, such as transaction management and application programming, are also among the main topics discussed in the literature.

Fig. 7 Evolution of Database topics discussed in literature

Research synthesis, advice for instructors, and way forward

This section presents the information synthesized from reading and analyzing the research articles considered in this study. It first contextualizes the tools and methods to help instructors find those suitable for their settings. Developments in curriculum design are discussed as well. Subsequently, general advice for instructors is given. Lastly, promising future research directions for developing new tools and methods and for revising the curriculum are discussed.

Methods, tools, and curriculum

Methods and tools.

The web-based tools proposed by Cvetanovic et al. (2010) and Wang et al. (2010) have been quite useful, and they have grown increasingly pertinent as online education has become prevalent around the globe during COVID-19. On the other hand, interactive tools and the smart classroom methodology have also been used successfully to develop students' interest in database classes (Brusilovsky et al., 2010; Connolly et al., 2005; Pahl et al., 2004; Canedo et al., 2021; Ko et al., 2021).

One of the most promising combinations of methodology and tool was proposed by Cvetanovic et al. (2010), who developed a tool named ADVICE that helps students learn and implement database concepts using a project-centric methodology. Connolly et al. (2005) proposed a game-based collaborative learning environment that involves a methodology comprising modeling, articulation, feedback, and exploration. As a whole, project-centric teaching (Connolly & Begg, 2006; Domínguez & Jaime, 2010) and teaching database design and problem-solving skills (Wang & Chen, 2014) are two successful approaches for DSE. Other studies (Urban & Dietrich, 1997) proposed teaching methods that are more inclined towards practicing database concepts, while a topic-specific approach to teaching and learning SQL was proposed by Abbasi et al. (2016), Taipalus et al. (2018), and Silva et al. (2016). On the other hand, Cai and Gao (2019) developed a teaching method for students who do not have a computer science background. Lastly, some useful ways of defining assessments for DSE were proposed by Kawash et al. (2020) and Zhang et al. (2018).

The database curricula adopted by various institutes around the world do not address how to teach the database course to students who lack a strong computer science background. To this end, Marshall (2012), Luo et al. (2008), and Zhanquan et al. (2016) proposed updates to the current database curriculum for students who are not from a computer science background, while Abid et al. (2015) proposed combined course content and various methodologies that can be used for teaching a database systems course. On the other hand, the current database curriculum does not include topics related to the latest technologies in the database domain. This factor was discussed by many other studies as well (Bhogal et al., 2012; Mehmood et al., 2020; Picciano, 2012).

Guidelines for instructors

The major conclusions of this study are suggestions, based on their impact and importance, for instructors who teach DSE. Furthermore, the empirical studies can provide an overview of the productivity of each method. These guidelines are intended for instructors, who are the focal audience of this study. They are subjective opinions formed after analyzing the literature, and the meaning and purpose intended by the original authors have been maintained. The literature reviewed revealed various issues; some of them were not relevant to DSE. The following suggestions provide interesting information:

Project centric and applied approach

  • To inculcate database development skills in students, the basic elements of database development need to be incorporated into teaching and learning at all levels, including undergraduate studies (Bakar et al., 2011). To fulfill this objective, instructors should also strengthen the coverage of data quality in DSE by assigning projects and assignments in which students assess, measure, and improve data quality using already deployed databases. They should demonstrate that the quality of data is determined not only by the effective design of a database, but also by the perception of the end user (Mathieu & Khalil, 1997).
  • The gap between database course theory and industrial practice is large. Fresh graduates find it difficult to cope with industrial pressure because of the contrast between what they have been taught in institutes and its application in industry (Allsopp et al., 2006). Instructors should involve top performers from their classes in industrial projects so that they can acquire sufficient knowledge and practice, especially in postgraduate courses. There should also be activities in which industry practitioners present real projects and share their industrial experience with the students. The gap between the theoretical and practical sides of databases was identified by Myers and Skinner (1997). To build practical DS concepts, instructors should provide students with an accurate view of reality and proper tools.

Importance of software development standards and impact of DB in software success

  • Instructors should have the strategies, ability, and skills to align the DSE course with contemporary Global Software Development (GSD) (Akbar & Safdar, 2015; Damian et al., 2006).
  • Enable students to explain approaches to problem solving, development tools, and methodologies. DS courses are usually taught in a normal lecture format; as a result, students do not realize the importance of DS activities and cannot see their influence on the success or failure of projects.

Pedagogy and the use of education technology

  • Some studies have shown that teaching through play and practical activities helps to improve the knowledge and learning outcomes of students (Dicheva et al., 2015).
  • Interactive classrooms can help instructors deliver their lectures more effectively by using virtual whiteboards, digital textbooks, and data over the network (Abut & Ozturk, 1997). To follow the new concept of the smart classroom, we suggest that instructors draw on the experience of Yau and Karim (2003), which benefits cooperative learning among students and can also be adopted in DSE.
  • Instructors also need to keep themselves up to date with the full spectrum of technology in education in general, and for DSE in particular. This is becoming more imperative because, during COVID, the world is relying strongly on technology, particularly in the education sector.

Periodic Curriculum Revision

  • There is also a need to revisit the existing series of courses periodically, so that they are able to offer the following benefits: (a) include the modern day database system concepts; (b) can be offered as a specialization track; (c) a specialized undergraduate degree program may also be designed.

DSE: Way forward

This research brings together a significant body of work on DSE in one place, thus providing a starting point for finding better ways to improve different dimensions of teaching a database systems course in the future. This section discusses the technology, methods, and curriculum modifications that would most impact the delivery of lectures in the coming years.

Several tools have already been developed for effective teaching and learning in database systems. However, there is considerable room for developing new tools. The recently emerged notion of “serious games” has been successful in several domains, whereas the majority of the research work discussed in this review revolves around web-based tools. The success of serious games invites researchers to explore this new paradigm for developing useful tools for learning and practicing database systems concepts.

Likewise, due to COVID-19 the world is setting up new norms, which are expected to affect teaching methods as well. This invites researchers to design, develop, and test flexible tools for teaching online in a more interactive manner. At the same time, it is also imperative to devise new techniques for assessment, especially for conducting online exams at massive scale. Moreover, researchers can apply the idea of instructional design to web-based teaching, in which an online classroom can be designed around the learners' unique backgrounds and can effectively deliver the concepts that instructors consider highly important.

The teaching, learning and assessment methods discussed in this study can help the instructors to improve their methods in order to teach the database system course in a better way. It is noticed that only 16% of authors have the assessment methods as their focus of study, which clearly highlights that there is still plenty of work needed to be done in this particular domain. Assessment techniques in the database course will help the learners to learn from their mistakes. Also, instructors must realize that there is a massive gap between database theory and practice which can only be reduced with maximum practice and real world database projects.

Similarly, the technology is continuously influencing the development and expansion of modern education, whereas the instructors’ abilities to teach using online platforms are critical to the quality of online education.

In the same way, ideas like the flipped classroom, in which students prepare the lesson prior to the class, can be implemented in web-based teaching. This ensures that class time can be used for further discussion of the lesson, for sharing ideas, and for letting students interact in a dynamic learning environment.

The increasing impact of big data systems and data science, and their anticipated impact on the job market, invites researchers to revisit the fundamental course of database systems as well. There is a need to extend the boundaries of the existing content by including concepts related to data storage, processing, and transaction management in distributed big data systems, with a possible glimpse of modern tools and technologies.

As a whole, an interesting and long term extension is to establish a generic and comprehensive framework that engages all the stakeholders with the support of technology to make the teaching, learning, practicing, and assessing easier and more effective.

This SLR presents a review of the research published in the area of database systems education, with a particular focus on teaching the first course in database systems. The study was carried out by systematically selecting research papers published between 1995 and 2021. Based on the study, a high-level categorization presents a taxonomy of the published work under the headings of Tools, Methods, and Curriculum. All the selected articles were evaluated against quality criteria. Several methods have been developed to teach the database course effectively. These methods focus on improving the learning experience, improving student satisfaction, improving students' course performance, or supporting instructors. Similarly, many tools have been developed: some are topic based, while others are general-purpose tools that apply to the whole course. Curriculum development activities have also been discussed, including guidelines provided by ACM/IEEE and certain standards. Apart from this, the evolution in these three areas has been presented, which shows that researchers have presented many different teaching methods throughout the selected period; however, the number of research articles that address curriculum and tools has decreased in the past five years. Some guidelines for instructors have also been shared. This SLR also proposes a way forward in DSE by emphasizing the tools that need to be developed to support instructors and students, especially in the post-COVID-19 era; the methods that instructors should adopt to close the gap between theory and practice; and the updating of database curricula following the introduction of emerging technologies such as big data and data science. We also urge the recognized publication venues for database research, including VLDB, ICDM, and EDBT, to consider publishing articles related to DSE.
The study also highlights the importance of reviving the curricula, tools, and methodologies to cater for recent advancements in the field of database systems.


  • Abbasi, S., Kazi, H., Khowaja, K., Abelló Gamazo, A., Burgués Illa, X., Casany Guerrero, M. J., Martin Escofet, C., Quer, C., Rodriguez González, M. E., Romero Moral, Ó., Urpi Tubella, A., Abid, A., Farooq, M. S., Raza, I., Farooq, U., Abid, K., Hussain, N., Abid, K., Ahmad, F., …, Yatim, N. F. M. (2016). Research trends in enterprise service bus (ESB) applications: A systematic mapping study. Journal of Informetrics, 27 (1), 217–220.
  • Abbasi, S., Kazi, H., & Khowaja, K. (2017). A systematic review of learning object oriented programming through serious games and programming approaches. 2017 4th IEEE International Conference on Engineering Technologies and Applied Sciences (ICETAS), 1–6.
  • Abelló Gamazo, A., Burgués Illa, X., Casany Guerrero, M. J., Martin Escofet, C., Quer, C., Rodriguez González, M. E., Romero Moral, Ó., & Urpi Tubella, A. (2016). A software tool for E-assessment of relational database skills. International Journal of Engineering Education, 32(3A), 1289–1312.
  • Abid, A., Farooq, M. S., Raza, I., Farooq, U., & Abid, K. (2015). Variants of teaching first course in database systems. Bulletin of Education and Research, 37(2), 9–25.
  • Abid, A., Hussain, N., Abid, K., Ahmad, F., Farooq, M. S., Farooq, U., Khan, S. A., Khan, Y. D., Naeem, M. A., & Sabir, N. (2016). A survey on search results diversification techniques. Neural Computing and Applications, 27(5), 1207–1229.
  • Abourezq, M., & Idrissi, A. (2016). Database-as-a-service for big data: An overview. International Journal of Advanced Computer Science and Applications (IJACSA), 7(1).
  • Abut, H., & Ozturk, Y. (1997). Interactive classroom for DSP/communication courses. 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1, 15–18.
  • Adams, E. S., Granger, M., Goelman, D., & Ricardo, C. (2004). Managing the introductory database course: What goes in and what comes out? ACM SIGCSE Bulletin, 36(1), 497–498.
  • Akbar, R., & Safdar, S. (2015). A short review of global software development (gsd) and latest software development trends. 2015 International Conference on Computer, Communications, and Control Technology (I4CT), 314–317.
  • Allsopp, D. H., DeMarie, D., Alvarez-McHatton, P., & Doone, E. (2006). Bridging the gap between theory and practice: Connecting courses with field experiences. Teacher Education Quarterly, 33(1), 19–35.
  • Alrumaih, H. (2016). ACM/IEEE-CS information technology curriculum 2017: status report. Proceedings of the 1st National Computing Colleges Conference (NC3 2016).
  • Al-Shuaily, H. (2012). Analyzing the influence of SQL teaching and learning methods and approaches. 10th International Workshop on the Teaching, Learning and Assessment of Databases, 3.
  • Amadio, W., Riyami, B., Mansouri, K., Poirier, F., Ramzan, M., Abid, A., Khan, H. U., Awan, S. M., Ismail, A., Ahmed, M., Ilyas, M., Mahmood, A., Hey, A. J. G., Tansley, S., Tolle, K. M., others, Tehseen, R., Farooq, M. S., Abid, A., …, Fatimazahra, E. (2003). The fourth paradigm: data-intensive scientific discovery. Innovation in Teaching and Learning in Information and Computer Sciences , 1 (1), 823–828. https://www.iso.org/standard/27614.html
  • Amadio, W. (2003). The dilemma of Team Learning: An assessment from the SQL programming classroom . 823–828.
  • Ampatzoglou A, Charalampidou S, Stamelos I. Research state of the art on GoF design patterns: A mapping study. Journal of Systems and Software. 2013; 86 (7):1945–1964. [ Google Scholar ]
  • Andersson C, Kroisandt G, Logofatu D. Including active learning in an online database management course for industrial engineering students. IEEE Global Engineering Education Conference (EDUCON) 2019; 2019 :217–220. [ Google Scholar ]
  • Aria M, Cuccurullo C. bibliometrix: An R-tool for comprehensive science mapping analysis. Journal of Informetrics. 2017; 11 (4):959–975. [ Google Scholar ]
  • Aziz O, Farooq MS, Abid A, Saher R, Aslam N. Research trends in enterprise service bus (ESB) applications: A systematic mapping study. IEEE Access. 2020; 8 :31180–31197. [ Google Scholar ]
  • Bakar MA, Jailani N, Shukur Z, Yatim NFM. Final year supervision management system as a tool for monitoring computer science projects. Procedia-Social and Behavioral Sciences. 2011; 18 :273–281. [ Google Scholar ]
  • Beecham S, Baddoo N, Hall T, Robinson H, Sharp H. Motivation in Software Engineering: A systematic literature review. Information and Software Technology. 2008; 50 (9–10):860–878. [ Google Scholar ]
  • Bhogal, J. K., Cox, S., & Maitland, K. (2012). Roadmap for Modernizing Database Curricula. 10 Th International Workshop on the Teaching, Learning and Assessment of Databases , 73.
  • Bishop, M., Burley, D., Buck, S., Ekstrom, J. J., Futcher, L., Gibson, D., ... & Parrish, A. (2017, May). Cybersecurity curricular guidelines . In IFIP World Conference on Information Security Education (pp. 3–13). Cham: Springer.
  • Brady A, Bruce K, Noonan R, Tucker A, Walker H. The 2003 model curriculum for a liberal arts degree in computer science: preliminary report. ACM SIGCSE Bulletin. 2004; 36 (1):282–283. [ Google Scholar ]
  • Brusilovsky P, Sosnovsky S, Lee DH, Yudelson M, Zadorozhny V, Zhou X. An open integrated exploratorium for database courses. AcM SIGcSE Bulletin. 2008; 40 (3):22–26. [ Google Scholar ]
  • Brusilovsky P, Sosnovsky S, Yudelson MV, Lee DH, Zadorozhny V, Zhou X. Learning SQL programming with interactive tools: From integration to personalization. ACM Transactions on Computing Education (TOCE) 2010; 9 (4):1–15. [ Google Scholar ]
  • Cai, Y., & Gao, T. (2019). Teaching Reform in Database Course for Liberal Arts Majors under the Background of" Internet Plus". 2018 6th International Education, Economics, Social Science, Arts, Sports and Management Engineering Conference (IEESASM 2018) , 208–213.
  • Calderon KR, Vij RS, Mattana J, Jhaveri KD. Innovative teaching tools in nephrology. Kidney International. 2011; 79 (8):797–799. [ PubMed ] [ Google Scholar ]
  • Calero C, Piattini M, Ruiz F. Towards a database body of knowledge: A study from Spain. ACM SIGMOD Record. 2003; 32 (2):48–53. [ Google Scholar ]
  • Canedo, E. D., Bandeira, I. N., & Costa, P. H. T. (2021). Challenges of database systems teaching amidst the Covid-19 pandemic. In 2021 IEEE Frontiers in Education Conference (FIE) (pp. 1–9). IEEE.
  • Chen H-H, Chen Y-J, Chen K-J. The design and effect of a scaffolded concept mapping strategy on learning performance in an undergraduate database course. IEEE Transactions on Education. 2012; 56 (3):300–307. [ Google Scholar ]
  • Cobo MJ, López-Herrera AG, Herrera-Viedma E, Herrera F. SciMAT: A new science mapping analysis software tool. Journal of the American Society for Information Science and Technology. 2012; 63 (8):1609–1630. [ Google Scholar ]
  • Conklin M, Heinrichs L. In search of the right database text. Journal of Computing Sciences in Colleges. 2005; 21 (2):305–312. [ Google Scholar ]
  • Connolly, T. M., & Begg, C. E. (2006). A constructivist-based approach to teaching database analysis and design. Journal of Information Systems Education , 17 (1).
  • Connolly, T. M., Stansfield, M., & McLellan, E. (2005). An online games-based collaborative learning environment to teach database design. Web-Based Education: Proceedings of the Fourth IASTED International Conference(WBE-2005) .
  • Curricula Computing. (1991). Report of the ACM/IEEE-CS Joint Curriculum Task Force. Technical Report . New York: Association for Computing Machinery.
  • Cvetanovic M, Radivojevic Z, Blagojevic V, Bojovic M. ADVICE—Educational system for teaching database courses. IEEE Transactions on Education. 2010; 54 (3):398–409. [ Google Scholar ]
  • Damian, D., Hadwin, A., & Al-Ani, B. (2006). Instructional design and assessment strategies for teaching global software development: a framework. Proceedings of the 28th International Conference on Software Engineering , 685–690.
  • Dean, T. J., & Milani, W. G. (1995). Transforming a database systems and design course for non computer science majors. Proceedings Frontiers in Education 1995 25th Annual Conference. Engineering Education for the 21st Century , 2 , 4b2--17.
  • Dicheva, D., Dichev, C., Agre, G., & Angelova, G. (2015). Gamification in education: A systematic mapping study. Journal of Educational Technology \& Society , 18 (3), 75–88.
  • Dietrich SW, Urban SD, Haag S. Developing advanced courses for undergraduates: A case study in databases. IEEE Transactions on Education. 2008; 51 (1):138–144. [ Google Scholar ]
  • Dietrich SW, Goelman D, Borror CM, Crook SM. An animated introduction to relational databases for many majors. IEEE Transactions on Education. 2014; 58 (2):81–89. [ Google Scholar ]
  • Dietrich, S. W., & Urban, S. D. (1996). Database theory in practice: learning from cooperative group projects. Proceedings of the Twenty-Seventh SIGCSE Technical Symposium on Computer Science Education , 112–116.
  • Dominguez, C., & Jaime, A. (2010). Database design learning: A project-based approach organized through a course management system. Computers \& Education , 55 (3), 1312–1320.
  • Eaglestone, B., & Nunes, M. B. (2004). Pragmatics and practicalities of teaching and learning in the quicksand of database syllabuses. Journal of Innovations in Teaching and Learning for Information and Computer Sciences , 3 (1).
  • Efendiouglu A, Yelken TY. Programmed instruction versus meaningful learning theory in teaching basic structured query language (SQL) in computer lesson. Computers & Education. 2010; 55 (3):1287–1299. [ Google Scholar ]
  • Elberzhager F, Münch J, Nha VTN. A systematic mapping study on the combination of static and dynamic quality assurance techniques. Information and Software Technology. 2012; 54 (1):1–15. [ Google Scholar ]
  • Etemad M, Küpçü A. Verifiable database outsourcing supporting join. Journal of Network and Computer Applications. 2018; 115 :1–19. [ Google Scholar ]
  • Farooq MS, Riaz S, Abid A, Abid K, Naeem MA. A Survey on the role of IoT in agriculture for the implementation of smart farming. IEEE Access. 2019; 7 :156237–156271. [ Google Scholar ]
  • Farooq MS, Riaz S, Abid A, Umer T, Zikria YB. Role of IoT technology in agriculture: A systematic literature review. Electronics. 2020; 9 (2):319. [ Google Scholar ]
  • Farooq U, Rahim MSM, Sabir N, Hussain A, Abid A. Advances in machine translation for sign language: Approaches, limitations, and challenges. Neural Computing and Applications. 2021; 33 (21):14357–14399. [ Google Scholar ]
  • Fisher, D., & Khine, M. S. (2006). Contemporary approaches to research on learning environments: Worldviews . World Scientific.
  • Garcia-Molina, H. (2008). Database systems: the complete book . Pearson Education India.
  • Garousi V, Mesbah A, Betin-Can A, Mirshokraie S. A systematic mapping study of web application testing. Information and Software Technology. 2013; 55 (8):1374–1396. [ Google Scholar ]
  • Gudivada, V. N., Nandigam, J., & Tao, Y. (2007). Enhancing student learning in database courses with large data sets. 2007 37th Annual Frontiers In Education Conference-Global Engineering: Knowledge Without Borders, Opportunities Without Passports , S2D--13.
  • Hey, A. J. G., Tansley, S., Tolle, K. M., & others. (2009). The fourth paradigm: data-intensive scientific discovery (Vol. 1). Microsoft research Redmond, WA.
  • Holliday, M. A., & Wang, J. Z. (2009). A multimedia database project and the evolution of the database course. 2009 39th IEEE Frontiers in Education Conference , 1–6.
  • Hou, S., & Chen, S. (2010). Research on applying the theory of Blending Learning on Access Database Programming Course teaching. 2010 2nd International Conference on Education Technology and Computer , 3 , V3--396.
  • Irby DM, Wilkerson L. Educational innovations in academic medicine and environmental trends. Journal of General Internal Medicine. 2003; 18 (5):370–376. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Ishaq K, Zin NAM, Rosdi F, Jehanghir M, Ishaq S, Abid A. Mobile-assisted and gamification-based language learning: A systematic literature review. PeerJ Computer Science. 2021; 7 :e496. [ PMC free article ] [ PubMed ] [ Google Scholar ]
  • Joint Task Force on Computing Curricula, A. F. C. M. (acm), & Society, I. C. (2013). Computer science curricula 2013: Curriculum guidelines for undergraduate degree programs in computer science . New York, NY, USA: Association for Computing Machinery.
  • Juxiang R, Zhihong N. Taking database design as trunk line of database courses. Fourth International Conference on Computational and Information Sciences. 2012; 2012 :767–769. [ Google Scholar ]
  • Kawash, J., Jarada, T., & Moshirpour, M. (2020). Group exams as learning tools: Evidence from an undergraduate database course. Proceedings of the 51st ACM Technical Symposium on Computer Science Education , 626–632.
  • Keele, S., et al. (2007). Guidelines for performing systematic literature reviews in software engineering .
  • Kleiner, C. (2015). New Concepts in Database System Education: Experiences and Ideas. Proceedings of the 46th ACM Technical Symposium on Computer Science Education , 698.
  • Ko J, Paek S, Park S, Park J. A news big data analysis of issues in higher education in Korea amid the COVID-19 pandemic. Sustainability. 2021; 13 (13):7347. [ Google Scholar ]
  • Kui, X., Du, H., Zhong, P., & Liu, W. (2018). Research and application of flipped classroom in database course. 2018 13th International Conference on Computer Science \& Education (ICCSE) , 1–5.
  • Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics , 159–174. [ PubMed ]
  • Lunt, B., Ekstrom, J., Gorka, S., Hislop, G., Kamali, R., Lawson, E., ... & Reichgelt, H. (2008). Curriculum guidelines for undergraduate degree programs in information technology . ACM.
  • Luo, R., Wu, M., Zhu, Y., & Shen, Y. (2008). Exploration of Curriculum Structures and Educational Models of Database Applications. 2008 The 9th International Conference for Young Computer Scientists , 2664–2668.
  • Luxton-Reilly, A., Albluwi, I., Becker, B. A., Giannakos, M., Kumar, A. N., Ott, L., Paterson, J., Scott, M. J., Sheard, J., & Szabo, C. (2018). Introductory programming: a systematic literature review. Proceedings Companion of the 23rd Annual ACM Conference on Innovation and Technology in Computer Science Education , 55–106.
  • Manzoor MF, Abid A, Farooq MS, Nawaz NA, Farooq U. Resource allocation techniques in cloud computing: A review and future directions. Elektronika Ir Elektrotechnika. 2020; 26 (6):40–51. doi: 10.5755/j01.eie.26.6.25865. [ CrossRef ] [ Google Scholar ]
  • Marshall, L. (2011). Developing a computer science curriculum in the South African context. CSERC , 9–19.
  • Marshall, L. (2012). A comparison of the core aspects of the acm/ieee computer science curriculum 2013 strawman report with the specified core of cc2001 and cs2008 review. Proceedings of Second Computer Science Education Research Conference , 29–34.
  • Martin C, Urpi T, Casany MJ, Illa XB, Quer C, Rodriguez ME, Abello A. Improving learning in a database course using collaborative learning techniques. The International Journal of Engineering Education. 2013; 29 (4):986–997. [ Google Scholar ]
  • Martinez-González MM, Duffing G. Teaching databases in compliance with the European dimension of higher education: Best practices for better competences. Education and Information Technologies. 2007; 12 (4):211–228. [ Google Scholar ]
  • Mateo PR, Usaola MP, Alemán JLF. Validating second-order mutation at system level. IEEE Transactions on Software Engineering. 2012; 39 (4):570–587. [ Google Scholar ]
  • Mathieu, R. G., & Khalil, O. (1997). Teaching Data Quality in the Undergraduate Database Course. IQ , 249–266.
  • Mcintyre, D. R., Pu, H.-C., & Wolff, F. G. (1995). Use of software tools in teaching relational database design. Computers \& Education , 24 (4), 279–286.
  • Mehmood E, Abid A, Farooq MS, Nawaz NA. Curriculum, teaching and learning, and assessments for introductory programming course. IEEE Access. 2020; 8 :125961–125981. [ Google Scholar ]
  • Meier, R., Barnicki, S. L., Barnekow, W., & Durant, E. (2008). Work in progress-Year 2 results from a balanced, freshman-first computer engineering curriculum. In 38th Annual Frontiers in Education Conference (pp. S1F-17). IEEE.
  • Meyer B. Software engineering in the academy. Computer. 2001; 34 (5):28–35. [ Google Scholar ]
  • Mingyu, L., Jianping, J., Yi, Z., & Cuili, Z. (2017). Research on the teaching reform of database curriculum major in computer in big data era. 2017 12th International Conference on Computer Science and Education (ICCSE) , 570–573.
  • Morien, R. I. (2006). A Critical Evaluation Database Textbooks, Curriculum and Educational Outcomes. Director , 7 .
  • Mushtaq Z, Rasool G, Shehzad B. Multilingual source code analysis: A systematic literature review. IEEE Access. 2017; 5 :11307–11336. [ Google Scholar ]
  • Myers M, Skinner P. The gap between theory and practice: A database application case study. Journal of International Information Management. 1997; 6 (1):5. [ Google Scholar ]
  • Naeem A, Farooq MS, Khelifi A, Abid A. Malignant melanoma classification using deep learning: Datasets, performance measurements, challenges and opportunities. IEEE Access. 2020; 8 :110575–110597. [ Google Scholar ]
  • Nagataki, H., Nakano, Y., Nobe, M., Tohyama, T., & Kanemune, S. (2013). A visual learning tool for database operation. Proceedings of the 8th Workshop in Primary and Secondary Computing Education , 39–40.
  • Naik, S., & Gajjar, K. (2021). Applying and Evaluating Engagement and Application-Based Learning and Education (ENABLE): A Student-Centered Learning Pedagogy for the Course Database Management System. Journal of Education , 00220574211032319.
  • Nelson, D., Stirk, S., Patience, S., & Green, C. (2003). An evaluation of a diverse database teaching curriculum and the impact of research. 1st LTSN Workshop on Teaching, Learning and Assessment of Databases, Coventry .
  • Nelson D, Fatimazahra E. Review of Contributions to the Teaching, Learning and Assessment of Databases (TLAD) Workshops. Innovation in Teaching and Learning in Information and Computer Sciences. 2010; 9 (1):78–86. [ Google Scholar ]
  • Obaid I, Farooq MS, Abid A. Gamification for recruitment and job training: Model, taxonomy, and challenges. IEEE Access. 2020; 8 :65164–65178. [ Google Scholar ]
  • Pahl C, Barrett R, Kenny C. Supporting active database learning and training through interactive multimedia. ACM SIGCSE Bulletin. 2004; 36 (3):27–31. [ Google Scholar ]
  • Park, Y., Tajik, A. S., Cafarella, M., & Mozafari, B. (2017). Database learning: Toward a database that becomes smarter every time. Proceedings of the 2017 ACM International Conference on Management of Data , 587–602.
  • Picciano AG. The evolution of big data and learning analytics in American higher education. Journal of Asynchronous Learning Networks. 2012; 16 (3):9–20. [ Google Scholar ]
  • Prince MJ, Felder RM. Inductive teaching and learning methods: Definitions, comparisons, and research bases. Journal of Engineering Education. 2006; 95 (2):123–138. [ Google Scholar ]
  • Ramzan M, Abid A, Khan HU, Awan SM, Ismail A, Ahmed M, Ilyas M, Mahmood A. A review on state-of-the-art violence detection techniques. IEEE Access. 2019; 7 :107560–107575. [ Google Scholar ]
  • Rashid, T. A., & Al-Radhy, R. S. (2014). Transformations to issues in teaching, learning, and assessing methods in databases courses. 2014 IEEE International Conference on Teaching, Assessment and Learning for Engineering (TALE) , 252–256.
  • Rashid, T. (2015). Investigation of instructing reforms in databases. International Journal of Scientific \& Engineering Research , 6 (8), 64–72.
  • Regueras, L. M., Verdú, E., Verdú, M. J., Pérez, M. A., & De Castro, J. P. (2007). E-learning strategies to support databases courses: a case study. First International Conference on Technology, Training and Communication .
  • Robbert MA, Ricardo CM. Trends in the evolution of the database curriculum. ACM SIGCSE Bulletin. 2003; 35 (3):139–143. [ Google Scholar ]
  • Sahami, M., Guzdial, M., McGettrick, A., & Roach, S. (2011). Setting the stage for computing curricula 2013: computer science--report from the ACM/IEEE-CS joint task force. Proceedings of the 42nd ACM Technical Symposium on Computer Science Education , 161–162.
  • Sciore E. SimpleDB: A simple java-based multiuser syst for teaching database internals. ACM SIGCSE Bulletin. 2007; 39 (1):561–565. [ Google Scholar ]
  • Shebaro B. Using active learning strategies in teaching introductory database courses. Journal of Computing Sciences in Colleges. 2018; 33 (4):28–36. [ Google Scholar ]
  • Sibia, N., & Liut, M. (2022, June). The Positive Effects of using Reflective Prompts in a Database Course. In 1st International Workshop on Data Systems Education (pp. 32–37).
  • Silva, Y. N., Almeida, I., & Queiroz, M. (2016). SQL: From traditional databases to big data. Proceedings of the 47th ACM Technical Symposium on Computing Science Education , 413–418.
  • Sobel, A. E. K. (2003). Computing Curricula--Software Engineering Volume. Proc. of the Final Draft of the Software Engineering Education Knowledge (SEEK) .
  • Suryn, W., Abran, A., & April, A. (2003). ISO/IEC SQuaRE: The second generation of standards for software product quality .
  • Svahnberg, M., Aurum, A., & Wohlin, C. (2008). Using students as subjects-an empirical evaluation. Proceedings of the Second ACM-IEEE International Symposium on Empirical Software Engineering and Measurement , 288–290.
  • Swebok evolution: IEEE Computer Society. (n.d.). In IEEE Computer Society SWEBOK Evolution Comments . Retrieved March 24, 2021 https://www.computer.org/volunteering/boards-and-committees/professional-educational-activities/software-engineering-committee/swebok-evolution
  • Taipalus T, Seppänen V. SQL education: A systematic mapping study and future research agenda. ACM Transactions on Computing Education (TOCE) 2020; 20 (3):1–33. [ Google Scholar ]
  • Taipalus T, Siponen M, Vartiainen T. Errors and complications in SQL query formulation. ACM Transactions on Computing Education (TOCE) 2018; 18 (3):1–29. [ Google Scholar ]
  • Taipalus, T., & Perälä, P. (2019). What to expect and what to focus on in SQL query teaching. Proceedings of the 50th ACM Technical Symposium on Computer Science Education , 198–203.
  • Tehseen R, Farooq MS, Abid A. Earthquake prediction using expert systems: A systematic mapping study. Sustainability. 2020; 12 (6):2420. [ Google Scholar ]
  • Urban, S. D., & Dietrich, S. W. (2001). Advanced database concepts for undergraduates: experience with teaching a second course. Proceedings of the Thirty-Second SIGCSE Technical Symposium on Computer Science Education , 357–361.
  • Urban SD, Dietrich SW. Integrating the practical use of a database product into a theoretical curriculum. ACM SIGCSE Bulletin. 1997; 29 (1):121–125. [ Google Scholar ]
  • Wang, J., & Chen, H. (2014). Research and practice on the teaching reform of database course. International Conference on Education Reform and Modern Management, ERMM .
  • Wang, J. Z., Davis, T. A., Westall, J. M., & Srimani, P. K. (2010). Undergraduate database instruction with MeTube. Proceedings of the Fifteenth Annual Conference on Innovation and Technology in Computer Science Education , 279–283.
  • Yau, G., & Karim, S. W. (2003). Smart classroom: Enhancing collaborative learning using pervasive computing technology. II American Society… .
  • Yue K-B. Using a semi-realistic database to support a database course. Journal of Information Systems Education. 2013; 24 (4):327. [ Google Scholar ]
  • Yuelan L, Yiwei L, Yuyan H, Yuefan L. Study on teaching methods of database application courses. Procedia Engineering. 2011; 15 :5425–5428. [ Google Scholar ]
  • Zhang, X., Wang, X., Liu, Z., Xue, W., & ZHU, X. (2018). The Exploration and Practice on the Classroom Teaching Reform of the Database Technologies Course in colleges. 2018 3rd International Conference on Modern Management, Education Technology, and Social Science (MMETSS 2018) , 320–323.
  • Zhanquan W, Zeping Y, Chunhua G, Fazhi Z, Weibin G. Research of database curriculum construction under the environment of massive open online courses. International Journal of Educational and Pedagogical Sciences. 2016; 10 (12):3873–3877. [ Google Scholar ]
  • Zheng, Y., & Dong, J. (2011). Teaching reform and practice of database principles. 2011 6th International Conference on Computer Science \& Education (ICCSE) , 1460–1462.

CSE 5249 - Research Topics in Database Management Systems

Time: Tuesdays & Thursdays, 3:55PM - 4:50PM Room: Dreese Labs 295

Instructor: Spyros Blanas, [lastname][email protected], office hours by appointment.

I will notify all students via e-mail when this webpage is updated and I will list every update here:

SEP 28 UPDATE: Small changes about who leads each paper discussion in the schedule.

NOV 3 UPDATE: Added paper summaries for remaining papers; please find instructions below. Assigned presentation slots.

Course description

This seminar focuses on recent research results in the intersection of data management and systems. There is no formal textbook for this course. We will mostly be reading and discussing recently published papers in venues such as SIGMOD, VLDB and ICDE. An important component of the course is an individual research project, where you will pick one topic of interest in the area of database management systems and explore it in depth.

This course mainly discusses the latest research findings on data management and builds on the foundations introduced in CSE 5242, the Advanced Database Management Systems course. If you are not motivated to study and conduct independent research, this course does not have a structure to guide you to success (such as a textbook, exams, or help from a GTA).

This course is intended for:

  • Ph.D. students in any group who need this background for their research.
  • Ph.D. or M.Sc. students who intend to work in the Data Management & Mining group towards a Ph.D. dissertation or an M.Sc. thesis.
  • M.Sc. or B.Sc. students who have done very well in CSE 5242 and are curious about recent research topics in data management. Many students in this category take CSE 5249 after they have accepted a job offer that involves building a data management system or service and want to learn about ideas that have not yet appeared in mainstream products.

Check Carmen for PDF versions of the papers.

Prerequisites

This course builds on the material introduced in CSE 3242/5242, the Advanced Database Management Systems course.

This course has two main components, as follows:

Paper summaries

In order to make the most of our in-class time, you are expected to submit a summary of the assigned reading before each class. For all questions, don't paraphrase (or copy verbatim) what is written in the paper. Papers frequently have different contributions than their authors claimed when they were writing them.

Paper summaries will be graded on a scale from zero to two. Zero is reserved for summaries that have not been submitted or are unreadable. One reflects a summary that can be improved, either for length, clarity or insight. Two represents a solid effort at summarizing the paper. One bonus point will be given to a few exceptionally insightful summaries.

Each summary must answer exactly the following questions. Remember that summaries are graded on clarity and insight, and not their length!

Answers to all questions are due by 1am on the day the paper is discussed. Upload your answers to Carmen as a single plaintext file. Please include the questions in the submitted file. No Microsoft Word or Adobe Acrobat files will be accepted.

Class project

You will also work on an individual research project on a topic of mutual research interest. (Group projects will not be allowed.) I can provide a list of ideas on interesting topics and discuss any ideas you have.

It is your responsibility to meet with the instructor periodically throughout the semester to discuss the general direction and the progress of the class project. You must take the initiative to actively explore the topic you choose, or else you will not accomplish much in the project. As a consequence, your class project grade will be adversely impacted.

  • What is the problem you are solving?
  • What have others done already to solve this or a similar problem?
  • What is your solution, and what did you accomplish during the last three months?
  • What are the results? Does your solution improve over what prior work has already accomplished?
  • In retrospect, what could you have done better in this project?
  • If someone else looks at this problem in the future, what are the aspects of the problem that you did not have time to explore?

Source code: Before submitting your source code, please delete any intermediate files and executable binaries. (These will not work on any platform other than your own system.) If you have worked with a large codebase (PostgreSQL, Impala, MySQL, etc.), please submit only a diff of your changes, and include a reference to the "base" version you modified. Examples include "PostgreSQL 9.4.0", or "Linux 3.x development branch, git commit f3f62a38ce".

If the source code is small (a few MBs), please upload it with your report on Carmen. Ohio State offers BuckeyeBox, a version of the Box file sharing service, for this purpose, which you can access using your Carmen credentials. It is not necessary to use this service, as long as you include a link to your source code.

Classroom etiquette

Please do not use phones, tablets, laptops, or other technological distractions.

Academic conduct

CS 764 Topics in Database Management Systems

This course covers a number of advanced topics in the development of database management systems (DBMS) and the modern applications of databases. The topics discussed include query processing and optimization, advanced access methods, advanced concurrency control and recovery, parallel and distributed data systems, cloud computing for data platforms, and data processing with emerging hardware. The course material will be drawn from a number of papers in the database literature. We will cover one paper per lecture. All students are expected to read the paper before coming to the lecture.

Prerequisites: CS 564 or equivalent. If you have concerns about meeting the prerequisites, please contact the instructor.

  • Red Book: Readings in Database Systems (5th edition) - edited by Bailis, Hellerstein, and Stonebraker.
  • Cow Book: Database Management Systems (3rd edition) - by Raghu Ramakrishnan and Johannes Gehrke, McGraw Hill, 2003.

Lecture Format: Each lecture focuses on a classic or modern research paper. Students will read the paper and submit a review to https://wisc-cs764-f22.hotcrp.com before the lecture starts. Here is a sample review for the paper on join processing.

Course projects: A big component of this course is a research project. For the project, you pick a topic in the area of data management systems and explore it in depth. Here are lists of suggested project topics created in 2020, 2021, and 2022, but you are encouraged to select a project outside of the list. The course project is a group project, and each group must have 2-4 members. Please start looking for project partners right away. The course project will include a project proposal, a short presentation at the end of the semester, and a final project report. Here are three sample projects from previous years (sample1, sample2, sample3). The presentations are organized as a workshop. The project has the following deadlines:

  • Proposal due: Oct. 24
  • Presentation: Dec. 12 & 14
  • Paper submission: Dec. 19

  • CloudLab: https://www.cloudlab.us/signup.php?pid=NextGenDB (project name: NextGenDB)
  • Chameleon: https://www.chameleoncloud.org (project name: ngdb)

  • Paper review: 15%
  • Project proposal: 10%
  • Project presentation: 10%
  • Project final report: 30%

iNetTutor.com

Online Programming Lessons, Tutorials and Capstone Project guide

40 List of DBMS Project Topics and Ideas

Introduction

A Capstone project is the last project of an IT degree program. It is made up of one or more research projects in which students create prototypes, services, and/or products. The projects are organized around an issue that needs to be handled in real-world scenarios. When IT departments want to test new ideas or concepts that will be adopted into their daily operations, they implement these capstone projects within their services.

In this article, our team has compiled a list of Database Management System project topics and ideas. The capstone projects listed below will help future researchers decide which capstone project idea to pursue and come up with unique ideas of their own.

  • Telemedicine Online Platform Database Design

  “Telemedicine Online Platform” is designed to allow doctors to deliver clinical support to patients remotely. Doctors can communicate with their patients in real-time for consultations, diagnoses, monitoring, and medical supply prescriptions. The project will be developed using the SDLC method by the researchers. The researchers will also compile a sample of hospital doctors and patients who will act as study participants. A panel of IT specialists will review, test, and assess the project.

  • Virtual and Remote Guidance Counselling System Database Design

Counseling is a vital component of a person’s life since it aids in the improvement of interpersonal relationships. It should not be ignored, because it is essential for the development of mental wellness. The capstone project “Virtual and Remote Guidance Counselling System,” which covers the gap in giving counseling in stressful situations, was built for this reason. It addresses the need to fill the gaps in the traditional technique and make counseling more effective and immersive.

  • COVID-19 Facilities Management Information System Database Design

COVID-19 has put people in fear because the virus spreads easily on exposure. The health sector and the government provide isolation facilities for COVID-19 patients to mitigate the spread and transmission of the virus. However, communication about the availability of these facilities is inefficient, resulting in a surge of patients at a single facility, while some patients are transferred multiple times because no space is available. COVID-19 responders need an advanced tool to manage the facilities, one where they can easily look up available facilities and accommodate more patients.

  • Document Tracking System Database Design

The capstone project, “Document Tracking System,” is designed for companies and organizations and allows them to electronically store and track documents. The system will track the movement of documents in and out of different departments. Document tracking is typically done manually: staff call or personally ask for updates about the documents, which is time-consuming and inefficient.
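To make the idea of tracking documents in and out of departments concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`department`, `document`, `movement`, `direction`, `moved_at`) and the helper functions are illustrative assumptions, not part of the original project design.

```python
import sqlite3

# Hypothetical schema: one row in `movement` per document entering or leaving a department.
SCHEMA = """
CREATE TABLE department (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE document (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE movement (
    id            INTEGER PRIMARY KEY,
    document_id   INTEGER NOT NULL REFERENCES document(id),
    department_id INTEGER NOT NULL REFERENCES department(id),
    direction     TEXT NOT NULL CHECK (direction IN ('in', 'out')),
    moved_at      TEXT NOT NULL  -- ISO 8601 timestamp, so text order matches time order
);
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def log_movement(conn, document_id, department_id, direction, moved_at):
    """Record a document going 'in' to or 'out' of a department."""
    conn.execute(
        "INSERT INTO movement (document_id, department_id, direction, moved_at) "
        "VALUES (?, ?, ?, ?)",
        (document_id, department_id, direction, moved_at),
    )


def last_movement(conn, document_id):
    """Return (department name, direction) of the latest movement, or None."""
    return conn.execute(
        """
        SELECT d.name, m.direction
        FROM movement AS m
        JOIN department AS d ON d.id = m.department_id
        WHERE m.document_id = ?
        ORDER BY m.moved_at DESC, m.id DESC
        LIMIT 1
        """,
        (document_id,),
    ).fetchone()
```

With a log like this, a staff member could look up a document's latest movement instead of calling each department for an update.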

  • Face Recognition Application Database Design

Technology has grown so fast that it has changed the way we do our daily tasks and made our daily lives easier. The capstone project, entitled “Face Recognition Attendance System,” is designed to automate checking and recording of students’ attendance during school events using face recognition technology. The system will store each student’s information along with their photograph on a server, detect the faces of students during school events, and match and verify them to record each student’s presence or absence.

  • Digital Wallet Solution Database Design

The capstone project, named “Digital Wallet Solution,” is intended to allow people to store money online and make payments online. The digital wallet transactions accept a variety of currencies and provide a variety of payment gateways via which the user can pay for products and services. The system allows users to conduct secure and convenient online financial transactions. It will speed up payment and other financial processes, reducing the amount of time and effort required to complete them.

  • Virtual Online Tour Application Database Design

The usage of technology is an advantage in the business industry, especially during this challenging pandemic. It allows businesses to continue to operate beyond physicality. The capstone project entitled “Virtual Online Tour Application” is designed as a platform to streamline virtual tours for clients. Any business industry can use the system to accommodate and provide their clients with a virtual experience of their business. For example, the tourist industry and real estate agencies can use the system to provide a virtual tour to their clients about the tourist locations and designs of properties, respectively.

  • Invoice Management System Database Design

The researchers will create a system that will make it easier for companies to manage and keep track of their invoice information. The company’s sales records, payables, and total invoice records will all be electronically managed using this project. Technology is highly used for business operations and transactions automation. The capstone project, entitled “Invoice Management System” is designed to automate the management of the company’s invoice records. The said project will help companies to have an organized, accurate, and reliable record that will help them track their sales and finances.

  • Vehicle Repair and Maintenance Management System Database Design

Information Technology has become an integral part of any kind of business in terms of automating business operations and transactions. The capstone project, entitled “Vehicle Repair and Maintenance Management System,” is designed for vehicle repair and maintenance management automation. The project will automate the vehicle garage’s operations and daily transactions, such as managing vehicle repair and maintenance records, invoice records, customer records, transaction records, and billing and payment records.

  • Transcribe Medical Database Design

Information technology has made everything easier and simpler, including transcribing the medical diagnoses of patients. The capstone project, entitled “Medical Transcription Platform,” is designed to allow medical transcriptionists to transcribe audio of medical consultations and patient diagnoses in a centralized manner. Medical transcriptionists are vital for keeping accurate and credible medical records, which other doctors can use to learn a patient’s medical history. The project will serve as a platform where transcribed medical audio is stored for safekeeping and easy retrieval.

  • Multi-branch Travel Agency and Booking System Database Design

The capstone project, entitled “Multi-Branch Travel Agency and Booking System,” is designed as a centralized platform wherein multiple travel agency branches are registered to ease and simplify inquiries and booking of travels and tour packages by clients. The said project will allow travel agencies to operate a business in an easy, fast manner considering the convenience and safety of their clients. The system will enable travel agencies and their clients to have a seamless online transaction.

  • Pharmacy Stocks Management Database Design

The capstone project “Pharmacy Stocks Management System” allows pharmacies to manage and monitor their stocks of drugs electronically. The Pharmacy Stocks Management System will automate inventory to help ensure that the pharmacy has enough stock of medications and supplies to serve the needs of the patients.
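As an illustration of what automated inventory monitoring could look like, the sketch below uses Python's built-in `sqlite3` module. The `medicine` table, its `reorder_level` column, and the functions `dispense` and `low_stock` are hypothetical names chosen for this example, not taken from the original project.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a single hypothetical stock table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE medicine (
            id            INTEGER PRIMARY KEY,
            name          TEXT NOT NULL UNIQUE,
            quantity      INTEGER NOT NULL CHECK (quantity >= 0),
            reorder_level INTEGER NOT NULL CHECK (reorder_level >= 0)
        )
        """
    )
    return conn


def dispense(conn, name, amount):
    """Reduce stock; raise ValueError if the medicine is unknown or stock is too low."""
    cur = conn.execute(
        "UPDATE medicine SET quantity = quantity - ? "
        "WHERE name = ? AND quantity >= ?",
        (amount, name, amount),
    )
    if cur.rowcount == 0:
        raise ValueError(f"cannot dispense {amount} of {name}")


def low_stock(conn):
    """Return names of medicines at or below their reorder level, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM medicine WHERE quantity <= reorder_level ORDER BY name"
    )
    return [name for (name,) in rows]
```

Putting the stock check inside the `UPDATE` statement keeps the quantity from going negative even if two dispensing requests arrive close together.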

  • Loan Management with SMS Database Design

The capstone project entitled “Loan Management System with SMS” is an online platform that allows members to apply for and request loans. They can also monitor their balance in their respective dashboards. The management of the cooperative will first review each application and approve or disapprove the request. Notifications will be sent through the system’s SMS (short messaging service) feature.

  • Service Call Management System Database Design

The capstone project, entitled “Service Call Management System,” is designed to move service calls to a centralized platform. The project would allow clients to log in and lodge calls to tech support if they encounter issues or difficulties with their purchased products. The tech support team will diagnose the issue and, via a call, tell the client which actions to perform to solve the problem.

  • File Management with Approval Process Database Design

The File Management System provides a platform for submitting, approving, storing, and retrieving files. Specifically, the capstone project is for the file management of various business organizations. This is quite beneficial in the management and organization of the files of every department. Installation of the system on an intranet is possible, as is uploading the system to a live server, from which the platform can be viewed online and through the use of a browser.

  • Beauty Parlor Management System Database Design

The capstone project entitled “Beauty Parlour Management System” is an example of a transaction processing system that focuses on the records and processes of a beauty parlour. This online application will help the management keep and manage their transactions in an organized, fast, and efficient manner.

  • Exam Management System Database Design

Information technology plays a significant role in the teaching and learning process of teachers and students, respectively. IT offers a more efficient and convenient way for teachers and students to learn and assess learnings. The capstone project, “Exam Management System,” is designed to allow electronic management of all the information about the exam questions, courses and subjects, and teachers and students. The said project is an all-in-one platform for student exam management.

  • Student and Faculty Clearance Database Design

The capstone project, entitled “Student and Faculty Clearance System,” is designed to automate students and faculty clearance processes. The approach is intended to make the clearance procedure easier while also guaranteeing that approvals are accurate and complete. The project works by giving every Department involved access to the application. The proposed scheme can eliminate the specified challenges, streamline the process, and verify the integrity and correctness of the data.

  • Vehicle Parking Management System Database Design

The capstone project entitled “Vehicle Parking Management System” is an online platform that allows vehicle owners to request or reserve a parking slot. Management can accept or decline reservation requests. A payment option is also part of the system but is limited to on-site payment.

  • Hospital Resources and Room Utilization Database Design

The capstone project, “Hospital Resources and Room Utilization Management System,” is a system designed to streamline the process of managing hospital resources and room utilization. The project is especially critical during a pandemic, when hospital resources and rooms must be managed efficiently. Efficient management will prevent shortages of supplies and overcrowding of patients in hospitals.

  • Church Event Management System Database Design

The capstone project entitled “Church Event Management System” is designed to be used by church organizations in creating and managing different church events. The conventional method of managing church events is done manually where members of organizations will face difficulties due to physical barriers and time constraints.

  • CrowdFunding Platform Database Design

Business financing is critical for new business ventures. In this study, the researchers concentrate on designing and developing a business financing platform that is effective for new startups. This capstone project, entitled “Crowdfunding Platform” is a website that allows entrepreneurs to campaign their new business venture to attract investors and crowdfund.

  • Vehicle Franchising and Drivers Offense Software Database Design

The proposed software will be used to electronically process and manage vehicle franchising and drivers’ offenses. It will eliminate the manual method, which involves a lot of paperwork and consumes a valuable amount of time. The project will serve as a centralized platform where recording and paying for the offenses committed by drivers will be processed. The system will speed up transactions between the enforcers and the drivers. Vehicle franchising and managing driver offenses will be easy, fast, and convenient using the system.

  • Student Tracking Performance Database Design

The capstone project entitled “Student Academic Performance Tracking and Monitoring System” allows academic institutions to monitor and gather data about the academic performance of students, from which decisions are derived to further improve students’ learning outcomes. Tracking and monitoring students’ performance plays a vital role in providing information that is used to assist students, teachers, administrators, and policymakers in making decisions that will further improve the academic performance of students.

  • Webinar Course Management System Database Design

The capstone project, entitled “Webinar Course Management System,” is designed to automate managing webinar courses. The project aims to eliminate the current method, which is inefficient and inconvenient for parties involved in the webinar. A software development life cycle (SDLC) technique will be used by the researchers in order to build this project. They will gather a sample size of participating webinar members and facilitators to serve as respondents of the study.

  • Online Birth Certificate Processing System with SMS Notification Database Design

The capstone project, “Online Birth Certificate Processing System with SMS Notification,” is an IT-based solution that aims to automate the process of requesting, verifying, and approving inquiries for original birth records. The system will replace the traditional method and make birth certificate processing easy, convenient, and efficient. The researchers will develop the project following the Software Development Life Cycle (SDLC) technique.

  • Food Donation Services Database Design

Information technology plays a significant role in automating the operations of many companies to boost efficiency. One of these is the automation of food donation and distribution management. “Food Donation Services,” the capstone project, is intended to serve as a platform for facilitating transactions between food groups, donors, and recipients. Food banks will be able to respond to various food donations and food assistance requests in a timely and effective manner as a result of the project.

  • COVID Profiling Database Design

The capstone project “City COVID-19 Profiling System with Decision Support” is designed to automate the process of profiling COVID-19 patients. The project will empower local health officers in electronically recording and managing COVID-19 patient information such as symptoms, travel history, and other critical details needed to identify patients. Manual profiling is prone to human mistakes, necessitates a lot of paperwork, and needs too much time and effort from the employees.

  • Evacuation Center Database Design

Calamities can have a significant impact on society and may result in an enormous number of people being evacuated. The local government unit assigns evacuation centers to provide temporary shelter for people during and after a calamity. Evacuation centers can be churches, sports stadiums, community centers, and other buildings capable of providing emergency shelter.

  • QR Code Fare Payment System Database Design

The capstone project, “QR Code Fare Payment System” is designed to automate the procedure of paying for a fare when riding a vehicle. Passengers will register in the system to receive their own QR code, which they will use to pay for their fares by scanning in the system’s QR code scanning page. The project will enable cashless fare payment.

  • Web Based Psychopathology Diagnosis System Database Design

The capstone project entitled “Web-Based Psychopathology Diagnosis System” is designed for patients and medical staff in the field of psychopathology. The system will be a centralized platform to be used by patients and psychopathologists for consultations. The said project will also keep all the records electronically. Mental health is important. Each individual must give importance to their mental health by paying attention to it and seek medical advice if symptoms of mental disorders and unusual behavior occur.

  • Service Marketplace System Database Design

The capstone project, “Services Marketplace System” is designed to serve as a centralized platform for marketing and inquiring about different services. The system will serve as a platform where different service providers and customers will have an automated transaction. Technology made it easier for people to accomplish daily tasks and activities. In the conventional method, customers avail themselves of services by visiting the shop that offers their desired services personally.

  • Fish Catch System Database Design

The capstone project, entitled “Fish Catch Monitoring System,” will automate the process of recording and monitoring fish catches. The project is intended to be used by fishermen and fish markets to accurately record fish catches and will also keep the records electronically safe and secure.

  • Complaints Handling Management System Free Template Database Design

The capstone project, “Complaint Handling Management System” is a system designed to help educational institutions to handle and manage complaints electronically. The system will improve the response time of the school’s management in addressing the complaints of the students, parents, staff, and other stakeholders.

  • Senior Citizen Information System Free Template Database Design

The system will replace the manual method of managing senior citizens’ information and records with an electronic one. It will serve as a repository of senior citizens’ records within the scope of a specific local government unit. By using the system, paperwork will be reduced and human errors in file handling will be avoided. The system is efficient enough to aid in managing and keeping the records of senior citizens in the different barangays.

  • Online and SMS-Based Salary Notification Database Design

The “Online and SMS Based Salary Notification” is a capstone project intended to be used by companies and employees to automate the process of notifying salary details. The application will work by allowing the designated company encoder to encode salary details and each employee to log in to his or her account in the application and view those details. One of the benefits of being employed is being paid. Employers manage employees’ salaries and are responsible for discussing the salary system and deductions with them.

  • Maternal Records Management Database Design

The capstone project, “Maternal Records Management System” is a system that automates the process of recording and keeping maternal records. The said project will allow maternity clinics to track and monitor their patients’ records from pregnancy to their baby’s immunization records.

  • Online Complaint Management System Database Design

Online Complaint Management System is a capstone project that is designed to serve as a platform to address complaints and resolve disputes. The system provides an online way of resolving problems faced by the public or people within the organization. The system will make complaints easier to coordinate, monitor, track, and resolve.

  • Online Donation Database Design

The capstone project, “Online Donation Platform for DSWD,” is an online platform for giving and requesting donations through the Department of Social Welfare and Development (DSWD). The system will be managed by DSWD staff, who will verify donors and eligible beneficiaries electronically. The system will have an SMS feature to notify donors and beneficiaries about the status of their requests.

  • OJT Timesheet Monitoring System using QR Code Database Design

The capstone project, “OJT Timesheet Monitoring System using QR Code,” allows employers to automate the timesheet of each trainee for easy monitoring. On-the-job trainees will use the system to record their daily time in and out with the QR code generated by the system. The entire system will be managed by the administrator.
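The time in and time out idea from the last project can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each QR scan is assumed to yield an ISO 8601 timestamp for one trainee, scans are assumed to alternate between time in and time out, and the function name `hours_rendered` is hypothetical. QR code generation and scanning are left out.

```python
from datetime import datetime


def hours_rendered(scans):
    """Sum the hours between alternating time-in and time-out scans.

    `scans` is a list of ISO 8601 timestamp strings for one trainee.
    After sorting, scans at even positions are treated as time-in
    and scans at odd positions as time-out.
    """
    times = sorted(datetime.fromisoformat(s) for s in scans)
    if len(times) % 2:
        raise ValueError("unmatched time-in: odd number of scans")
    total_seconds = sum(
        (time_out - time_in).total_seconds()
        for time_in, time_out in zip(times[0::2], times[1::2])
    )
    return total_seconds / 3600
```

A real timesheet would also need rules for missed scans and overnight shifts, which this sketch reports only as an unmatched time-in.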

Technology is credited with driving change in a wide range of enterprises and institutions. Because of information technology, the world has altered dramatically. It is difficult to imagine an industry or organization that has not benefited from technology advances. In these businesses, the most common role of IT has been to automate numerous procedures and transactions in order to increase efficiency and improve people’s overall experience and satisfaction. The capstone project ideas above will be useful in a range of sectors. They will aid in enhancing operational efficiency as well as the services provided to each project’s users.


Stop Thinking, Just Do!

Sungsoo Kim's Blog

Research topics in database management.

29 January 2016

Research topics in database management systems

Course description

This seminar focuses on recent research results in the intersection of data management and systems. There is no formal textbook for this course. We will mostly be reading and discussing recently published papers in venues such as SIGMOD, VLDB and ICDE. An important component of the course is an individual research project, where you will pick one topic of interest in the area of database management systems and explore it in depth.

This course mainly discusses the latest research findings on data management and builds on the foundations introduced in CSE 5242, the Advanced Database Management Systems course. If you are not motivated to study and conduct independent research, this course does not have a structure to guide you to success (such as a textbook, exams, or help from a GTA).

Paper summaries

In order to make the most of our in-class time, you are expected to submit a summary of the assigned reading before each class. For all questions, don’t paraphrase (or copy verbatim) what is written in the paper. Papers frequently have different contributions than their authors claimed when they were writing them.

Paper summaries will be graded on a scale from zero to two. Zero is reserved for summaries that have not been submitted or are unreadable. One reflects a summary that can be improved, either for length, clarity or insight. Two represents a solid effort at summarizing the paper. One bonus point will be given to a few exceptionally insightful summaries.

Each summary must answer exactly the following questions. Remember that summaries are graded on clarity and insight, and not their length!

  • What is your name?
  • What is the paper you are summarizing?
  • What problem was this paper addressing?
  • What was the existing solution to this problem?
  • What solution was this paper proposing?
  • What are the conclusions you draw from the results?
  • List three things you appreciated when reading this paper.
  • List three things you believe can be improved in this paper.

Answers to all questions are due by 1am on the day the paper is discussed. Upload your answers to Carmen as a single plaintext file. Please include the questions in the submitted file. No Microsoft Word or Adobe Acrobat files will be accepted.

Class project

You will also work on an individual research project on a topic of mutual research interest. (Group projects will not be allowed.) I can provide a list of ideas on interesting topics and discuss any ideas you have.

It is your responsibility to meet with the instructor periodically throughout the semester to discuss the general direction and the progress of the class project. You must take the initiative to actively explore the topic you choose, or else you will not accomplish much in the project. As a consequence, your class project grade will be adversely impacted.

Final report: The final project report should be at most twelve pages of text and figures in 11-point font. This includes any references to publications, URLs, manuals, etc. I will be looking for answers to the following questions:

  • What is the problem you are solving?
  • What have others done already to solve this or a similar problem?
  • What is your solution, and what did you accomplish during the last three months?
  • What are the results? Does your solution improve over what prior work has already accomplished?
  • In retrospect, what could you have done better in this project?
  • If someone else looks at this problem in the future, what are the aspects of the problem that you did not have time to explore?

Source code: Before submitting your source code, please delete any intermediate files and executable binaries. (These will not work in any other platform but your own system.) If you have worked with a large codebase (PostgreSQL, Impala, MySQL, etc.) please only submit a diff of your changes, and include a reference to what is the “base” version you modified. Examples include “PostgreSQL 9.4.0”, or “Linux 3.x development branch, git commit f3f62a38ce”.
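For a large codebase, a diff is usually produced with a tool such as `git diff` or `diff -u`. As a small self-contained illustration of what "a diff of your changes plus a reference to the base version" means, the sketch below builds a unified diff with Python's standard `difflib` module. The function name `make_patch`, its parameters, and the choice to put the base version in the diff header are assumptions made for this example, not a required submission format.

```python
import difflib


def make_patch(base_text, modified_text, path, base_version):
    """Return a unified diff between the base and modified contents of one file.

    The base version label (for example "PostgreSQL 9.4.0") is written into
    the "---" header line so a reader can tell which release was modified.
    """
    diff_lines = difflib.unified_diff(
        base_text.splitlines(keepends=True),
        modified_text.splitlines(keepends=True),
        fromfile=f"{base_version}/{path}",
        tofile=f"modified/{path}",
    )
    return "".join(diff_lines)


if __name__ == "__main__":
    base = "int limit = 10;\nreturn limit;\n"
    changed = "int limit = 20;\nreturn limit;\n"
    print(make_patch(base, changed, "src/example.c", "PostgreSQL 9.4.0"), end="")
```

An empty result means the file is unchanged and does not need to be included in the submission.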

If the source code is small (a few MBs), please upload it with your report on Carmen. Ohio State offers BuckeyeBox, a version of the Box file sharing service, for this purpose which you can access using your Carmen credentials. It is not necessary to use this service, as long as you include a link to your source code.

Sungsoo Kim, Principal Research Scientist

Research Topics

Prof. Dr. Michael Grossniklaus

The steadily increasing informatization of society and the economy produces data at a rate that has never been seen before. The volume and variety of available digital information continuously inspire new ways of gaining insights by analyzing this data.

In order to realize this potential, numerous research efforts are already underway, which are typically summarized under the umbrella of data science. Data science is a field that crosscuts many research areas of computer science, such as artificial intelligence, machine learning, data mining, databases, and information systems.

Our research falls into the last two of these areas and aims at supporting data science at the system level. Data science requires the management of new types of data as well as new complex ways to process it. Our research method is to address these requirements by innovating new and general solutions that leverage and extend core database and information systems technologies.

Within this broad area, our research focuses on challenges linked to data processing, in both traditional database and data stream management systems.

Graph Databases

We are currently investigating which data management technologies can be applied to what type of graph data application.

Network Data Analytics

We are interested in the analysis of large network datasets and in the detection of traits that are present among different types of networks.

Query Optimization

ICT with Intelligent Applications, pp 465–478

Database Management Systems—An Efficient, Effective, and Augmented Approach for Organizations

  • Anushka Sharma
  • Aman Karamchandani
  • Devam Dave
  • Arush Patel
  • Nishant Doshi
  • Conference paper
  • First Online: 06 December 2021

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 248)

Big and small firms, organizations, hospitals, schools, and other commercial offices are generating moderate to huge amounts of data regularly and need to constantly update and manage these data. These data are not only used at that instance, but generally, the retrospective analysis of data helps tremendously to improve the business strategies and the marketing trends. With time, these data may grow and become unmanageable if handled conventionally, like the file system. These factors resulted in the introduction of the terms database and database management system. Hierarchical, network, relational, and object-oriented approaches of DBMS are discussed in this paper. A highlight of the new-generation database approach called NoSQL is also included in this paper along with an insight into augmented data management. A model based on the database design for the Study in India Program is discussed. It is followed by a graphical user interface developed in Java for the same which ensures the ease of access to the database.

  • Database management system
  • Augmented data management
  • Database software
  • Database in business
  • Need for DBMS
  • Future predictions of DBMS

Acknowledgements

We would like to extend our gratitude to Prof. Nigam Dave, Head of Office of International Relations, PDPU, and Dr. Ritu Sharma, Associate Professor, PDPU, for providing insight into SIP requirements. We are immensely grateful to them for guiding us through our project and providing us with information as and when required.

Author information

Authors and affiliations.

Department of Computer Science and Engineering, School of Technology, Pandit Deendayal Energy University, Gandhinagar, India

Anushka Sharma, Aman Karamchandani, Devam Dave, Arush Patel & Nishant Doshi

Corresponding author

Correspondence to Nishant Doshi .

Editor information

Editors and affiliations.

University of the Ryukyus, Okinawa, Japan

Tomonobu Senjyu

Sinhgad Technical Education society, SKNCOE, Pune, India

Parikshit N. Mahalle

Computer Science, Faculty of CS and IT, Universiti Putra Malaysia, Seri Kembangan, Malaysia

Thinagaran Perumal

Global Knowledge Research Foundation, Ahmedabad, India

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper.

Sharma, A., Karamchandani, A., Dave, D., Patel, A., Doshi, N. (2022). Database Management Systems—An Efficient, Effective, and Augmented Approach for Organizations. In: Senjyu, T., Mahalle, P.N., Perumal, T., Joshi, A. (eds) ICT with Intelligent Applications. Smart Innovation, Systems and Technologies, vol 248. Springer, Singapore. https://doi.org/10.1007/978-981-16-4177-0_47

DOI : https://doi.org/10.1007/978-981-16-4177-0_47

Published : 06 December 2021

Publisher Name : Springer, Singapore

Print ISBN : 978-981-16-4176-3

Online ISBN : 978-981-16-4177-0

eBook Packages : Intelligent Technologies and Robotics (R0)

DBMS for Web: The Future of Database Management

COMMENTS

  1. From home energy management systems to energy communities ...

    This paper introduces the HEMStoEC database, which contains data recorded in the course of two research projects, NILMforIHEM and HEMS2IEA, for more than three years. To be manageable, the ...

  2. 19024 PDFs

    Explore the latest full-text research PDFs, articles, conference papers, preprints and more on DATABASE MANAGEMENT SYSTEMS. Find methods information, sources, references or conduct a literature ...

  3. 10 Current Database Research Topic Ideas in 2024

    This is where database topics for research paper [7] come in. By using database technology in video surveillance systems, it is possible to store and manage large amounts of video data efficiently. Database management systems (DBMS) can be used to organize video data in a way that is easily searchable and retrievable.

  4. PDF Database management system performance comparisons: A systematic

    A database management system (DBMS) is an integral part of effectively all software systems, and therefore it is logical that different studies have compared the performance of different DBMSs in hopes of finding the most efficient one. This study systematically synthesizes the results and approaches of studies that compare DBMS performance and ...

  5. 67 Data Management Essay Topics & Database Research Topics

    Interested in data management systems? Check out this list of database research topics! Find here ideas on DBMS security and design, and other database topics for research papers. ...

  6. Research Area: DBMS

    Berkeley also gave birth to many of the most widely-used open source systems in the field including INGRES, Postgres, BerkeleyDB, and Apache Spark. Today, our research continues to push the boundaries of data-centric computing, taking the foundations of data management to a broad array of emerging scenarios.

  7. Advances in database systems education: Methods, tools, curricula, and

    Research papers written in English language are included: EC: ... It is mainly because of its ability to handle data in a relational database management system and direct implementation of database theoretical concepts. Also, other database topics such as transaction management, application programming etc. are also the main highlights of the ...

  8. CSE 5249

    CSE 5249 - Research Topics in Database Management Systems. Time: Tuesdays & Thursdays, 3:55PM - 4:50PM Room: Dreese Labs 295. Instructor: Spyros Blanas, [lastname][email protected], office hours by appointment. I will notify all students via e-mail when this webpage is updated and I will list ...

  9. CS 764 Topics in Database Management Systems

    Students will read the paper and submit a review to https://wisc-cs764-f20.hotcrp.com before the lecture starts. Here is a sample review for the paper on join processing. Course projects: A big component of this course is a research project. For the project, you pick a topic in the area of data management systems, and explore it in detail.

  10. CS 764 Topics in Database Management Systems

    The topics discussed include query processing and optimization, advanced access methods, advanced concurrency control and recovery, parallel and distributed data systems, implications of cloud computing for data platforms, and data processing with emerging hardware. The course material will be drawn from a number of papers in the database ...

  11. Advances in Databases and Information Systems

    Needless to say, these four papers represent innovative and high quality research. The topics of these accepted papers are very timely and include: Big Data Applications and Principles, Evolving Business Intelligence Systems, Cultural Heritage Preservation and Enhancement and database evolution management.

  12. CS 764 Topics in Database Management Systems

    The topics discussed include query processing and optimization, advanced access methods, advanced concurrency control and recovery, parallel and distributed data systems, cloud computing for data platforms, and data processing with emerging hardware. The course material will be drawn from a number of papers in the database literature.

  13. (PDF) Role of Database Management Systems (DBMS) in Supporting

    This database design research model used Hevner's information systems research framework, starting with reference research, database lecture analysis, rigor and research relevance.

  14. Advances on Data Management and Information Systems

    This editorial paper overviews research topics covered in this special section of the Information Systems Frontiers journal. The special section contains papers invited from the 24th European Conference on Advances in Databases and Information Systems (ADBIS). 3.1 ADBIS Research Topics. The ADBIS conference has been running continuously since 1993.

  15. PDF Architecture of a Database System

    2. Upon receiving the client's first SQL command, the DBMS must assign a "thread of computation" to the command. It must also make sure that the thread's data and control outputs are connected via the communications manager to the client. These tasks are the job of the DBMS Process Manager (left side of Figure 1.1).

  16. 40 List of DBMS Project Topics and Ideas

    Technology made it easier for people to accomplish daily tasks and activities. In the conventional method, customers avail themselves of services by visiting the shop that offers their desired services personally. 40 List of DBMS Project Topics and Ideas. Fish Catch System Database Design.

  17. PDF Database Management Systems: A Case Study of Faculty of Open Education

    Database systems continue to be a key aspect of Computer Science & Engineering today. Representing knowledge within a computer is one of the central challenges of the field. Database research has focused primarily on this fundamental issue (6). This paper presents a database management system developed for AOF (Faculty of Open Education) course ...

  18. Database management system performance comparisons: A systematic

    Fig. 1. A simplified view of a database system and the end-user with the emphasis on components relevant to this study; the arrows represent the flow of information from the end-user's device to the database residing in persistent storage; the flow of information back to the software application is not illustrated ...

  19. Research Topics in Database Management Systems

    An important component of the course is an individual research project, where you will pick one topic of interest in the area of database management systems and explore it in depth. This course mainly discusses the latest research findings on data management and builds on the foundations that have been introduced in the CSE 5242, the Advanced ...

  20. 51044 PDFs

    Explore the latest full-text research PDFs, articles, conference papers, preprints and more on DATABASE MANAGEMENT. Find methods information, sources, references or conduct a literature review on ...

  21. Research Topics

    Data science is a field that crosscuts many research areas of computer science, such as artificial intelligence, machine learning, data mining, databases, and information systems. Our research falls into the last two of these areas and aims at supporting data science at the system level. Data science requires the management of new types of data ...

  22. Database Management Systems—An Efficient, Effective, and ...

    The object-oriented database management system has three main parts: object structure, object classes, and object identity. The term object-oriented database management system (OODBMS) first came into use circa 1985. Several research projects have been carried out on the subject, the most notable being ORION.

  23. DBMS for Web: The Future of Database Management

    DBAs face different problems when working with many database products from various manufacturers in the same company. This paper proposes J2EE-based software developed to access any database on the market, in order to show information related to the management of databases. As a case study, this program was implemented using Oracle database, versions 9i and 8i. In addition, the prototype ...

  24. (PDF) Database System: Concepts and Design

    In short, "A database is an organized collection of related information stored with minimum redundancy, in a manner that makes them accessible for multiple application". Definition : 1 ...