Review of A survey of problematic database code fragments in software systems

Content of review 1, reviewed on February 04, 2021

In this version, the authors have largely improved the organisation of the paper. In particular, section 4 is really easier to read in this version.
However, some issues remain. It is clear that I may have not raised them in my previous comments since the paper was too badly organised:

Major:
- The introduction is not clear. The problematic arrives very late and so it is difficult to understand why to talk about ORM frameworks or binding DB to software systems since we don't know where the authors want to go.
- section 1.2 is not a matter of terminology. But a description of the 4 NFR that are studied in the paper.
- in section 2, it is not clear if it is a literature review or if it is proposal. (p4 line 20 we review the literature on SQL and p4 line 40 we introduce and review low-level database problematic code fragments in terms of SQL schema antipatterns)
- the paper lacks in consistency. It is true that lots of the following issues are minor issues. But it is the repetition of such inconsistencies that makes it a major issue.
* p2 line 45 A problematic database code can be a bug, SQL smell, or SQL antipattern. BUT p4 line 20 By problematic code fragments, we mean antipatterns
in software artifacts that are symptoms resulting in eventual errors or quality problems.
* For example, p6 Antipattern names in the text vs definition in the table.
* There are some brief definitions for the antipatterns p5 and 6, but no more p7, just name.
* Other example SCA antipatterns (p7 line 42), but p3 line 6 Service Oriented Architecture (SOA)11, Microservice Architecture (MA)12, and Model View Controller (MVC)13.
* p11 line 22 we classify 11 SQL schema antipatterns. line 24: Out of the 15 studied antipatterns.
* p14 line 6 this antipattern affects the performance, maintainability, and data integrity factors of the database --> for the missing constraints antipattern, no performance impact is described.
- section 4.1 and 4.2 are referenced several times in section 2 showing a problem in the paper organisation. The absence of definition in section 2 to put them in section 4 also shows that there is a matter of organisation. My feeling is that section 2 and 4 should be merged. I will come back on this point later.
- as mentionned in my first review each section should have only one intention. --> first paragraph of section 3.4 is not relevant to the section or change the title of the section. Moreover everything is mixed up in the second paragraph of the section.
- all antipatterns should be described following the same template: 0. In the name choose if you put 'antipattern' or not but be consistent, 1. Definition, 2. From where it comes (to merge section 2 and 4) 3. Example (ALWAYS an example!! exactly as there is always an example for the design patterns), 4 Impact on the NFR. 5. Alternative solution.
- In my previous review I mentionned There should certainly be strong hypothesis, the SQL code must be correct. If it is no more a matter of antipatterns, but a bug. In the answer, the authors agreed and added that a problematic code can be an antipatterns, a smell or a bug. So a bug is not an antipattern. Consequently, listing 1 in its new version still has a bug. Moreover, code has been updated not the text. Moreover, as mentionned in the letter by the authors, Most brands of database report an error if you try to run any query that tries to return a column other than those columns named in the GROUP BY clause or as arguments to aggregate functions. It is a bug and not an antipattern.
- in section 4.4 the dependency to the frameworks is not clear.

Minor:
- p3 line 25 A database is an integral part of any software system --> there exists software system without DB.
- p4 in the text Multicolumn attribute antipattern seems to have been introduced by Sharma et al. whereas in p5 it appears in the table of Karwin antipattern.
- p5 spaghetti query antipattern is not more subjective than got table antipattern. They have a threshold that must be fixed. (Same remark p10 line 33).
- p5 spaghetti query antipattern is implemented in DBCritics
- p6 Active Record pattern suddenly appears.
- p6 the definition of the Brain repository antipattern is not clear. In fact this notion of repository is not clear for me. There is no repository in DB and in OOPL, we mostly talk about package. So do the authors refer to package? If not to what?
- p7 line 20 they --> who?
- p7 "we believe" is not very scientific.
- p7 line 37: "It is the class in a database" --> it is not clear what the authors refer to.
- p11 line 44 As discussed in the impact on performance, each vendor has its syntax --> not discussed in the impact of the performance section.
- p11 line 47 With comma-separated lists, developers cannot prevent users from entering invalid entries like a word in a language that is different from English. --> Not clear. What happens for German, Italian or Duch people, it is invalid entries if it is not in English?
- p11 definition of the adjacency list antipattern is wrong: This antipattern arises when a column in a table references another record in the same table. This is not an antipattern. For example, in a person table, you can have two attributes one for parent 1 and the other for parent 2 (in case the person has two mothers or two fathers). The antipattern is the fact to store the inheritance tree in a column. So many values in the same column. Consequently, p13 the explaination of why it is bad are wrong: the reason why it is bad is that it enforces developers to overuse joins for each decedent of the tree. In the example of person how to do on another way? The reason why it is bad is that several data are gathered in the same column and their number can vary. The impact on the performance is not clear.
- p12 please put the table in the portrait (normal) sense of the page. This table, like the others has a lot of white spaces. It can be on the other side to avoid the reader to turn his head or the page.
- superfluous key antipattern is not clear. Please put an example. Consequently the impact on data integrity is not clear. If the key is defined as PK in the RDBMS, this later won't accept duplication of row. So what is the problem here?
- section 4.1.4, it is not any constraint that is missing. Please rename the antipattern to Missing referential integrity constraints antipattern.
- section 4.1.5, not clear, example missing.
- p15 line 17 the old table and copy data from the old table to the new table --> what is the link with the described antipattern. Be careful it is the antipattern itself and not its resolution that should have an impact on any NFR. In the same way, line 22 storing those values as rows instead of columns --> ??? And finally This antipattern
also violates the Don’t Repeat Yourself (DRY) principle --> but not with the definition of the antipattern.
- p15 line 34, why the clone table antipattern has an influence on the number of columns? Moreover, the impact of the performance is strange since the sharding aims to increase performance.
- p16 line 11 As discussed in the impact on performance and portability --> the reader mostly reads from top to the bottom following the pages and the lines. Portability is after maintenance. idem line 51
- p17 line 24 why recommendation in the middle of the definition?
- p17 alternative solution of the reference nongrouped columns is not clear.
- p18 it is not clear what major in the example of the buried null antipattern refers to.
- Using UNION or DISTINCT to Remove JOIN Duplicates --> please provide example! Let consider the following tables
create table category (
idCat integer NOT NULL,
label varchar(15) not null,
PRIMARY KEY (idCat)
);
CREATE TABLE film (
idfilm integer NOT NULL,
title varchar(70) NOT NULL default '',
idCat integer constraint idCat references category(idCat),
PRIMARY KEY (idfilm)
);
If we want the labels of the categories of the films present in the database, we need to do: select distinct c.label from category c natural join film f; Where is the antipattern?
- p19 in the pattern matching predicates, it is the alternative solution that is not portable. Not the antipattern itself.
- p20 line 22 Impact on maintainability: As discussed in the impact on performance --> not discussed in the impact on performance.
- p20 line 25 Maintaining schema changes for a shared database between services is more troublesome than maintaining a database per service. --> But in that case, there can be data redundancy for example for human resources, and holidays management the table person will be duplicated if one db per service.
- p20 line 33 Definition: The active record design pattern as a persistence layer that reflects an object to a specific record in RDBMS. --> where is the verb?
- p22 line 30 they can decide that they do not these --> verb missing.
- p22 line 56 71% --> it should depend on how much data is excessive.
- p24 line 8 The author --> who?
- p25 line 44 By using a database per service pattern --> alternative solution for the Shared Persistency antipattern promotes to use one DB per service. So we solve an antipattern by adding one other. No discussion on this point.
- p31 line 4 the consequences of the misnaming is not clear for me.

Typo
- p5 TABLE 1 Classification of SQL Anitpatterns by Karwin14
- listing 12 please indent the code correctly

Source

Content of review 2, reviewed on May 05, 2021

First of all, I really want to congratulate the authors for the huge work they have done from the first version of the paper but even from the previous one.
Unfortunately, there are some few comments.
- It is true that in the new version of the introduction, the problem arrives very quickly. Two paragraphs have been added. However, these two paragraphs seem completely deconnected to the reminder of the introduction. I think that this introduction should be reworked, because normally, an introduction is structured as follows or something like this: Context; Problem; Known tracks for solutions; What the proposed solution is (so that the reader knows where the paper is going); Contribution of the paper; Paper structure. Anyway, there is a context and problem description, before the contribution description.
- p11 l44 why row duplication? It is true that there can be several rows with the same employee id and the same department id, but strictly speaking, they won't be any row duplication.
- p12 l14, the added line is useless since it idoes not exist for each pattern.
-p14 l12 If you want to add this sentence add an example. Otherwise remove it and adapt the impact description consequently.
-p17 section 3.2.4 example: You mean we have n tables and at max less than n-2 join clauses so at least one is missing since normally, if there are n tables, there are at least n-1 joins eventually more, in case of composed foreign keys?
-p17 section 3.2.4 I don't see how you can remove some rows using union. Using intersection or difference yes, but not union since an union take everything plus other things.
-p19 l36 I don't understand the added sentence
-p20 l42 verb is still missing after do. that they do not save these attributes

Source

References

Bader, A., Ramez, E., Tariq, A., Mousa, A. 2021. A survey of problematic database code fragments in software systems. Engineering Reports.

Pre-publication Review of

A survey of problematic database code fragments in software systems

Reviewed On February 04, 2021 , and May 05, 2021

Submitted to

Reviewed

Actions

Content of review 1, reviewed on February 04, 2021

Source

Content of review 2, reviewed on May 05, 2021

Source

References