[Bug 12071] New: MSI SQL joins on tables with many rows are extremely slow
wine-bugs at winehq.org
wine-bugs at winehq.org
Mon Mar 17 00:30:51 CDT 2008
http://bugs.winehq.org/show_bug.cgi?id=12071
Summary: MSI SQL joins on tables with many rows are extremely
slow
Product: Wine
Version: 0.9.57.
Platform: PC
OS/Version: Linux
Status: NEW
Severity: enhancement
Priority: P2
Component: msi
AssignedTo: truiken at gmail.com
ReportedBy: truiken at gmail.com
CC: ead1234 at hotmail.com
With the current implementation of MSI SQL joins, we perform a Cartesian
product on the tables joined together. In the case where there is no WHERE
clause, this is the correct output, but if there is a WHERE clause, we still
perform the Cartesian product and then filter on the WHERE clause. Though it's
uncommon, there are a few installers (Visual Studio, Nero Express 7) that join
tables with thousands of rows. For example, say we have tables A, B, and C
s.t.
colA colB colC
----- ----- -----
1 1 1
2 2 2
... ... ...
1000 1000 1000
and we have the query
SELECT * FROM A, B, C WHERE `colA`=1 AND `colB`=1 AND `colC`=1
then the current implementation will create a new table with those columns
containing 1000*1000*1000=1 billion rows. Then we check each of the 1 billion
rows for any matches. There are several algorithms for optimizing joins:
http://en.wikipedia.org/wiki/Join_(SQL)#Join_algorithms
The solution I'll be working on is the merge join. This solution parses the
WHERE clause, starting with the two tables that the columns of the clause
belong to (colA -> A, colB -> B, etc) and joins those together, making sure to
eliminate rows based on the WHERE clause. Then the next table in the clause is
merged into the previously created table, eliminating rows. This continues
until there are no tables left. One optimization of this solution is to start
with parts of the WHERE clause that compare against literals. For example, the
query:
SELECT * FROM A, B, C WHERE `colA` = `colB` AND `colB` = 1 AND `colC` = 1
We'd start with "`colB` = 1" since the comparison is against a literal. We
perform 1000 comparisons on the rows of table B (count = 1000). The resulting
table is:
colB
----
1
then we do the same for table C (count = 1000), and the merged table is:
colB colC
---- ----
1 1
next we merge this table with table A:
colA colB colC
---- ---- ----
1 1 1
2 1 1
... ... ...
1000 1 1
and we search through this table for the condition `colA` = `colB` (count =
1000). So we've reduced billions of comparisons to 3000.
Unfortunately, the way our SQL engine is implemented, this will not be an easy
task.
--
Configure bugmail: http://bugs.winehq.org/userprefs.cgi?tab=email
Do not reply to this email, post in Bugzilla using the
above URL to reply.
------- You are receiving this mail because: -------
You are watching all bug changes.
More information about the wine-bugs
mailing list