Are there suitable indexes for this query?

Posted: edited May 22 at 7:54 - Source : stackoverflow

Are there any suitable indexes to support the following query?

SELECT DISTINCT p.id
FROM p
INNER JOIN l ON p.id = l.p1_id OR p.id = l.p2_id
WHERE p.s = 'Active'
AND (
    (l.s IN (1, 7) AND l.rd <= CURDATE())
    OR
    (l.s = 2 AND MONTH(l.td) = MONTH(CURDATE()) AND YEAR(l.td) = YEAR(CURDATE()))
) 

Tables:

CREATE TABLE p (
  id int(11) NOT NULL,
  s varchar(10) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE l (
  id int(11) NOT NULL,
  p1_id int(11) NOT NULL,
  p2_id int(11) NOT NULL,
  s int(11) NOT NULL,
  rd date NOT NULL,
  td date DEFAULT NULL,
  PRIMARY KEY (id),
  FOREIGN KEY (p1_id) REFERENCES p (id) ON UPDATE CASCADE,
  FOREIGN KEY (p2_id) REFERENCES p (id) ON UPDATE CASCADE
) ENGINE=InnoDB;

Explain:

+--+-----------+-----+----+-------------+---+-------+---+----+--------------------------------------------------+
|id|select_type|table|type|possible_keys|key|key_len|ref|rows|Extra                                             |
+--+-----------+-----+----+-------------+---+-------+---+----+--------------------------------------------------+
| 1|SIMPLE     |l    |ALL |             |   |       |   |3960|Using where; Using temporary                      |
| 1|SIMPLE     |p    |ALL |PRIMARY      |   |       |   |5091|Using where; Using join buffer (Block Nested Loop)|
+--+-----------+-----+----+-------------+---+-------+---+----+--------------------------------------------------+

I tried a number of single-column and composite indexes on the columns in the JOIN and WHERE clauses. The DBMS does use indexes built on all the relevant columns, but they make no difference to the number of rows evaluated.

Alternatively, could the query be rewritten in a more efficient way?

Edit:

Indexing on p.s provided some performance improvement, from 1.4 seconds down to 0.3 seconds.

ALTER TABLE p
ADD INDEX (s);

New explain:

+--+-----------+-----+----+-------------+---+-------+-----+----+--------------------------------------------------------+
|id|select_type|table|type|possible_keys|key|key_len|ref  |rows|Extra                                                   |
+--+-----------+-----+----+-------------+---+-------+-----+----+--------------------------------------------------------+
| 1|SIMPLE     |p    |ref |PRIMARY,s    |s  |32     |const|5058|Using where; Using index; Using temporary               |
| 1|SIMPLE     |l    |ALL |             |   |       |     |3960|Range checked for each record (index map: 0x6); Distinct|
+--+-----------+-----+----+-------------+---+-------+-----+----+--------------------------------------------------------+

Is further improvement possible?

Edit 2:

Explain of Rick James's UNION query with suggested indexes applied:

+--+------------+----------+-----+-------------+---+-------+-----+----+------------------------+
|id|select_type |table     |type |possible_keys|key|key_len|ref  |rows|Extra                   |
+--+------------+----------+-----+-------------+---+-------+-----+----+------------------------+
| 1|PRIMARY     |l         |range|srd,std      |srd|7      |     | 733|Using where; Using index|
| 2|UNION       |l         |range|srd,std      |std|7      |     |   2|Using where; Using index|
|  |UNION RESULT|<union1,2>|ALL  |             |   |       |     |    |Using temporary         |
+--+------------+----------+-----+-------------+---+-------+-----+----+------------------------+
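
For reference, a sketch of index definitions that would match the key names srd and std in the EXPLAIN above. The column lists are an assumption inferred from those names; the indexes actually suggested may have carried further columns (the "Using index" note hints at that).

```sql
-- Assumed definitions: the names srd and std come from the key column of the
-- EXPLAIN output; the column lists are inferred from the names, not confirmed.
ALTER TABLE l
  ADD INDEX srd (s, rd),   -- equality on s, then range on rd
  ADD INDEX std (s, td);   -- equality on s, then range on td
```

Putting the equality column (s) first and the range column second lets each UNION branch be resolved with a single range scan.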

Some stats:

SELECT s, COUNT(*) FROM l GROUP BY s
+-+--------+
|s|COUNT(*)|
+-+--------+
|1|     733|
|2|    3222|
|8|       5|
+-+--------+
      =3960

SELECT s, COUNT(*) FROM p GROUP BY s
+--------+--------+
|s       |COUNT(*)|
+--------+--------+
|Active  |    5059|
|Inactive|      32|
+--------+--------+
             =5091

The 8 in l.s is correct, and shouldn't be included in the results of the query above. Even though there are no rows with l.s=7, I need to include that possibility.

The expected result set contains 1144 records.

Finally:

Building on Rick James's advice, the following query, coupled with indexes in l on (s, rd) and (s, td), performs as efficiently as I had hoped (~50ms):

SELECT DISTINCT p.id
FROM (
    SELECT p1_id AS id
        FROM l
        WHERE s = 1 AND rd <= CURDATE()
    UNION ALL
    SELECT p2_id
        FROM l
        WHERE s = 1 AND rd <= CURDATE()
    UNION ALL
    SELECT p1_id
        FROM l
        WHERE s = 7 AND rd <= CURDATE()
    UNION ALL
    SELECT p2_id
        FROM l
        WHERE s = 7 AND rd <= CURDATE()
    UNION ALL
    SELECT p1_id
        FROM l
        WHERE s = 2 AND td >= CONCAT(LEFT(CURDATE(), 7), '-01') AND td < CONCAT(LEFT(CURDATE(), 7), '-01') + INTERVAL 1 MONTH
    UNION ALL
    SELECT p2_id
        FROM l
        WHERE s = 2 AND td >= CONCAT(LEFT(CURDATE(), 7), '-01') AND td < CONCAT(LEFT(CURDATE(), 7), '-01') + INTERVAL 1 MONTH
) x
JOIN p ON p.id = x.id
WHERE p.s = 'Active'
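
The month window used in the last two branches could probably also be written with DATE_FORMAT instead of slicing the date string. This is only a sketch of one branch against the same table l, not a measured improvement:

```sql
-- Rows of l with s = 2 whose td falls in the current calendar month.
-- The half-open range [first day of this month, first day of next month)
-- leaves td bare, so an index starting with (s, td) can be range-scanned,
-- unlike the original MONTH(td) / YEAR(td) comparison.
SELECT p1_id
FROM l
WHERE s = 2
  AND td >= DATE_FORMAT(CURDATE(), '%Y-%m-01')
  AND td <  DATE_FORMAT(CURDATE(), '%Y-%m-01') + INTERVAL 1 MONTH;
```

Either spelling gives the same boundaries; the point in both cases is that no function is applied to the td column itself.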