I have a query that runs hourly, and I process a dataset built from its results. While processing this dataset I need to ignore some IDs. I currently do this with NOT IN, and the number of IDs to ignore is around 50.
My question is this: I write the processed data to a text file in a certain pattern. For better performance, should I do the ignore step directly in the query, or inside the foreach loop?
The query returns around 5000-7000 rows from a table of 10M records, and I need to ignore around 50 IDs from the result set.
Let's say:
$blacklist_arr = array(1,10,20,30,40,50,60,70,80,90,100); // around 50 elements in the array
What I use now:
...QUERY...
resultSet.ID NOT IN (\'' . implode( "', '" , $blacklist_arr ) . '\')
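As an aside, the NOT IN list above could also be built with one placeholder per ID instead of quoting the values into the string. This is only a minimal sketch: the helper name buildNotInClause is hypothetical, and it assumes the values are bound later through a prepared statement (for example by passing $blacklist_arr to PDOStatement::execute).

```php
<?php
// Hypothetical helper (not from the original post): builds a
// "NOT IN (?, ?, ...)" fragment with one placeholder per blacklisted ID.
function buildNotInClause(string $column, array $ids): string
{
    if (count($ids) === 0) {
        // MySQL rejects an empty "NOT IN ()", so fall back to a condition
        // that filters nothing.
        return '1 = 1';
    }
    $placeholders = implode(', ', array_fill(0, count($ids), '?'));
    return $column . ' NOT IN (' . $placeholders . ')';
}

$blacklist_arr = array(1, 10, 20, 30);
$clause = buildNotInClause('resultSet.ID', $blacklist_arr);
echo $clause, PHP_EOL; // resultSet.ID NOT IN (?, ?, ?, ?)
```

The fragment contains no data, so the blacklist can grow without any change to the quoting logic.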
What I'm planning to use:
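foreach ($final_dataset as $final_data) {
...
// compare the row's ID (assuming each row is an array with an 'ID' key), not the whole row
if (!in_array($final_data['ID'], $blacklist_arr)) {
//write to file
...
}
}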
Edit: the query structure is below:
SELECT *
FROM
(
(
SELECT DISTINCT a.col1, a.col2, a.col3, a.col4,..., a.coln
FROM
`a`
INNER JOIN ( SELECT MAX( b.col4 ) AS X, b.col2 FROM `a` AS `b` GROUP BY b.col2 ORDER BY NULL ) sub ON ( sub.X = a.col4 )
WHERE
( a.someColumn > NOW( ) - INTERVAL 2 HOUR )
AND ( a.col3 < DATE_HERE )
) UNION
(
SELECT a.col1, a.col2, a.col3, a.col4,..., a.coln
FROM
`a`
WHERE
( a.someColumn >= DATE_SUB( NOW( ), INTERVAL 3 MONTH ) AND a.col4 IS NULL )
AND ( a.col3 < DATE_HERE )
)
) AS resultSet
WHERE
resultSet.col1 NOT IN ( 1,10,20,30,40,50,60,70,80,90,100 )
ORDER BY
resultSet.col3 ASC,
resultSet.col2 ASC,
resultSet.col4 ASC,
resultSet.col1 DESC
NOT IN takes around 0.080 - 0.0100 seconds. Right now it looks like it doesn't make any difference, but the number of elements in the array will increase daily/weekly. I expect the array to reach 300-500 elements each year, and it will be reset at 1-year intervals.

in_array() is not particularly efficient.

isset vs in_array. 100K / 10K: 0.03s vs 2.21s. 1M / 1K: 0.15s vs 2.10s. 1M / 10K: 0.15s vs 35.5s. 10M / 100K: 1.58s for isset. The isset check adds < 5% to the loop baseline runtime, so it's quite economical. Less so for in_array. Would be curious to see comparable NOT IN stats for MySQL.
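The isset approach mentioned above can be sketched roughly as follows. The sample rows, the 'ID' key, and the $kept array (a stand-in for the file write) are assumptions for illustration, not part of the original code. The idea is to flip the blacklist once so the IDs become array keys, which turns each check into a key lookup instead of a scan of the whole list.

```php
<?php
$blacklist_arr = array(1, 10, 20, 30);

// Flip once, outside the loop: values become keys, e.g. array(1 => 0, 10 => 1, ...).
$blacklist_lookup = array_flip($blacklist_arr);

// Hypothetical rows standing in for the query result.
$final_dataset = array(
    array('ID' => 1,  'col2' => 'a'),
    array('ID' => 2,  'col2' => 'b'),
    array('ID' => 30, 'col2' => 'c'),
    array('ID' => 31, 'col2' => 'd'),
);

$kept = array();
foreach ($final_dataset as $final_data) {
    // isset() on a key is a hash lookup; in_array() walks the list each time.
    if (!isset($blacklist_lookup[$final_data['ID']])) {
        $kept[] = $final_data['ID']; // stand-in for "write to file"
    }
}

print_r($kept); // IDs 2 and 31 remain
```

With 300-500 blacklisted IDs and a few thousand rows either check is likely cheap, so the flip mostly matters if the list or the result set grows well beyond those sizes.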