I'm trying to write a script that lists a directory and generates an SQL script to insert the directories it finds. The problem is that I only want to insert new directories. Here is what I have so far:

#If file doesn't exist add the search path test
if [ ! -e  /home/aydin/movies.sql ] 
then
    echo "SET SEARCH_PATH TO noti_test;" >> /home/aydin/movies.sql;
fi
cd /media/htpc/
for i in *
do
    #for each directory escape any single quotes
    movie=$(echo $i | sed "s:':\\\':g" )
    #build sql insert string
    insertString="INSERT INTO movies (movie) VALUES (E'$movie');";
    #if sql string exists in file already   
    if grep -Fxq "$insertString" /home/aydin/movies.sql
    then
        #comment out string
        sed -i "s/$insertString/--$insertString/g" /home/aydin/movies.sql
    else
        #add sql string
            echo $insertString >> /home/aydin/movies.sql;
    fi
done;
#execute script
psql -U "aydin.hassan" -d "aydin_1.0" -f /home/aydin/movies.sql;

It seems to work apart from one thing: the script doesn't recognise entries with single quotes in them. After running the script again with no new directories, this is what the file looks like:

--INSERT INTO movies (movie) VALUES (E'007, Moonraker (1979)');
--INSERT INTO movies (movie) VALUES (E'007, Octopussy (1983)');
INSERT INTO movies (movie) VALUES (E'007, On Her Majesty\'s Secret Service (1969)');  

I'm also open to suggestions on a better way to do this; my process seems pretty long-winded and inefficient :)

2
  • Explicitly bash or just any shell? Default non-login shell in Debian is dash nowadays ... Commented Jun 29, 2012 at 22:49
  • Doesn't really matter, it could be python or anything similar for that matter! Commented Jun 29, 2012 at 22:51

3 Answers

1

Script looks generally good to me. Consider the revised version (untested):

#! /bin/bash
#If file doesn't exist add the search path test
if [ ! -e  /home/aydin/movies.sql ] 
then
    echo 'SET search_path=noti_test;' > /home/aydin/movies.sql;
fi
cd /media/htpc/
for i in *
do
    #build sql insert string - single quotes work fine inside dollar-quoting
    #($i is the loop variable holding the directory name)
    insertString="INSERT INTO movies (movie) SELECT \$x\$$i\$x\$
WHERE NOT EXISTS (SELECT 1 FROM movies WHERE movie = \$x\$$i\$x\$);"

    #no need for grep. SQL is self-contained.
    #quote the variable so the shell neither splits nor glob-expands it
    echo "$insertString" >> /home/aydin/movies.sql
done

#execute script
psql -U "aydin.hassan" -d "aydin_1.0" -f /home/aydin/movies.sql;
  • To start a new file, use > instead of >>

  • Use single quotes ' for string constants without variables to expand

  • Use PostgreSQL dollar-quoting so you don't have to worry about single quotes in the strings. You have to escape the $ character to remove its special meaning in the shell.
    Use an "impossible" tag for the dollar-quote, one that cannot appear in the string. If you can't guarantee that, test the string for the tag and alter the tag in the unlikely case of a match, to be absolutely sure.

  • Use SELECT .. WHERE NOT EXISTS for the INSERT to automatically prevent existing entries from being re-inserted. This prevents duplicate entries in the table completely, not just among the new entries.

  • An index on movies.movie (possibly, but not necessarily UNIQUE) would speed up the INSERTs.
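
The tag check described in the dollar-quoting bullet could be sketched as follows. The helper name dollar_quote is hypothetical and not part of the original answer; it simply lengthens the tag until the tag can no longer collide with the string.

```shell
#!/bin/bash
# Hypothetical helper: wrap a string in PostgreSQL dollar quotes,
# choosing a tag that cannot terminate the literal early.
dollar_quote() {
    local value=$1
    local tag='x'
    # Appending "$" to the value also catches a value that ends in "$tag",
    # which would otherwise merge with the closing delimiter.
    while [[ "${value}\$" == *"\$${tag}\$"* ]]; do
        tag+='x'
    done
    printf '$%s$%s$%s$' "$tag" "$value" "$tag"
}

# Usage: build the idempotent INSERT for one directory name.
name="007, On Her Majesty's Secret Service (1969)"
quoted=$(dollar_quote "$name")
printf 'INSERT INTO movies (movie) SELECT %s WHERE NOT EXISTS (SELECT 1 FROM movies WHERE movie = %s);\n' "$quoted" "$quoted"
```

Inside the loop of the script above, the result of dollar_quote "$i" would replace the hand-written \$x\$ delimiters.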


2 Comments

Thanks for this, I will test today! There will be lots of other columns on the table, including an index; this was just to test that what I wanted to do would work. But this does seem a lot more efficient than my way.
If the operation is getting big, I would compile one big COPY statement, import all names to a TEMP TABLE and INSERT to the target table from there. Or at least do it in one INSERT statement. Individual INSERTs are slow in comparison, which doesn't matter with just a handful of items. Like in query 2 or 3 of this answer.
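
A rough sketch of the COPY-into-temp-table idea from the comment above, assuming the movies table from the question. The function name build_copy_script and the temp table name new_movies are made up for illustration. Single quotes need no escaping in COPY text format; only backslashes and tabs are escaped here, and names containing newlines are not handled.

```shell
#!/bin/bash
# Hypothetical helper: print a script that bulk-loads all directory names
# with COPY into a temp table, then inserts only the names missing from movies.
build_copy_script() {
    local dir=$1 path name
    echo 'CREATE TEMP TABLE new_movies (movie text);'
    echo 'COPY new_movies (movie) FROM stdin;'
    for path in "$dir"/*/; do
        [ -d "$path" ] || continue   # skip the unexpanded glob in an empty directory
        name=${path%/}               # drop the trailing slash
        name=${name##*/}             # keep only the last path component
        # COPY text format: escape backslashes first, then tabs
        name=${name//\\/\\\\}
        name=${name//$'\t'/\\t}
        printf '%s\n' "$name"
    done
    # "\." ends the inline COPY data
    cat <<'SQL'
\.
INSERT INTO movies (movie)
SELECT DISTINCT n.movie FROM new_movies n
WHERE NOT EXISTS (SELECT 1 FROM movies m WHERE m.movie = n.movie);
SQL
}

# Usage, with the paths and connection options from the question:
#   build_copy_script /media/htpc > /home/aydin/movies.sql
#   psql -U "aydin.hassan" -d "aydin_1.0" -f /home/aydin/movies.sql
```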
0

Why bother with grep and sed and not just let the database detect duplicates?

Add a unique index on movie, create a new (temporary) insert script on each run, and then execute it with autocommit (the default) or with psql's -v ON_ERROR_ROLLBACK=1 option. To get a full insert script of your movie database, dump it with pg_dump's --column-inserts option.

Hope this helps.
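
A minimal sketch of what that could look like, under a few assumptions: the function name build_insert_script and the index name are hypothetical, and standard_conforming_strings is on, so doubling single quotes is the only escaping needed. With the unique index in place and psql in its default autocommit mode, an INSERT for an existing movie fails on its own and the remaining statements still run.

```shell
#!/bin/bash
# Hypothetical helper: print one plain INSERT per subdirectory of the given directory.
# Single quotes are doubled, which is the standard SQL escape.
build_insert_script() {
    local dir=$1 path name q="'"
    echo 'SET search_path = noti_test;'
    for path in "$dir"/*/; do
        [ -d "$path" ] || continue   # skip the unexpanded glob in an empty directory
        name=${path%/}               # drop the trailing slash
        name=${name##*/}             # keep only the last path component
        printf "INSERT INTO movies (movie) VALUES ('%s');\n" "${name//$q/$q$q}"
    done
}

# One-time setup in SQL (the index name is an assumption):
#   CREATE UNIQUE INDEX movies_movie_uni ON movies (movie);
# Usage, with the paths and connection options from the question:
#   script=$(mktemp)
#   build_insert_script /media/htpc > "$script"
#   psql -U "aydin.hassan" -d "aydin_1.0" -f "$script"
#   rm -f "$script"
```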

2 Comments

I'm not sure what you mean. Can you provide an example? I can add an index. I need to add many columns; this was just to provide a working prototype.
@AydinHassan: ok, so please describe a little more what you are actually trying to do. What other columns/properties do you get from listing directories, last access time, size, number of files, etc? You would need update statements then. Or is the directory listing just an example and what you really need is a script with unique insert statements for some other domain? Or are the insert statements not relevant and you just want to have unique records in your table?
0

There's a utility daemon called incron, which fires your script whenever a file is written in a watched directory. It uses kernel events (inotify) rather than polling loops, so it is Linux-only.

In its config (full file path):

/media/htpc IN_CLOSE_WRITE /home/aydin/adder.sh $@/$#

Then the simplest adder.sh script, without any parameter checks:

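#!/bin/bash
# $1 is the path passed in by incron; psql's :'movie' syntax quotes it
# safely as a literal, so single quotes in the name need no manual escaping
psql -U "aydin.hassan" -d "aydin_1.0" -v movie="$1" <<'EOsql'
INSERT INTO movies (movie) VALUES (:'movie');
EOsql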

You can have thousands of files in one directory without the problems you may face with your original script.
