26

I can run this normally on the command line in Linux:

$ tar c my_dir | md5sum

But when I try to call it with Python I get an error:

>>> subprocess.Popen(['tar','-c','my_dir','|','md5sum'],shell=True)
<subprocess.Popen object at 0x26c0550>
>>> tar: You must specify one of the `-Acdtrux' or `--test-label'  options
Try `tar --help' or `tar --usage' for more information.
3
  • Why are you hashing a tar file? Are you looking for changes in file contents, or verifying an externally created tar file? Commented Sep 6, 2011 at 18:27
  • Perhaps see also stackoverflow.com/questions/24306205/… Commented Feb 7, 2021 at 9:22
  • @tMC: and how does this comment help with the actual problem and question ??? Commented Jan 11, 2023 at 11:30

5 Answers

25

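You have to use subprocess.PIPE to connect the two processes. To split each command line into arguments, use shlex.split(), which handles quoting correctly and avoids surprises:

from subprocess import Popen, PIPE
from shlex import split

p1 = Popen(split("tar -c my_dir"), stdout=PIPE)
p2 = Popen(split("md5sum"), stdin=p1.stdout)
p1.stdout.close()  # so tar receives SIGPIPE if md5sum exits first
p2.communicate()   # wait for md5sum; it prints the digest to stdout
p1.wait()          # reap the tar process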

But to make an archive and generate its checksum, you should use Python's built-in tarfile and hashlib modules instead of calling shell commands.
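For illustration, here is a minimal sketch of the tarfile and hashlib route. The helper name md5_of_tar is made up for this example, and it builds the archive in memory, which assumes the directory is small enough to fit. Note that the digest may not match the output of tar c my_dir | md5sum, because the archive bytes written by tarfile can differ from those written by the tar command.

```python
import hashlib
import io
import tarfile


def md5_of_tar(path):
    """Archive `path` into an in-memory tar and return its MD5 hex digest.

    Hypothetical helper for illustration; not part of any library.
    """
    buf = io.BytesIO()
    # mode="w" writes an uncompressed tar, like `tar c` without -z
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(path)
    return hashlib.md5(buf.getvalue()).hexdigest()
```

Usage would be something like print(md5_of_tar("my_dir")). For large directories, writing the archive to a file and hashing it in chunks is likely the better choice.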


5 Comments

tarfile, and hashlib would be preferable. But how do I hash a tarfile object?
@Greg don't hash the tarfile object, open the resulting file like any other file using open() and then hash its content.
Makes sense. That works but I get a different hash value than from the original command. Is that to be expected?
@Greg, this should do the same exact thing as tar -c mydir | md5sum. Perhaps you could start a new question, including an interactive terminal session where you run this command, start Python, and run the Python commands, displaying the output.
Perhaps also mention that you have to call communicate on the final Popen object, or switch to a modern wrapper like subprocess.run. For many cases, simply pass in a string with shell=True if you want to use shell features like pipes, variables, redirection, job control, etc. Or as the answer suggests, run as little as possible in a subprocess and replace shell commands with native Python where you can (in which case you can often avoid the security implications of shell=True by removing it).
11

Ok, I'm not sure why but this seems to work:

subprocess.call("tar c my_dir | md5sum",shell=True)

Anyone know why the original code doesn't work?

2 Comments

the pipe | is a character the shell understands to connect command inputs and outputs together. It is not an argument that tar understands, nor a command. You're trying to execute everything as arguments to the tar command, unless you create a subshell.
This works because the entire command is passed to the shell, and the shell understands the |. Popen calls the process and passes in the arguments directly. For Popen this is controlled with shell= and passing a string (not a list), IIRC.
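To make the difference concrete, here is a small sketch, assuming a POSIX system where echo and tr are available. With shell=True and a list, only the first item is treated as the command string; the remaining items become arguments to the shell itself rather than to the command. That is why tar in the question ran with no options at all.

```python
import subprocess

# List + shell=True: the shell runs just "echo"; "hello" is handed to the
# shell as a positional parameter, so echo prints only an empty line.
as_list = subprocess.run(["echo", "hello"], shell=True, stdout=subprocess.PIPE)

# String + shell=True: the shell parses the whole line, including the pipe.
as_string = subprocess.run(
    "echo hello | tr a-z A-Z", shell=True, stdout=subprocess.PIPE
)

print(as_list.stdout)    # b'\n'
print(as_string.stdout)  # b'HELLO\n'
```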
5

What you actually want is to run a shell subprocess with the shell command as a parameter:

>>> subprocess.Popen(['sh', '-c', 'echo hi | md5sum'], stdout=subprocess.PIPE).communicate()
('764efa883dda1e11db47671c4a3bbd9e  -\n', None)

2 Comments

Incidentally, shell=True does something similar.
It's just silly to not use shell=True here.
4

I tried this on Python 3.8.10:

import subprocess
proc1 = subprocess.run(['tar c my_dir'], stdout=subprocess.PIPE, shell=True)
proc2 = subprocess.run(['md5sum'], input=proc1.stdout, stdout=subprocess.PIPE, shell=True)
print(proc2.stdout.decode())

Key points (as outlined in my answer to a related question, https://stackoverflow.com/a/68323133/12361522):

  • use subprocess.run()
  • do not split the command and its parameters; pass the whole command line as one string, i.e. ['tar c my_dir'] or ["tar c my_dir"]
  • set stdout=subprocess.PIPE for every process
  • pass input=proc1.stdout to feed the output of the previous process into the next one
  • enable the shell with shell=True

3 Comments

This is basically just a restatement of the accepted answer. The use of run over Popen is a good idea when you can, of course (back when the accepted answer was written, run didn't exist).
thanks for posting this example, i needed to have grep in my command string, which did weird stuff when being supplied to split.
I would prefer this over the accepted answer due to using run instead of shlex
1
>>> from subprocess import Popen,PIPE
>>> import hashlib
>>> proc = Popen(['tar','-c','/etc/hosts'], stdout=PIPE)
>>> stdout, stderr = proc.communicate()
>>> hashlib.md5(stdout).hexdigest()
'a13061c76e2c9366282412f455460889'
>>> 
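The session above reads all of tar's output into memory through communicate() before hashing it. For large directories, a streaming variant may be preferable. The following is only a sketch: the helper name md5_of_command_output and the chunk size are assumptions chosen for this example.

```python
import hashlib
import subprocess


def md5_of_command_output(args, chunk_size=65536):
    """Run `args` and return the MD5 hex digest of its stdout, read in chunks.

    Hypothetical helper for illustration; not part of any library.
    """
    digest = hashlib.md5()
    with subprocess.Popen(args, stdout=subprocess.PIPE) as proc:
        # read() returns b"" at end of stream, which stops the iteration
        for chunk in iter(lambda: proc.stdout.read(chunk_size), b""):
            digest.update(chunk)
    # leaving the with-block waits for the process, so returncode is set
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, args)
    return digest.hexdigest()
```

Calling md5_of_command_output(['tar', '-c', 'my_dir']) should then give the same digest as the communicate() approach without holding the whole archive in memory.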
