UNIX help?

Feb. 13th, 2006 01:42 pm
[personal profile] quietann
finally, a work-related question:

Is there an automated way to search subdirectories for .rtf and .pdf files, and gzip them if they are found? Most of the rtf files will be in subdirectories called "rtf", and the pdf files will be in subdirectories called "pdf".

(I've done too much of this the good old fashioned way, and it's making me stupid.)

Date: 2006-02-13 06:46 pm (UTC)
From: [identity profile] kathrynt.livejournal.com
Dammit! Yes, and fairly trivially, except that I can't remember the exact syntax and I don't have a *nix box here to check with.

Date: 2006-02-13 06:51 pm (UTC)
From: [identity profile] deguspice.livejournal.com
A quick google for "man exec gzip" turned up the example:
find . -name '*.ps' -exec gzip {} \;

Date: 2006-02-13 06:53 pm (UTC)
From: [identity profile] quietann.livejournal.com
can you explain this to me further?

Date: 2006-02-13 06:59 pm (UTC)
From: [identity profile] deguspice.livejournal.com
I found it at: http://mathlab.cit.cornell.edu/local_help/local_help.html#FIND

find walks through the subdirectories, and every time it finds a file with a matching name, it calls gzip, replacing the curly braces with the filename.
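For the original question (both .rtf and .pdf files), the two patterns can probably be combined in a single find call. Here is a minimal sketch that builds a throwaway tree first so it is safe to try; the directory and file names (proj1, report.rtf and so on) are made up for illustration.

```shell
#!/bin/sh
# Scratch tree so the example is safe to run; all names here are made up.
demo=$(mktemp -d)
mkdir -p "$demo/proj1/rtf" "$demo/proj1/pdf" "$demo/proj2/pdf"
echo "sample" > "$demo/proj1/rtf/report.rtf"
echo "sample" > "$demo/proj1/pdf/report.pdf"
echo "sample" > "$demo/proj2/pdf/notes.pdf"
echo "sample" > "$demo/proj2/readme.txt"

# Match names ending in .rtf or .pdf and run gzip once per match.
# The parentheses group the two -name tests; they are escaped so the
# shell passes them through to find unchanged.
find "$demo" \( -name '*.rtf' -o -name '*.pdf' \) -exec gzip {} \;

# List what is left: the matched files now end in .gz, readme.txt is untouched.
find "$demo" -type f | sort
```

On a real tree, replace "$demo" with the top-level directory and drop the setup lines.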

Date: 2006-02-13 07:02 pm (UTC)
From: [identity profile] quietann.livejournal.com
thank you! I knew it had something to do with find (which I use a lot). The whole idea that one can execute a subsequent command all on the same line is something I am still getting used to.

Date: 2006-02-13 07:17 pm (UTC)
From: [identity profile] feste-sylvain.livejournal.com
Sometimes you have to "escape" the wild-card in order for it to take:
find /document_head -name \*.ps [...]


And you absolutely have to escape that semicolon at the end, or the -exec section will refuse to believe that it has been terminated.
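The semicolon needs protecting because the shell would otherwise treat it as a command separator and find would never see it. POSIX find also accepts a plus sign as the terminator, which needs no escaping and hands many file names to each gzip run. A small sketch of both forms, using a scratch directory with made-up file names:

```shell
#!/bin/sh
# Scratch tree; the directory and file names are made up.
demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b"
echo "sample" > "$demo/a/one.ps"
echo "sample" > "$demo/a/two.ps"
echo "sample" > "$demo/b/three.ps"

# Backslash-escaped semicolon: gzip is started once per matching file.
find "$demo/a" -name '*.ps' -exec gzip {} \;

# Plus terminator: find batches the names into as few gzip runs as it can.
find "$demo/b" -name '*.ps' -exec gzip {} +

# Both forms leave the same kind of result behind.
find "$demo" -name '*.gz' | sort
```

Writing the terminator as ';' in single quotes should work just as well as the backslash.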

Date: 2006-02-13 07:33 pm (UTC)
From: [identity profile] marmota.livejournal.com
I'd actually go with:

find / -name "*.pdf" -print 2> /dev/null -ok gzip -f "{}" \;
"2> /dev/null" nukes the error output which I find makes it easier to read when not running as root.
"-ok" instead of "-exec" queries you y/n on each file so that you can be sure it's doing the right thing. (obviously, if it's for a lot of files, once it's doing the right thing, ^c out and run it with -exec)
And yeah, what feste_sylvain said about terminating it properly, although if you double-quote the search expression I don't think you need to worry about it there.

Date: 2006-02-13 07:50 pm (UTC)
From: [identity profile] feste-sylvain.livejournal.com
If there are any .pdf files in the directory you fire this command from, even the quoted "*.pdf" will be replaced with any local files which match that wild-carded expression. Not useful.

If there are no files which match that wild-carded expression, then it will be passed along literally.

FWIW
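One way to check how a given shell treats the pattern is to count the words it would hand to find. In POSIX shells such as sh and bash, pathname expansion applies only to unquoted text, so the quoted and backslashed forms should arrive as one literal word even when matching files exist in the current directory. A minimal sketch, with a scratch directory and made-up file names:

```shell
#!/bin/sh
# Scratch directory holding two files that match the pattern.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch local1.pdf local2.pdf

# "set --" replaces the positional parameters, so $# is the number of
# words the shell would pass to a command such as find.
set -- *.pdf
unquoted=$#          # expanded to the matching file names

set -- "*.pdf"
double_quoted=$#     # stays one literal word

set -- '*.pdf'
single_quoted=$#     # stays one literal word

set -- \*.pdf
escaped=$#           # stays one literal word
literal=$1

echo "unquoted: $unquoted, double: $double_quoted, single: $single_quoted, backslash: $escaped"
```

If the unquoted count comes out larger than the others, the shell expanded only the unquoted form.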

Date: 2006-02-13 08:03 pm (UTC)
From: [identity profile] kmhoofnagle.livejournal.com
If you want to make a lifestyle out of running grep (a text search) on nonbinaries and/or making scripts to do trivial things like this, you *don't* have to spend your time on *nix systems.

I have to work on Windoze boxes more than I want to, and there's a free set of tools to be found at cygwin.org that sets up a bash type shell on windows. I use it for everything.

Anyhow, if you want to work at this in a home environment, having a shell on a windows box could seriously help you get practiced.

k.

... oh and I'd help with the script, but...

Date: 2006-02-13 08:03 pm (UTC)
From: [identity profile] kmhoofnagle.livejournal.com
it looks like others have it well in hand.

Date: 2006-02-13 08:21 pm (UTC)
From: [identity profile] marmota.livejournal.com
hm. my example finds all .pdf from / on down, true. However, I figured quietann would craft it to suit her own needs along the lines of
find /whatever/rtf -name "*.rtf" -print 2> /dev/null -ok gzip -f "{}" \;

and since it queries y/n for each file, any bad hits can be dealt with on a case by case basis.

(I'm not sure what would match "*.pdf" that could be a problem. Now, "*.pdf*", THAT could be a problem.)
Anyway, sure, the \ wouldn't hurt.

Date: 2006-02-13 08:43 pm (UTC)
From: [identity profile] frobzwiththingz.livejournal.com
you don't say what un*x variant you're on, but assuming you have GNU TAR available, the following one-liner will do it for you:

(find . -name \*.rtf ; find . -name \*.pdf) | tar cvfzT output.tgz -

Terse explanation of above:
The first two commands, in the parens, get executed in series, each one finding one type of file; putting them in parens gets them both executed by the same subshell, so their output ends up together on stdout; tar's "-T" option tells it to read the list of files to archive from a file, and the trailing "-" means stdin.
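A variation on the same idea that should cope with spaces in file names: have find separate the names with NUL bytes and tell tar to expect that. This is only a sketch and assumes GNU tar, as above; the scratch tree and file names are made up. Note that it builds one archive and leaves the original files in place, which is different from gzipping each file individually.

```shell
#!/bin/sh
# Scratch tree; names are made up, one of them contains a space on purpose.
demo=$(mktemp -d)
mkdir -p "$demo/src/rtf" "$demo/src/pdf"
echo "sample" > "$demo/src/rtf/my report.rtf"
echo "sample" > "$demo/src/pdf/chart.pdf"
echo "sample" > "$demo/src/notes.txt"

cd "$demo/src" || exit 1

# -print0 writes NUL-separated names; --null tells tar to read them that way.
# "-T -" reads the file list from stdin, as in the one-liner above.
find . \( -name '*.rtf' -o -name '*.pdf' \) -print0 |
  tar --null -T - -czf "$demo/output.tgz"

# Show what ended up in the archive.
tar -tzf "$demo/output.tgz" | sort
```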

Date: 2006-02-13 09:19 pm (UTC)
From: [identity profile] awfief.livejournal.com
I would add "-type f" there in the flags. Particularly since the directories might match in that case.
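To see what difference it makes, here is a small sketch with a directory whose name happens to end in .pdf (scans.pdf and page1.pdf are made-up names). Without the flag, find reports the directory as well as the file inside it, and tar would then archive the directory's contents a second time.

```shell
#!/bin/sh
# Scratch tree with a directory that matches the file pattern.
demo=$(mktemp -d)
mkdir -p "$demo/scans.pdf"
echo "sample" > "$demo/scans.pdf/page1.pdf"

# Without -type f the directory itself matches the pattern too.
all_matches=$(find "$demo" -name '*.pdf' | wc -l | tr -d ' ')

# With -type f only regular files are reported.
file_matches=$(find "$demo" -type f -name '*.pdf' | wc -l | tr -d ' ')

echo "without -type f: $all_matches, with -type f: $file_matches"
```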

Date: 2006-02-13 09:36 pm (UTC)
From: [identity profile] marmota.livejournal.com
*wince* Any users I caught naming directories after file type suffixes complete with '.' would be in for a serious larting.

Date: 2006-02-13 09:45 pm (UTC)
From: [identity profile] awfief.livejournal.com
Well, true, but she already said that they were in directories marked pdf and rtf.

Date: 2006-02-13 10:29 pm (UTC)
From: [identity profile] quietann.livejournal.com
The guy who set up the directories was smarter than that :)