
I have a bunch of text files in a directory, and I need to read them, extract some information, and save it to an Excel or text file.

name1_1.txt

count: 10
totalcount: 30
percentage:33
total no of a's: 20
total no of b's: 20
etc...

name2_2.txt

count: 20
totalcount: 40
percentage:50
total no of a's: 10
total no of b's: 30
etc...

etc...

Desired output:

             name1        name2
 count        10           20
 totalcount   30           40
 percentage   33           50

I want the output saved to a file (e.g. example.txt or example.csv) in the same directory. Can I get help with this?

Here is what I tried in a shell script, but I can't get it to produce tab-separated columns or write the result to a file:

 #$ -S /bin/bash


 for sample in *.txt; do
    header=$(echo ${sample} | awk '{sub(/_/," ")}1'| awk '{print $1}')
    echo -en $header"\t"
 done
 echo -e ' \t '
 echo "count"
 for sample in *.txt; do
    grep "count:" $sample | awk -F: $'\t''{print $2}'
 done
 echo "totalcount"
 for sample in *.txt; do
    grep "totalcount:" $sample | awk -F: $'\t''{print $2}'
 done
 echo "percentage"
 for sample in *.txt; do
    grep "percentage:" $sample | awk -F: $'\t''{print $2}'
 done
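For comparison, here is a minimal sketch of how the loop approach above could be made to work in plain bash. It assumes the files are named like name1_1.txt and that only the three labels shown are wanted. It builds each row with printf instead of echo, uses an exact label match so that count no longer also matches totalcount, and redirects the whole group to example.txt. The glob name*_*.txt (rather than *.txt) is deliberate: the redirection creates example.txt before the loops run, so *.txt would pick it up as an input. The sample-file setup at the top is only for demonstration.

```shell
#!/bin/bash
# Demo setup only: a scratch directory with two sample input files.
# Drop these three lines to run the loops in your real data directory.
cd "$(mktemp -d)" || exit 1
printf 'count: 10\ntotalcount: 30\npercentage:33\n' > name1_1.txt
printf 'count: 20\ntotalcount: 40\npercentage:50\n' > name2_2.txt

{
    # Header row: a leading tab, then the part of each filename before the first "_".
    for sample in name*_*.txt; do
        printf '\t%s' "${sample%%_*}"
    done
    printf '\n'

    # One row per label. -F': *' splits on the colon plus any following spaces,
    # and $1 == key is an exact match, so "count" does not match "totalcount".
    for label in count totalcount percentage; do
        printf '%s' "$label"
        for sample in name*_*.txt; do
            awk -F': *' -v key="$label" '$1 == key { printf "\t%s", $2 }' "$sample"
        done
        printf '\n'
    done
} > example.txt

cat example.txt
```

This runs one awk process per file per label, which is fine for a handful of files; the single-pass awk in the answer below scales better.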

1 Answer

You can see if this does what you want:

awk -F': *' 'BEGIN { DELIM="\t" } \
    last_filename != FILENAME { \
        split( FILENAME, farr, "_" ); header = header DELIM farr[1]; \
        last_filename = FILENAME; i=0 } \
    $1 == "count" || $1 == "totalcount" || $1 == "percentage" \
        { a[i] = (NR==FNR) ? $1 DELIM $2 : a[i] DELIM $2; i++; if (i > n) n = i } \
    END { print header; for( j = 0; j < n; j++ ) { print a[j] } }' name*.txt

I've broken it up into multiple lines for "easier" reading. To turn it back into a one-liner, remove the trailing "\" from each line and join the lines. If I edit this answer one more time, I'll just make it an executable awk file.

  1. The BEGIN block sets the output delimiter, DELIM, to a tab.
  2. Each time a new file starts, its FILENAME is split on "_", and the part before the first underscore (e.g. name1) is appended to the header.
  3. For the first file, each matching row's label and value are stored in array a at index i. For each subsequent file, only the value is appended to the same row.
  4. At the END, the header is printed, followed by the contents of the array.
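Following up on the idea of an executable awk file, here is a minimal sketch of the same steps as a standalone script. It is not the answer's exact code: it matches labels exactly ($1 == "count" rather than /count/, which also matches inside totalcount), splits fields on ": *" so values carry no leading space, starts each file on FNR == 1 instead of tracking last_filename, and prints rows with a numeric loop because for (j in a) does not guarantee any order. The file name collect.awk and the sample-file setup are assumptions for the demo.

```shell
#!/bin/bash
# Demo setup only: a scratch directory with two sample input files.
cd "$(mktemp -d)" || exit 1
printf 'count: 10\ntotalcount: 30\npercentage:33\n' > name1_1.txt
printf 'count: 20\ntotalcount: 40\npercentage:50\n' > name2_2.txt

# collect.awk is a hypothetical name for the standalone program.
cat > collect.awk <<'EOF'
#!/usr/bin/awk -f
# Turn "label: value" lines from several files into one table, one column per file.
BEGIN { FS = ": *"; if (DELIM == "") DELIM = "\t" }  # DELIM can be overridden with -v
FNR == 1 {                                  # first line of each new file
    split(FILENAME, farr, "_")              # "name1_1.txt" -> farr[1] = "name1"
    header = header DELIM farr[1]
    i = 0
}
$1 == "count" || $1 == "totalcount" || $1 == "percentage" {
    # First file: store label and value. Later files: append only the value.
    row[i] = (NR == FNR) ? $1 DELIM $2 : row[i] DELIM $2
    i++
    if (i > n) n = i
}
END {
    print header
    for (j = 0; j < n; j++) print row[j]    # numeric loop keeps the input row order
}
EOF

# Run it with awk -f. After chmod +x it can also be run as ./collect.awk,
# provided awk really lives at /usr/bin/awk on your system.
awk -f collect.awk name*_*.txt > example.txt
cat example.txt
```

Because DELIM is only defaulted when empty, awk -v DELIM=, -f collect.awk name*_*.txt > example.csv would produce comma-separated output with the same script.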

I get the following output then:

        name1   name2
count    10      20
totalcount       30      40
percentage      33      50

This keeps only the count, totalcount and percentage rows and skips every other line in the files, provided the label before the colon ($1) is exactly one of those names.
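To see why an exact match on the label matters, here is a small demo comparing the regex test $1 ~ /count/ with the string test $1 == "count". The three input lines are copied from name1_1.txt in the question.

```shell
#!/bin/bash
# Sample lines as they appear in name1_1.txt.
input=$'count: 10\ntotalcount: 30\npercentage:33'

# Regex match: /count/ matches anywhere in $1, so "totalcount" is caught too.
regex_hits=$(printf '%s\n' "$input" | awk -F': *' '$1 ~ /count/ { print $1 }')

# Exact match: only a label that is exactly "count" passes.
exact_hits=$(printf '%s\n' "$input" | awk -F': *' '$1 == "count" { print $1 }')

echo "regex matches: ${regex_hits//$'\n'/, }"   # prints: regex matches: count, totalcount
echo "exact matches: ${exact_hits//$'\n'/, }"   # prints: exact matches: count
```

An anchored regex such as $1 ~ /^count$/ behaves like the exact string comparison.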


3 Comments

I have some other lines in the text files that I don't want to include (for example, the ones I added to the sample files in the question). @n0741337
Okay - I think I can deal with that too - one mo'
@abh - Append > example.txt or > example.csv to the end of the awk command I posted (your choice, depending on how you want other programs to interpret the file). It currently writes to stdout. You can change the DELIM value if you want something other than a tab; for a .csv file, a comma (DELIM=",") is the usual choice.
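If you would rather choose the delimiter at run time than edit the BEGIN block, one option (a sketch, not the posted command verbatim) is to pass DELIM in with awk's -v option and wrap the command in a small shell function. The BEGIN block is dropped because assigning DELIM there would overwrite the -v value. The function name collect and its two arguments are hypothetical. The glob name*_*.txt keeps the output files out of the input list, and the comma-separated form assumes no value itself contains a comma.

```shell
#!/bin/bash
# Demo setup only: a scratch directory with two sample input files.
cd "$(mktemp -d)" || exit 1
printf 'count: 10\ntotalcount: 30\npercentage:33\n' > name1_1.txt
printf 'count: 20\ntotalcount: 40\npercentage:50\n' > name2_2.txt

# collect DELIM OUTFILE: hypothetical wrapper that writes the table to OUTFILE.
collect() {
    local delim=$1 out=$2
    awk -F': *' -v DELIM="$delim" '
        FNR == 1 { split(FILENAME, farr, "_"); header = header DELIM farr[1]; i = 0 }
        $1 == "count" || $1 == "totalcount" || $1 == "percentage" {
            row[i] = (NR == FNR) ? $1 DELIM $2 : row[i] DELIM $2
            i++; if (i > n) n = i
        }
        END { print header; for (j = 0; j < n; j++) print row[j] }
    ' name*_*.txt > "$out"
}

collect $'\t' example.txt   # tab-separated text file
collect ','   example.csv   # comma-separated; spreadsheet programs such as Excel can open it
```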
