I have a quick question. I have the DataFrame below:

df

    File_Path
0   /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_INVE_D.sh
1   /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_EMPF_D.sh
2   /data/app_next_best_action/call_nba_as11.sh
3   /data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh cdlc_ing
4   sh /data/processos/current/aplicacao/AAVR/ACN10/scr/exec_fim_grupo.sh ACN10_ARQ_1   

I want to extract the path up to and including the 4th level of the directory tree from the File_Path column.

The output should look like this:

df

    File_Path                                                                                                       Parent_path
0   /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_INVE_D.sh  /data/application/AANX/aanx-dataeng-slas-sysyphus/
1   /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_EMPF_D.sh  /data/application/AANX/aanx-dataeng-slas-sysyphus/
2   /data/app_next_best_action/call_nba_as11.sh                                                                     /data/app_next_best_action/call_nba_as11.sh
3   /data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh                             /data/application/AAIN/aain-srv-motor-extracao-next/
4   sh /data/processos/current/aplicacao/AAVR/ACN10/scr/exec_fim_grupo.sh ACN10_ARQ_1                               /data/processos/current/aplicacao/

At index 2 there is no 4th level, so the result runs through the last item, which is the file call_nba_as11.sh.

Also, at index 4 there is a "sh " at the beginning of the File_Path value, and I need to skip it.

Could you guys help me?

1 Answer

You can use a regex with str.extract:

df['Parent_path'] = df['File_Path'].str.extract(r'^((?:/[^/]+){,4}/?)')

Output:

                                                                                                        File_Path                                           Parent_path
0  /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_INVE_D.sh    /data/application/AANX/aanx-dataeng-slas-sysyphus/
1  /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_EMPF_D.sh    /data/application/AANX/aanx-dataeng-slas-sysyphus/
2                                                                     /data/app_next_best_action/call_nba_as11.sh           /data/app_next_best_action/call_nba_as11.sh
3                    /data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh cdlc_ing  /data/application/AAIN/aain-srv-motor-extracao-next/
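
To see how this first pattern behaves on a single string, here is a minimal sketch using the standard-library re module rather than pandas. The helper name parent_path is made up for illustration, and the sketch assumes str.extract applies the same pattern to every row. It also shows that a value starting with "sh " gives an empty result: the pattern is anchored at ^ and only repeats "/segment" groups, so nothing matches before the first /.

```python
import re

# First pattern from the answer:
#   ^              start of the string
#   (?:/[^/]+)     one "/segment" (non-capturing group)
#   {,4}           repeated 0 to 4 times
#   /?             optional trailing slash
PATTERN = re.compile(r'^((?:/[^/]+){,4}/?)')


def parent_path(path):
    """Return the captured prefix (hypothetical helper, for illustration only)."""
    match = PATTERN.match(path)
    return match.group(1) if match else None


samples = [
    "/data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_INVE_D.sh",
    "/data/app_next_best_action/call_nba_as11.sh",
    "sh /data/processos/current/aplicacao/AAVR/ACN10/scr/exec_fim_grupo.sh ACN10_ARQ_1",
]

results = [parent_path(p) for p in samples]
for r in results:
    print(repr(r))
# '/data/application/AANX/aanx-dataeng-slas-sysyphus/'
# '/data/app_next_best_action/call_nba_as11.sh'
# ''   (the leading "sh " prevents any "/segment" from matching at ^)
```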

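Alternative, which skips any text before the first / (such as a leading "sh ") by matching it outside the capture group: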

df['Parent_path'] = df['File_Path'].str.extract(r'^[^/]*((?:/[^/]+){,4}/?)')

Output:

                                                                                                        File_Path                                           Parent_path
0  /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_INVE_D.sh    /data/application/AANX/aanx-dataeng-slas-sysyphus/
1  /data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_EMPF_D.sh    /data/application/AANX/aanx-dataeng-slas-sysyphus/
2                                                                     /data/app_next_best_action/call_nba_as11.sh           /data/app_next_best_action/call_nba_as11.sh
3                    /data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh cdlc_ing  /data/application/AAIN/aain-srv-motor-extracao-next/
4                               sh /data/processos/current/aplicacao/AAVR/ACN10/scr/exec_fim_grupo.sh ACN10_ARQ_1                    /data/processos/current/aplicacao/
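
For anyone who wants to reproduce this end to end, the following is a minimal sketch that rebuilds the sample DataFrame from the question and applies the alternative pattern. The only thing changed from the snippet above is expand=False, which is an assumption on my part that a Series is preferred over a one-column DataFrame; the pattern itself is unchanged.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "File_Path": [
        "/data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_INVE_D.sh",
        "/data/application/AANX/aanx-dataeng-slas-sysyphus/scripts/s_shell/call_iws/call_PP_NEXT_RTBA_MAU_IND_EMPF_D.sh",
        "/data/app_next_best_action/call_nba_as11.sh",
        "/data/application/AAIN/aain-srv-motor-extracao-next/iws/call_run_extract_default.sh cdlc_ing",
        "sh /data/processos/current/aplicacao/AAVR/ACN10/scr/exec_fim_grupo.sh ACN10_ARQ_1",
    ]
})

# [^/]* consumes anything before the first "/" (e.g. "sh ") outside the
# capture group, so only the path prefix ends up in Parent_path.
pattern = r'^[^/]*((?:/[^/]+){,4}/?)'

# expand=False returns a Series for a single capture group.
df["Parent_path"] = df["File_Path"].str.extract(pattern, expand=False)

print(df["Parent_path"].tolist())
```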

21 Comments

hey @mozway, thanks for this, that almost nailed it, but there are commands where the shell's sh comes first, like: sh /data/processos/current/aplicacao/AAVR/ACN10/scr/exec_processo.sh ACN10_ARQ_1 PP_AAVRACN10_STAGE_C0001_03_D. For these cases the output is blank. How could I skip this sh at the beginning of the string?
Use r'^([^/]*(?:/[^/]+){,4}/?)' as regex
@AnoushiravanR to be able to repeat up to 4 times the '/' followed by non-/ characters without capturing those subpatterns
@AnoushiravanR, why don't you try and see? ;) (NB. don't assign to a column)
@AnoushiravanR but creating more columns is expensive, so not worth it if you're going to throw them away immediately by slicing!