A few examples of trying out the AMDuProfCLI command. The workflow appears to be a “collect” stage followed by a “report” stage.
The predefined profiles can be listed with the “AMDuProfCLI info --list collect-configs” command:
mev@sacramento:~$ AMDuProfCLI info --list collect-configs
/opt/AMDuProf_4.0-341/bin/AMDuProfCLI
List of predefined profiles that can be used with 'collect --config' option:
tbp : Time-based Sampling
Use this configuration to identify where programs are spending time.
inst_access : Investigate Instruction Access
Use this configuration to find instruction fetches with poor L1 instruction
cache locality and poor ITLB behavior.
[PMU Events: PMCx076, PMCx0C0, PMCx28F, PMCx18E, PMCx060, PMCx064, PMCx084, PMCx085,
PMCx094]
data_access : Investigate Data Access
Use this configuration to find data access operations with poor L1 data
cache locality and poor DTLB behavior.
[PMU Events: PMCx076, PMCx0C0, PMCx029, PMCx060, PMCx043, PMCx047, PMCx045]
assess_ext : Assess Performance (Extended)
Use this configuration for an overall assessment of performance and to
find the potential issues for further investigation. This has additional
events to monitor than the Assess Performance configuration.
[PMU Events: PMCx076, PMCx0C0, PMCx0C2, PMCx0C3, PMCx029, PMCx060, PMCx047, PMCx043,
PMCx024, PMCx052, PMCx00E]
memory : Cache Analysis
Use this configuration to identify the false cache-line sharing issues.
The profile data will be collected using IBS OP.
branch : Investigate Branching
Use this configuration to find poorly predicted branches and near returns.
[PMU Events: PMCx076, PMCx0C0, PMCx0C2, PMCx0C3, PMCx0C4, PMCx0C5, PMCx0C8, PMCx0C9,
PMCx0CA]
assess : Assess Performance
Use this configuration to get an overall assessment of performance and
to find potential issues for further investigation.
[PMU Events: PMCx076, PMCx0C0, PMCx0C2, PMCx0C3, PMCx029, PMCx060, PMCx043, PMCx047]
ibs : Instruction-based Sampling
Use this configuration to collect profile data using Instruction Based
Sampling. Samples are attributed to instructions precisely with IBS.
cpi : Investigate CPI
Basic profile type to analyse the CPI and IPC metrics of the running application
or the entire system.
[PMU Events: PMCx076, PMCx0C0]
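The listing above is regular enough to parse if you want to compare which events each profile programs. Here is a minimal Python sketch, assuming the text layout shown above stays the same. The `parse_configs` helper and the abbreviated sample are illustrative only and are not part of AMD uProf.

```python
import re

# Sample text copied from the listing above (abbreviated to three profiles).
SAMPLE = """\
tbp : Time-based Sampling
Use this configuration to identify where programs are spending time.
branch : Investigate Branching
Use this configuration to find poorly predicted branches and near returns.
[PMU Events: PMCx076, PMCx0C0, PMCx0C2, PMCx0C3, PMCx0C4, PMCx0C5, PMCx0C8, PMCx0C9,
PMCx0CA]
cpi : Investigate CPI
Basic profile type to analyse the CPI and IPC metrics of the running application
or the entire system.
[PMU Events: PMCx076, PMCx0C0]
"""

HEADER = re.compile(r"^(\w+) : (.+)$")   # e.g. "tbp : Time-based Sampling"
EVENT = re.compile(r"PMCx[0-9A-Fa-f]+")  # e.g. "PMCx0C0"


def parse_configs(text):
    """Map each profile name to its title and list of PMU event codes."""
    configs = {}
    current = None
    in_events = False
    for line in text.splitlines():
        line = line.strip()
        match = HEADER.match(line)
        if match and not in_events:
            current = match.group(1)
            configs[current] = {"title": match.group(2), "events": []}
            continue
        if current is None:
            continue  # preamble before the first profile
        if line.startswith("[PMU Events:"):
            in_events = True
        if in_events:
            # The event list can wrap onto a second line, so keep
            # collecting until the closing bracket.
            configs[current]["events"].extend(EVENT.findall(line))
            if line.endswith("]"):
                in_events = False
    return configs


if __name__ == "__main__":
    for name, info in parse_configs(SAMPLE).items():
        print(name, info["title"], info["events"])
```

Profiles with an empty event list (tbp here, and presumably the IBS-based ones) do not use PMC event sampling, which is worth knowing before expecting PMC counts in a report.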
Picking the “assess” configuration, we can next run it on stockfish. Collection needs to run as root. The “-o stockfish” option names the output directory for the profile.
mev@sacramento:~$ /opt/AMDuProf_4.0-341/bin/AMDuProfCLI collect -o stockfish --config assess phoronix-test-suite batch-run stockfish
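Judging by this run, collect creates a timestamped session directory whose name begins with “AMDuProf-” under the -o path. A small Python sketch for finding the newest one to hand to the report step; the `latest_session` helper is hypothetical, and the “AMDuProf-” prefix is an assumption based only on the single directory name seen in these notes.

```python
from pathlib import Path


def latest_session(output_dir, prefix="AMDuProf-"):
    """Return the most recently modified session directory, or None.

    The prefix is an assumption taken from one observed directory name.
    """
    sessions = [
        p for p in Path(output_dir).iterdir()
        if p.is_dir() and p.name.startswith(prefix)
    ]
    return max(sessions, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    # "stockfish" is the directory given to "collect -o" above.
    if Path("stockfish").is_dir():
        print(latest_session("stockfish"))
```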
The next step is to create a report from the saved profile data. We point the report command at the saved session directory with the “-i” option:
mev@sacramento:~$ sudo /opt/AMDuProf_4.0-341/bin/AMDuProfCLI report -i stockfish/AMDuProf-phoronix-test-suite-EBP_Jan-31-2023_17-31-15/ --report-output /home/mev/stockfish_out
/opt/AMDuProf_4.0-341/bin/AMDuProfCLI
Report generation started...
Generating report file...
Report generation completed...
Generated report file: /home/mev/stockfish_out/report.csv
Unfortunately, the resulting report.csv seems to list what was to be measured but contains no actual measurements:
mev@sacramento:~$ more stockfish_out/report.csv
"AMD uProf (Version:4.0.341.0)"
PERFORMANCE ANALYSIS REPORT
EXECUTION
Target Path:,"phoronix-test-suite"
Command Line Arguments:,"batch-run stockfish "
Working Directory:,"/home/mev"
Environment Variables:
CPU Details:,"Family(0x19), Model(0x61), Number of Cores(32)"
Operating System:,"LinuxUbuntu 22.04.1 LTS-64 Kernel:5.18.13-051813-generic"
PROFILE DETAILS
Profile Session Type:,"Assess Performance"
Profile Scope:,"Single Application"
CPU Mask:,"0-31"
CPU Affinity Mask:,"0-31"
Profile Start Time:,"Tue Jan 31 17:31:15 2023"
Profile End Time:,"Tue Jan 31 17:35:53 2023"
Profile Duration:,"277.888 seconds"
Data Folder:,"/home/mev/stockfish/AMDuProf-phoronix-test-suite-EBP_Jan-31-2023_17-31-15"
Virtual Machine:,"No"
Call Stack Sampling:,"False"
MONITORED EVENTS
PMC Events:,Name,Interval,Unitmask,Countmask,Invert Countmask,User,OS,Description
,"CYCLES_NOT_IN_HALT (PMCx076)",250000,0x00,0x00,False,True,True,"The number of cpu cycles when the thread is not in halt state."
,"RETIRED_INST (PMCx0C0)",250000,0x00,0x00,False,True,True,"The number of instructions retired from execution. This count includes exceptions and interrupts. Each exception or interrupt is counted as one instruction."
,"RETIRED_BR_INST (PMCx0C2)",25000,0x00,0x00,False,True,True,"The number of branch instructions retired. This includes all types of architectural control flow changes, including exceptions and interrupts.
"
,"RETIRED_BR_INST_MISP (PMCx0C3)",25000,0x00,0x00,False,True,True,"The number of retired branch instructions, that were mispredicted.Note that only EX direct mispredicts and indirect target mispredicts are counted.
"
,"MISALIGNED_LOADS (PMCx047)",25000,0x03,0x00,False,True,True,"The number of misaligned loads. This event counts the 64B (cacheline crossing) and 4K (page crossing) misaligned loads."
,"L1_DC_ACCESSES_ALL (PMCx029)",250000,0x07,0x00,False,True,True,"The number of load and store ops dispatched to LS unit. This counts the dispatch of single op that performs a memory load, dispatch of single op that performs a memory store, dispatch of a single op that performs a load from and store to the same memory address."
,"L1_DEMAND_DC_REFILLS_LOCAL (PMCx043)",25000,0x0F,0x00,False,True,True,"The demand Data Cache fills from L2, L3, CCX and DRAM."
,"L2_CACHE_ACCESS_FROM_L1_DC_MISS (PMCx060)",25000,0xE8,0x00,False,True,True,"The L2 cache access requests due to L1 data cache misses. This also counts hardware and software prefetches"
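To check quickly whether a report contains anything beyond the configuration, one option is to list its section headings and the monitored events. This is a rough Python sketch, assuming the report keeps the layout shown above (single all-caps lines as section headings, event rows starting with an empty first cell). The `summarize` helper and the trimmed sample are illustrative, not an AMD uProf API.

```python
import csv
import io

# Trimmed sample in the same layout as the report.csv shown above.
SAMPLE = '''"AMD uProf (Version:4.0.341.0)"
PERFORMANCE ANALYSIS REPORT
PROFILE DETAILS
Profile Session Type:,"Assess Performance"
MONITORED EVENTS
PMC Events:,Name,Interval,Unitmask,Countmask,Invert Countmask,User,OS,Description
,"CYCLES_NOT_IN_HALT (PMCx076)",250000,0x00,0x00,False,True,True,"The number of cpu cycles when the thread is not in halt state."
,"RETIRED_BR_INST (PMCx0C2)",25000,0x00,0x00,False,True,True,"The number of branch instructions retired.
"
'''


def summarize(text):
    """Return (section headings, [(event name, interval), ...])."""
    sections, events = [], []
    in_events = False
    # csv.reader copes with descriptions that span lines inside quotes.
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        first = row[0].strip()
        if len(row) == 1 and first.isupper():
            # Assumption: a lone all-caps cell is a section heading.
            sections.append(first)
            in_events = first == "MONITORED EVENTS"
        elif in_events and first == "" and len(row) >= 3:
            events.append((row[1], int(row[2])))
    return sections, events


if __name__ == "__main__":
    sections, events = summarize(SAMPLE)
    print(sections)
    for name, interval in events:
        print(name, interval)
```

For a real file, read it with `open(path, newline="")` and pass the contents to `summarize`. If MONITORED EVENTS is the last heading, as in the report above, the file holds only the setup and no sample data.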
Overall, I might still be missing something, but I am not finding this tool useful yet.