A few examples of trying out the AMDuProfCLI command. It seems to work in two stages: a “collect” stage followed by a “report” stage.
There are some predefined profiles, which can be listed with the “AMDuProfCLI info --list collect-configs” command:
mev@sacramento:~$ AMDuProfCLI info --list collect-configs
/opt/AMDuProf_4.0-341/bin/AMDuProfCLI
List of predefined profiles that can be used with 'collect --config' option:

tbp : Time-based Sampling
    Use this configuration to identify where programs are spending time.

inst_access : Investigate Instruction Access
    Use this configuration to find instruction fetches with poor L1 instruction cache locality and poor ITLB behavior.
    [PMU Events: PMCx076, PMCx0C0, PMCx28F, PMCx18E, PMCx060, PMCx064, PMCx084, PMCx085, PMCx094]

data_access : Investigate Data Access
    Use this configuration to find data access operations with poor L1 data cache locality and poor DTLB behavior.
    [PMU Events: PMCx076, PMCx0C0, PMCx029, PMCx060, PMCx043, PMCx047, PMCx045]

assess_ext : Assess Performance (Extended)
    Use this configuration for an overall assessment of performance and to find the potential issues for further investigation. This has additional events to monitor than the Assess Performance configuration.
    [PMU Events: PMCx076, PMCx0C0, PMCx0C2, PMCx0C3, PMCx029, PMCx060, PMCx047, PMCx043, PMCx024, PMCx052, PMCx00E]

memory : Cache Analysis
    Use this configuration to identify the false cache-line sharing issues. The profile data will be collected using IBS OP.

branch : Investigate Branching
    Use this configuration to find poorly predicted branches and near returns.
    [PMU Events: PMCx076, PMCx0C0, PMCx0C2, PMCx0C3, PMCx0C4, PMCx0C5, PMCx0C8, PMCx0C9, PMCx0CA]

assess : Assess Performance
    Use this configuration to get an overall assessment of performance and to find potential issues for further investigation.
    [PMU Events: PMCx076, PMCx0C0, PMCx0C2, PMCx0C3, PMCx029, PMCx060, PMCx043, PMCx047]

ibs : Instruction-based Sampling
    Use this configuration to collect profile data using Instruction Based Sampling. Samples are attributed to instructions precisely with IBS.

cpi : Investigate CPI
    Basic profile type to analyse the CPI and IPC metrics of the running application or the entire system.
    [PMU Events: PMCx076, PMCx0C0]
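To see how the predefined profiles relate to each other, the event lists can be compared as sets. The sketch below is only an illustration: the event codes are copied by hand from the listing above, and the helper names `extra_events` and `shared_by_all` are made up for this post, not part of AMD uProf.

```python
# PMU event lists copied by hand from the `info --list collect-configs` output.
# Profiles without a listed PMU event set (tbp, memory, ibs) are left out.
CONFIG_EVENTS = {
    "inst_access": ["PMCx076", "PMCx0C0", "PMCx28F", "PMCx18E", "PMCx060",
                    "PMCx064", "PMCx084", "PMCx085", "PMCx094"],
    "data_access": ["PMCx076", "PMCx0C0", "PMCx029", "PMCx060", "PMCx043",
                    "PMCx047", "PMCx045"],
    "assess_ext": ["PMCx076", "PMCx0C0", "PMCx0C2", "PMCx0C3", "PMCx029",
                   "PMCx060", "PMCx047", "PMCx043", "PMCx024", "PMCx052",
                   "PMCx00E"],
    "branch": ["PMCx076", "PMCx0C0", "PMCx0C2", "PMCx0C3", "PMCx0C4",
               "PMCx0C5", "PMCx0C8", "PMCx0C9", "PMCx0CA"],
    "assess": ["PMCx076", "PMCx0C0", "PMCx0C2", "PMCx0C3", "PMCx029",
               "PMCx060", "PMCx043", "PMCx047"],
    "cpi": ["PMCx076", "PMCx0C0"],
}


def extra_events(config, baseline):
    """Events monitored by `config` but not by `baseline`, sorted by code."""
    return sorted(set(CONFIG_EVENTS[config]) - set(CONFIG_EVENTS[baseline]))


def shared_by_all():
    """Events that appear in every profile listed in CONFIG_EVENTS."""
    event_sets = [set(events) for events in CONFIG_EVENTS.values()]
    return sorted(set.intersection(*event_sets))


print(extra_events("assess_ext", "assess"))  # ['PMCx00E', 'PMCx024', 'PMCx052']
print(extra_events("assess", "assess_ext"))  # []
print(shared_by_all())                       # ['PMCx076', 'PMCx0C0']
```

So “assess_ext” is “assess” plus three more events, and every event-based profile includes the two counters used by “cpi”. Further down, the report names these two as CYCLES_NOT_IN_HALT and RETIRED_INST.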
Picking the “assess” configuration, we can next use it to profile the Phoronix Test Suite stockfish benchmark. This needs to run as root to collect information. The “-o stockfish” option gives an output directory for the profile.
mev@sacramento:~$ /opt/AMDuProf_4.0-341/bin/AMDuProfCLI collect -o stockfish --config assess phoronix-test-suite batch-run stockfish
The next step is to create a report from the saved profile information. We point the report command at the saved profile directory with the “-i” option.
mev@sacramento:~$ sudo /opt/AMDuProf_4.0-341/bin/AMDuProfCLI report -i stockfish/AMDuProf-phoronix-test-suite-EBP_Jan-31-2023_17-31-15/ --report-output /home/mev/stockfish_out
/opt/AMDuProf_4.0-341/bin/AMDuProfCLI
Report generation started...
Generating report file...
Report generation completed...
Generated report file: /home/mev/stockfish_out/report.csv
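The two stages can be chained in a small script. This is a rough sketch, not a tested recipe: it only uses the flags that appear in the commands above, it assumes that the session directory created under the output directory always starts with “AMDuProf-” (as it did here) and that the report lands in report.csv, and the function names are my own. It would also need to run with enough privileges for the collection to work.

```python
import subprocess
from pathlib import Path

# Install path taken from the transcript above; adjust for other installs.
UPROF = "/opt/AMDuProf_4.0-341/bin/AMDuProfCLI"


def collect_cmd(out_dir, config, target):
    """Build the 'collect' command line; `target` is the program plus its args."""
    return [UPROF, "collect", "-o", str(out_dir), "--config", config, *target]


def report_cmd(session_dir, report_dir):
    """Build the 'report' command line for one saved session directory."""
    return [UPROF, "report", "-i", str(session_dir),
            "--report-output", str(report_dir)]


def latest_session(out_dir):
    """Return the newest AMDuProf-* directory under out_dir, or None."""
    # Assumption: session directories are named AMDuProf-<target>-..., as seen above.
    sessions = [p for p in Path(out_dir).glob("AMDuProf-*") if p.is_dir()]
    return max(sessions, key=lambda p: p.stat().st_mtime, default=None)


def profile(out_dir, config, target, report_dir):
    """Run collect, then report on the newest session; return the CSV path."""
    subprocess.run(collect_cmd(out_dir, config, target), check=True)
    session = latest_session(out_dir)
    if session is None:
        raise RuntimeError(f"no AMDuProf-* session found under {out_dir}")
    subprocess.run(report_cmd(session, report_dir), check=True)
    # Assumption: the report file is called report.csv, as in the run above.
    return Path(report_dir) / "report.csv"
```

For the run in this post the call would look like `profile("stockfish", "assess", ["phoronix-test-suite", "batch-run", "stockfish"], "/home/mev/stockfish_out")`.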
Unfortunately, the output report.csv file seems to describe what was to be measured but does not contain any actual measurements:
mev@sacramento:~$ more stockfish_out/report.csv
"AMD uProf (Version:4.0.341.0)"
PERFORMANCE ANALYSIS REPORT
EXECUTION
Target Path:,"phoronix-test-suite"
Command Line Arguments:,"batch-run stockfish "
Working Directory:,"/home/mev"
Environment Variables:
CPU Details:,"Family(0x19), Model(0x61), Number of Cores(32)"
Operating System:,"LinuxUbuntu 22.04.1 LTS-64 Kernel:5.18.13-051813-generic"
PROFILE DETAILS
Profile Session Type:,"Assess Performance"
Profile Scope:,"Single Application"
CPU Mask:,"0-31"
CPU Affinity Mask:,"0-31"
Profile Start Time:,"Tue Jan 31 17:31:15 2023"
Profile End Time:,"Tue Jan 31 17:35:53 2023"
Profile Duration:,"277.888 seconds"
Data Folder:,"/home/mev/stockfish/AMDuProf-phoronix-test-suite-EBP_Jan-31-2023_17-31-15"
Virtual Machine:,"No"
Call Stack Sampling:,"False"
MONITORED EVENTS
PMC Events:,Name,Interval,Unitmask,Countmask,Invert Countmask,User,OS,Description
,"CYCLES_NOT_IN_HALT (PMCx076)",250000,0x00,0x00,False,True,True,"The number of cpu cycles when the thread is not in halt state."
,"RETIRED_INST (PMCx0C0)",250000,0x00,0x00,False,True,True,"The number of instructions retired from execution. This count includes exceptions and interrupts. Each exception or interrupt is counted as one instruction."
,"RETIRED_BR_INST (PMCx0C2)",25000,0x00,0x00,False,True,True,"The number of branch instructions retired. This includes all types of architectural control flow changes, including exceptions and interrupts. "
,"RETIRED_BR_INST_MISP (PMCx0C3)",25000,0x00,0x00,False,True,True,"The number of retired branch instructions, that were mispredicted.Note that only EX direct mispredicts and indirect target mispredicts are counted. "
,"MISALIGNED_LOADS (PMCx047)",25000,0x03,0x00,False,True,True,"The number of misaligned loads. This event counts the 64B (cacheline crossing) and 4K (page crossing) misaligned loads."
,"L1_DC_ACCESSES_ALL (PMCx029)",250000,0x07,0x00,False,True,True,"The number of load and store ops dispatched to LS unit. This counts the dispatch of single op that performs a memory load, dispatch of single op that performs a memory store, dispatch of a single op that performs a load from and store to the same memory address."
,"L1_DEMAND_DC_REFILLS_LOCAL (PMCx043)",25000,0x0F,0x00,False,True,True,"The demand Data Cache fills from L2, L3, CCX and DRAM."
,"L2_CACHE_ACCESS_FROM_L1_DC_MISS (PMCx060)",25000,0xE8,0x00,False,True,True,"The L2 cache access requests due to L1 data cache misses. This also counts hardware and software prefetches"
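A quick way to check what a report actually contains is to split it into sections and count the rows in each. The sketch below is a guess at the file layout based only on the output above: it treats any single-cell, all-uppercase line as a section heading. The `SAMPLE` text is a shortened excerpt of my report, and the function names are invented for this post.

```python
import csv
import io

# Shortened excerpt of the report.csv shown above (not the full file).
SAMPLE = '''"AMD uProf (Version:4.0.341.0)"
PERFORMANCE ANALYSIS REPORT
EXECUTION
Target Path:,"phoronix-test-suite"
Working Directory:,"/home/mev"
PROFILE DETAILS
Profile Session Type:,"Assess Performance"
Profile Duration:,"277.888 seconds"
MONITORED EVENTS
PMC Events:,Name,Interval,Unitmask,Countmask,Invert Countmask,User,OS,Description
,"CYCLES_NOT_IN_HALT (PMCx076)",250000,0x00,0x00,False,True,True,"The number of cpu cycles when the thread is not in halt state."
,"L1_DEMAND_DC_REFILLS_LOCAL (PMCx043)",25000,0x0F,0x00,False,True,True,"The demand Data Cache fills from L2, L3, CCX and DRAM."
'''

# The only sections that showed up in my report; none of them holds samples.
METADATA = {"PERFORMANCE ANALYSIS REPORT", "EXECUTION", "PROFILE DETAILS",
            "MONITORED EVENTS"}


def is_heading(row):
    """Assumed rule: a heading is a lone all-uppercase cell such as EXECUTION."""
    return len(row) == 1 and row[0].strip().isupper()


def split_sections(text):
    """Map each section heading to the list of CSV rows that follow it."""
    sections = {}
    current = None
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        if is_heading(row):
            current = row[0].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(row)
    return sections


sections = split_sections(SAMPLE)
for name, rows in sections.items():
    print(f"{name}: {len(rows)} row(s)")

# Event name -> sampling interval, skipping the column-header row.
intervals = {row[1]: int(row[2]) for row in sections["MONITORED EVENTS"][1:]}
print(intervals)

data_sections = [name for name in sections if name not in METADATA]
print("sections beyond metadata:", data_sections or "none")
```

To run it on a real file, pass `open("stockfish_out/report.csv", newline="").read()` to `split_sections`. For my report the last line would print “none”, which matches the impression that only the setup was written out.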
Overall, I may still be missing something, but I am not finding this tool useful yet.