InfiniBand Troubleshooting in Oracle Solutions

► Problem identification & classification (finding clarity in chaos).

► Divide & conquer each problem: where exactly is it?

► Troubleshooting vs. identifying known issues.

- Identify known issues: release notes & user docs.



InfiniBand Layered Architecture – Troubleshoot Layer by Layer
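The layered approach can be sketched as a bottom-up walk: check the lowest layer first and stop at the first one that fails, since a fault in a lower layer usually shows up as symptoms in the layers above it. This is a minimal Python sketch, not a real tool: the `Layer` class, the `first_failing_layer` function and the lambda checks are hypothetical placeholders, and the three layers simply mirror the CASE-A/B/C grouping used in this deck rather than the full InfiniBand protocol stack.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Layer:
    """Hypothetical layer descriptor: a name, its case group, and a health probe."""
    name: str
    case_group: str
    check: Callable[[], bool]  # returns True when the layer looks healthy


def first_failing_layer(layers: list[Layer]) -> Optional[Layer]:
    """Walk the layers bottom-up and return the first one whose check fails.

    Upper layers depend on lower ones, so checks above the first failure
    are never run: they would only repeat the same symptom.
    """
    for layer in layers:
        if not layer.check():
            return layer
    return None


if __name__ == "__main__":
    # Simulated probe results: hardware is fine, the fabric is not.
    # Real probes (reaching the switch, reading port state, pinging over
    # IPoIB) would replace these placeholder lambdas.
    stack = [
        Layer("Hardware (switch/GW)", "CASE-A", lambda: True),
        Layer("Fabric (links, SM)", "CASE-B", lambda: False),
        Layer("Upper-layer protocols (IPoIB, vNIC)", "CASE-C", lambda: False),
    ]
    culprit = first_failing_layer(stack)
    print(culprit.case_group if culprit else "all layers healthy")  # → CASE-B
```

Stopping at the first failure is the point of the design: a broken fabric makes every IPoIB and vNIC check fail too, so the upper-layer results add noise rather than information.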


Troubleshooting case examples (1/2)

CASE-A: InfiniBand Hardware Troubleshooting

»CASE-A1: Unable to access Switch/GW.

»CASE-A2: GW stays in the ‘pre-boot’ environment.

»CASE-A3: FW upgrade failed on Switch/GW.

CASE-B: InfiniBand Fabric Troubleshooting

»CASE-B1: How to trace link errors in the IB fabric and fix them?

»CASE-B2: No MASTER SM seen in the IB fabric.

»CASE-B3: IB link is in INIT state.
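For CASE-B1 and CASE-B3, two habits help: compare error counters across two snapshots instead of trusting their absolute values, and read a port's logical and physical state together. A port whose physical state is LinkUp but whose logical state is INIT has trained its link, yet no Subnet Manager has activated it, which is why CASE-B3 often leads back to CASE-B2. The Python below is a sketch under those assumptions: `diagnose_port`, `rising_counters` and the snapshot values are hypothetical, and only the counter names are taken from the InfiniBand PortCounters attribute (as reported by tools such as `perfquery`).

```python
from enum import Enum


class PortState(Enum):
    """Logical port states of an InfiniBand port."""
    DOWN = "Down"
    INIT = "Init"
    ARMED = "Armed"
    ACTIVE = "Active"


def diagnose_port(logical: PortState, physical: str) -> str:
    """Return a troubleshooting hint for a logical/physical state pair (sketch)."""
    if physical != "LinkUp":
        # No trained physical link: look at the cable, transceiver or remote port.
        return "physical link not up: check cabling and the remote port"
    if logical is PortState.INIT:
        # Link trained, but no Subnet Manager has configured the port yet.
        return "link up but not configured: check for a MASTER SM (see CASE-B2)"
    if logical is PortState.ACTIVE:
        return "port active"
    return f"port in {logical.value}: re-check after the next SM sweep"


# Error counters named as in the InfiniBand PortCounters attribute.
WATCHED = ("SymbolErrorCounter", "LinkErrorRecoveryCounter",
           "LinkDownedCounter", "PortRcvErrors")


def rising_counters(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return the watched counters that increased between two snapshots.

    Absolute values may be historical; only counters that are still
    climbing point to an active problem. Negative deltas (counters cleared
    between the two reads) are ignored.
    """
    deltas = {}
    for name in WATCHED:
        delta = after.get(name, 0) - before.get(name, 0)
        if delta > 0:
            deltas[name] = delta
    return deltas


if __name__ == "__main__":
    print(diagnose_port(PortState.INIT, "LinkUp"))
    # Hypothetical snapshots of one port, taken a few minutes apart.
    t0 = {"SymbolErrorCounter": 120, "LinkDownedCounter": 2, "PortRcvErrors": 5}
    t1 = {"SymbolErrorCounter": 180, "LinkDownedCounter": 2, "PortRcvErrors": 5}
    print(rising_counters(t0, t1))  # → {'SymbolErrorCounter': 60}
```

In this example only the symbol error count is still rising, so the large but static `LinkDownedCounter` history would not be the first thing to chase.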

CASE-C: Upper Layer Protocol Troubleshooting

»CASE-C1: IPoIB devices cannot communicate.

»CASE-C2: vNICs disappear on GW (‘bxm’ service fails on GW).

»CASE-C3: vNIC on GW stays in WAIT-IOA state.

»CASE-C4: vNIC cannot talk to another vNIC on the same GW.

»CASE-C5: vNIC cannot talk to a NIC through GW.

»CASE-C6: Failed to set MAC address when creating a vNIC.
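For CASE-C1, two configuration mismatches are worth ruling out before digging deeper: the endpoints sitting in different IP subnets, or in different InfiniBand partitions (P_Keys). Under InfiniBand partitioning, two ports can communicate only if their P_Keys share the same low 15 bits and at least one of them is a full member (bit 15 set). This is a minimal Python sketch; `IpoibIf` and `ipoib_mismatches` are hypothetical names, and the address and P_Key values would have to be collected by hand from each host.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class IpoibIf:
    """Hypothetical summary of one IPoIB interface, filled in by hand."""
    name: str
    cidr: str  # address with prefix length, e.g. "192.168.10.1/24"
    pkey: int  # partition key the interface is bound to, e.g. 0xFFFF


def ipoib_mismatches(a: IpoibIf, b: IpoibIf) -> list[str]:
    """List configuration mismatches that would stop a and b from talking."""
    problems = []
    net_a = ipaddress.ip_interface(a.cidr).network
    net_b = ipaddress.ip_interface(b.cidr).network
    if net_a != net_b:
        problems.append(f"different IP subnets: {net_a} vs {net_b}")
    # The low 15 bits identify the partition; bit 15 marks full membership.
    if (a.pkey & 0x7FFF) != (b.pkey & 0x7FFF):
        problems.append(f"different partitions: {a.pkey:#06x} vs {b.pkey:#06x}")
    elif not (a.pkey & 0x8000) and not (b.pkey & 0x8000):
        # Two limited members of the same partition cannot reach each other.
        problems.append("both ends are limited members of the partition")
    return problems


if __name__ == "__main__":
    host1 = IpoibIf("ib0", "192.168.10.1/24", 0xFFFF)
    host2 = IpoibIf("ib0", "192.168.20.2/24", 0xFFFF)
    print(ipoib_mismatches(host1, host2))
    # → ['different IP subnets: 192.168.10.0/24 vs 192.168.20.0/24']
```

An empty list does not prove the path is healthy; it only means the problem is more likely further down the stack (fabric or hardware) than in these two settings.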


Troubleshooting case examples (2/2)

Troubleshooting InfiniBand
