Cheatography

My first FB test cheat sheet

This is a draft cheat sheet. It is a work in progress and is not finished yet.

FlashBlade

Blade commands

Commands
Description
pureblade list
To list the blades in FB
purehw list --spec
To show the blade serial number
hal-show -e /local | jq '.family, .id, .asy, .desc'
To show blade type and model
hal-eeprom -e /local/eeprom/id -r | jq .
To show blade serial number and Node ID
fbdiag wait-helper -v -n1.3
To verify process status on the blade
rpc.py blades_available | jq . | egrep "ir.*8001|cluster_id"
To verify blade in cluster geometry
exec.py -n$chassisNum.$bladeNum "ir_version | grep build"
To verify the Blade Purity / Build version
exec.py -nx -- sudo netstat -anot | grep <CLIENT_IP:PORT>
To check for any connections to the blade
sudo supervisorctl status nfsd
To verify NFS is running on the blade
fbdiag nfs-health-check --mgmt-vip ch1-fb1 | grep -c "running on server"
To count the authorities reported as running on a blade
nfs_control.py -n$chassisNum.$bladeNum stop -v
Stop NFS on the blade
nfs_control.py -n$chassisNum.$bladeNum start -v
Start NFS on the blade
purehw setattr --identify on CH$chassisNum.FB$bladeNum
Turn on the locator LED on the blade
purehw setattr --identify off CH$chassisNum.FB$bladeNum
Turn off the locator LED on the blade
puregrep -E "is_evacuating changed to true|start evacuation" .fb/nfs.log
To check when evacuation started on the blade <run from FUSE>
puregrep -E "Calling blade removal RPC|blades observed geom change" [af]m?/platform.log
To check when evacuation completed on the blade <run from FUSE>
date; fbupgrade power-ctl -v -nc.b cycle
Power-cycle the blade
hal-show -e /local/temperature --headers name endpoint value units
To check blade temperature
lsblk
To list the blade's block devices and mount points (for filesystem usage, use df -h)
sudo smartctl -i /dev/sda
To check blade SSD information
sudo dmidecode -t 17
To check blade DIMM information
date; exec.py -n1-2,4-11,13,14 -- 'time=$(date +%H:%M:); zgrep "$time.counter.S3.*allocated_aus" /logs/nfs.log' | awk '{a=a+$12;b=b+$13}END{print a,b}'
To check whether evacuation is progressing: run twice, 10 seconds apart, and compare the two totals
 
How to evacuate a blade? There are a few ways; the method here is to stop NFS on the blade.
hal-slot -l
To check whether the blade in each slot is powered on or off
fbdiag nfs-health-check --mgmt-vip ch1-fb1 | grep 'booting for'
To check whether the authorities have booted
exec.py -na -- 'zgrep -a " [AEK] " /logs/nfs.log | tail'
To check for any AEK errors on the blades
exec.py -nx.x 'sudo rsync -auv ch1-fb1:/ssd/nfs_conf.json /ssd/nfs_conf.json'
To copy tunables (/ssd/nfs_conf.json) from one blade (here ch1-fb1) to another
zgrep "flash_read_uncorrectable, all Vt retry options were unsuccessful" nfs.log | perl -n -e '/(\[U\d+\]).<(SM=\d+ BNK=\d+ CE=\d+ LUN=\d+ BLK=\d+)/ && print $1 . " " . $2 . "\n"' | sort | uniq | perl -n -e '/(\[U\d+\]).(SM=\d+ BNK=\d+ CE=\d+ LUN=\d+) BLK=(\d+)/ && print $1 . " " . $2 . " PLANE=" . $3%4 . "\n"' | sort | uniq -c
To check for bad blocks and bad planes
/opt/ir/devcat_lookup.py device_health | grep -A10 "sketchy_block_ages_sec"
To check bad block rebuild progress: wait for the total count to reach "0"
tgrep -a "Blade nand type detected" chfb/platform.log* | uniq -f4
To get blade type information from FUSE logs
fb dump hdiag --key puresmb.status
To check SMB type configured from FUSE
fb info smb
To check SMB type configured from FUSE

SEV-1/Performance issue helpful commands

purearray monitor
Show the cluster's throughput
purearray monitor --protocol nfs --client
Show per-client NFS statistics
purehw list
Verify that all blades, XFMs, and network ports are healthy with no failures
exec.py -na -- 'zgrep -a " [AEK] " /logs/nfs.log | tail'
Check for any AEK errors across blades in NFS logs
purearray list --space
Check that the array has sufficient space
purealert list
Check for any active alerts on the array
./fbhealth --fail
Run fbhealth live on the system (always push the latest FBtools to the array first)
pureblade list
Check all blades are healthy
exec.py -na 'zgrep -a "Vt retry" /logs/nfs.log'
Check for any bad blocks on the blades
exec.py -sa "fbdiag monitor-portstats --monitor err --once"
Check for any port errors <Multi chassis>
fbdiag monitor-portstats --monitor err --once
Check for any port errors <Single chassis>
fbdiag monitor-portstats --once
Verify which ports are serving I/O
exec.py -na 'zgrep -E " E | A | K " /logs/platform.log.*'
Check for any AEK errors in platform logs
fbdiag nfs-health-check --mgmt-vip ir2
Check all authorities when a blade evacuation is in progress or completed
exec.py -na 'zgrep "Segmentation fault" /logs/nfs.log-[0-1]*'
Check for any segmentation faults (segfaults) in NFS logs; a segfault is a process crash caused by an invalid memory access
exec.py -na 'zgrep "segfault" /logs/system.log'
Check for any segfaults in system logs
exec.py -na "zgrep 'root::rpc_service.tcp_throttle_avail_slots' /logs/nfs.log | tail -n1"
Check for RPC slot exhaustion
atopen "log filename"
To check process utilization

Blade tools and commands

fbhealth
fbdiag
exec.py
fbupgrade
fbadmin
supervisorctl
ir_version

HAL Commands

hal-slot
Used to list and discover hardware such as PSUs, fans, QSFPs, EFMs, and blades; also used to start and stop blades

hal-slot -l
hal-slot --discover-all
hal-slot -e /local/slot/slot-blade-4 --start
hal-slot -e /local/slot/slot-blade-4 --stop

hal-show
Used to show the RPC query result for one specified entry. On the EFM, it can query information on a blade or any other device shown in the output of "hal-slot -l"

hal-show -e /local/gpio
hal-show -e /local/i2c-bus
hal-show -e /local/i2c
hal-show -e /local/slot/slot-blade-4/module/gpio

hal-i2c
Used to do raw I2C reads/writes from/to an I2C device on a given I2C bus. Use "hal-show -e /local/i2c" to find the I2C device address and the bus entry for the target device

hal-i2c -e /local/i2c-bus/bus_entry --addr xx --offset xx --read xx (number of bytes)
hal-i2c -e /local/i2c-bus/bus_entry --addr xx --offset xx --write xx xx xx

hal-eeprom
Used to read the eeprom contents

hal-eeprom -e /local/eeprom/id -r
hal-upgrade

FUSE commands

# If RA or VATS push/pull fails, run the command below from FUSE
export PURELOGIN_KEY_TYPE=ed25519

# Push the latest FBtools from FUSE to the customer array
fb auto fbtools

FB Network commands

purelag list
purenetwork list
puresubnet list
transcievers.py
switch_shell.py ps
purehw list --type eth
rpc.py -p switch get_port_stats | jq .
lldpcli show neigh
fbdiag net-health
switch_shell.py vlan show | grep
switch_shell.py l3 multipath show
switch_shell.py vlan translate show
rpc.py -p switch get_xcver | jq .
Check whether a transceiver is installed in the slot
SWITCHSTATS=$(rpc.py -p switch get_port_stats); for x in {1..4}; do echo "qsfp $x/1-4"; for y in {1..4}; do echo "$SWITCHSTATS" | jq '.["qsfp'$x'/'$y'"]' | grep 'link_status'; done; done
Check the link status of individual QSFPs
lldpcli show neighbor | egrep 'qsfp|ifname'
Identify ports on the TOR switches connected to the FlashBlade
monitor_portstats.py
Check live statistics on all interfaces
exec.py -sa "fbdiag monitor-portstats --monitor err --once"
Monitor errors on ports <Multi chassis>
purehw connector list --cli
To show the Ethernet port details
tgrep -aE "link status|LACP port|partner state" [af]m?/platform | egrep -i "up|down"
From FUSE, check for port flapping
zgrep -v " INFO " [af]m?/platform | egrep -B1 -A40 " FCS " | less
From FUSE, check for FCS errors
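The QSFP loop in the network table prints link_status for qsfp1/1 through qsfp4/4 from the get_port_stats JSON. Below is a Python sketch of the same walk. It assumes the JSON is an object keyed by port name (for example "qsfp1/1") whose values are objects with a top-level "link_status" field. The shell loop only greps for that field, so it may actually be nested; in that case the lookup needs adjusting. The function and script names are hypothetical.

```python
import json
import sys


def qsfp_link_status(stats_json, cages=4, lanes=4):
    """Map "qsfpX/Y" -> link_status for X in 1..cages, Y in 1..lanes (None if absent)."""
    stats = json.loads(stats_json)
    result = {}
    for x in range(1, cages + 1):
        for y in range(1, lanes + 1):
            port = f"qsfp{x}/{y}"
            entry = stats.get(port)
            # Ports missing from the output (or not objects) are reported as None
            result[port] = entry.get("link_status") if isinstance(entry, dict) else None
    return result


if __name__ == "__main__":
    for port, status in qsfp_link_status(sys.stdin.read()).items():
        print(port, status)
```

Usage: rpc.py -p switch get_port_stats | python3 qsfp_status.py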

Upgrade-related questions

Pre-upgrade health check failures
ECMP inconsistency issue
exec.py -xa -sa "switch_shell.py l3 egress show" (check that the refcount is the same across chassis; a difference of +1 or -1 is acceptable)
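The ECMP check above compares egress refcounts across chassis and tolerates a difference of 1. Below is a Python sketch of that comparison. It reads "+1 or -1 is acceptable" as "the largest and smallest refcount for an egress differ by at most 1". It also assumes the "switch_shell.py l3 egress show" output has already been parsed into a hypothetical {chassis: {egress_id: refcount}} dictionary; the parsing is not shown because the output format is not given here. An egress missing on some chassis is flagged as inconsistent.

```python
def ecmp_inconsistencies(refcounts, tolerance=1):
    """Return {egress_id: {chassis: refcount or None}} for entries outside tolerance.

    refcounts is a hypothetical pre-parsed shape: {chassis: {egress_id: refcount}}.
    """
    egress_ids = set()
    for per_chassis in refcounts.values():
        egress_ids.update(per_chassis)
    problems = {}
    for egress in sorted(egress_ids):
        values = {ch: counts.get(egress) for ch, counts in refcounts.items()}
        present = [v for v in values.values() if v is not None]
        # Flag if the egress is missing somewhere or the spread exceeds the tolerance
        if len(present) < len(values) or max(present) - min(present) > tolerance:
            problems[egress] = values
    return problems
```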


Where are the Purity images kept on FUSE?
All versions live on FUSE in /support/flashblade/releases

What are the three stages of an NDU?
"FlashBlade Upgrade Stage 1: Gather System Information"
"FlashBlade Upgrade Stage 2: Upgrade Images"
"FlashBlade Upgrade Stage 3: Software Restart and Reboot"

How to check the upgrade logs?
Upgrade logs are recorded under /logs

zcat fbupgrade.log.2022-03-27.07-17-01.gz | tail

If you are disconnected from tmux, how do you reconnect to it?
sudo tmux attach (add -t <session-name> to pick a specific session)

How do you detach from a tmux session (leaving it running)?
Ctrl-b d

How to disconnect console session
Ctrl+Shift+E, then c, then . (period)

Extend RA
First go to the master FM and run sudo tmux. This starts a tmux session, so the commands keep running even though your RA connection drops.
Once the tmux session is on, run: puresupport disconnect ; sleep 20 ; puresupport connect ; exit
This disables RA and then re-enables it for you.

How to start a simple HTTP server
The "python -m SimpleHTTPServer" command launches a basic HTTP server in the current working directory using Python 2.

When you run this command, Python will start a web server on port 8000 and serve the files in the current directory as static web pages.

Here's how to use it:

Open a terminal or command prompt in the directory where you want to serve files.
Run the command "python -m SimpleHTTPServer" or "python2 -m SimpleHTTPServer" (depending on your Python version).
The server will start and you'll see a message like "Serving HTTP on 0.0.0.0 port 8000".
Open a web browser and navigate to "http://localhost:8000" to see the files being served.
Note that this command is intended for testing and development purposes only and should not be used in production environments. Also, if you're using Python 3, the command has been changed to "python -m http.server" or "python3 -m http.server".
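On Python 3.7 and later, "python3 -m http.server" also accepts a port, --bind ADDRESS and --directory DIR. If you need the same server from a script (for example, to serve a specific directory without cd-ing into it), the sketch below builds it from the standard library's http.server module. The function name make_server is only illustrative. It binds 0.0.0.0:8000 by default to match the CLI, so use host="127.0.0.1" to keep it local.

```python
import functools
import http.server


def make_server(directory=".", host="0.0.0.0", port=8000):
    """Build a threaded static-file server for `directory` (Python 3.7+)."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    return http.server.ThreadingHTTPServer((host, port), handler)


if __name__ == "__main__":
    server = make_server()
    print("Serving HTTP on %s port %d" % server.server_address[:2])
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass  # Ctrl+C stops the server, like the CLI version
    finally:
        server.server_close()
```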

FB array commands

 
 

Log file location

 

Files and locations