
Arista CVP 2019.X Cheat Sheet (DRAFT) by

CloudVision Portal and Telemetry

This is a draft cheat sheet. It is a work in progress and is not finished yet.

CLI "show" Commands

cvpi status <component>/all [-v=3]
-shows running, disabled, and failed components; -v=3 adds verbosity.
cvpi resources [-v=3]
-shows memory, storage, disk throughput (>20 MB/s minimum for a healthy disk, >40 MB/s recommended), CPUs, and NTP sync (mandatory for multi-node; at least ntpd UP for single-node)
cvpi deps <component> <start/stop>
-gives the dependencies the component needs in order to start/stop
cvpi debug
-collects logs from all components for troubleshooting; run it on the primary node.
cvpi logs <component>
-finds where the logs for a particular component are located, e.g. 'cvpi logs aeris' shows /cvpi/apps/aeris/logs/. Also useful for finding which node a component (and therefore its logs) lives on, e.g. 'cvpi logs turbine-rate-intf-counters' shows that it resides on the tertiary node, along with its path.
cvpi info <component>
-great command to learn about a component; includes actions that can be taken, ports used, config, logging, etc.
cvpi status all -v=3 | grep disabled
-to see which processes are disabled
-shows list of all commands run
cvpi version
-shows version of CVP
cvpi env or cat /etc/cvpi/env
-shows environment variables and whether they are set correctly
cvpi check all
-checks that everything is set up correctly; confirms the nodes can talk to each other and have the same configs/env/etc.
dmesg -T
-shows the kernel message buffer with human-readable timestamps; useful for checking disk/storage issues
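The status and throughput checks above can be scripted for a quick health pass. This is a minimal sketch, assuming 'cvpi status all' prints one component state per line containing the word "disabled" or "failed", and treating the throughput thresholds quoted for 'cvpi resources' (20 MB/s minimum, 40 MB/s recommended) as inclusive. The function names problem_components and classify_throughput are hypothetical, not part of cvpi.

```sh
#!/bin/sh
# Hypothetical helpers for a quick CVP health pass (not part of cvpi).

# Print only the lines of 'cvpi status' output that mention a disabled or
# failed component. Assumption: one component state per line.
problem_components() {
    # '|| true' keeps the exit status 0 when nothing matches.
    grep -iE 'disabled|failed' || true
}

# Classify a disk throughput figure (whole MB/s) against the thresholds
# above: 20 MB/s minimum, 40 MB/s recommended (treated as inclusive here).
classify_throughput() {
    mbps=$1
    if [ "$mbps" -ge 40 ]; then
        echo "recommended"
    elif [ "$mbps" -ge 20 ]; then
        echo "minimum"
    else
        echo "too slow"
    fi
}
```

Example: 'cvpi status all -v=3 | problem_components' lists only the problem components, and 'classify_throughput 35' prints "minimum" for a figure read from 'cvpi resources'.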

CLI "Config" Commands

cvpi start/stop <component>/all
-starts/stops all available/specified components
cvpi -v=3 start/stop <component>/all
-starts/stops all available/specified components with verbosity (details on failures if subcomponents fail to start)
cvpi start/stop cvpi
-starts/stops the cvpi stack
cvpi reset all
-resets the CVP app to its initial state by deleting all HBase and Hadoop data
cvpi reset aeris
-deletes all Telemetry data; can be used for expedited upgrades from 2018.2.X to 2019.1.X
-case-sensitive; in the event of an install failure, execute on the primary node to set all 3 nodes back to default.
cvpi config <component>/all
-configures the components
cvpi backup cvp
-new backup procedure in 2018.2.X and later
cvpi restore cvp cvp..tgz cvp.eosimages..tgz
-new restore procedure in 2018.2.X and later; you can't restore across major releases due to data format changes (e.g. you can't restore a 2018.X backup on 2019.X)
cvpi enable cvpi
-enables CVP components to be restarted automatically if they stop
cvpi init
-gets rid of corrupted data folders and recreates the directory structure; repairs damage by removing whole directories
hdfs dfsadmin -safemode get
-checks whether Hadoop/HBase is in safe mode
hdfs dfsadmin -safemode leave
-tries to get the primary/secondary to leave safe mode; then try starting it again
hdfs hbck
-checks for inconsistencies/corruptions; prints OK or reports errors; run it several times, as some inconsistencies are transient
hdfs hbck -repair
-repairs inconsistencies; run it 5-10 times if necessary
/cvpi/zookeeper/bin/zkSer start/stop
-use if seeing zookeeper issues; zookeeper is not stopped by 'cvpi stop all'
systemctl stop cvpi-watchdog.timer
-in a cluster, stop the watchdog timer on all three nodes when stopping zookeeper; otherwise the watchdog will spawn a new zookeeper process.
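The safe-mode and hbck steps above lend themselves to a small wrapper. This is a sketch under stated assumptions: 'hdfs dfsadmin -safemode get' reports lines such as "Safe mode is ON", and a clean hbck run contains the word OK. The helpers safemode_is_on and retry_until_ok are hypothetical names, and the retry cap is your choice (the notes above suggest 5-10 runs).

```sh
#!/bin/sh
# Hypothetical helpers wrapping the Hadoop/HBase checks (not part of cvpi).

# Succeeds if the text on stdin reports safe mode as ON.
# Assumption: output looks like "Safe mode is ON" / "Safe mode is OFF".
safemode_is_on() {
    grep -q 'Safe mode is ON'
}

# Re-run a check until its output contains the word OK, giving up after
# MAX attempts, since some hbck inconsistencies are transient.
# Usage: retry_until_ok MAX command [args...]
retry_until_ok() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        # Assumption: a clean run prints OK as a separate word.
        if "$@" 2>&1 | grep -qw 'OK'; then
            echo "OK after $attempt attempt(s)"
            return 0
        fi
        attempt=$((attempt + 1))
    done
    echo "still inconsistent after $max attempt(s)" >&2
    return 1
}
```

For example: 'hdfs dfsadmin -safemode get | safemode_is_on && hdfs dfsadmin -safemode leave', then 'retry_until_ok 5 hdfs hbck'.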

MINIMUM Requirements

Lab (<25 devices): CPUs 16 cores; Disk 125 GB; Disk throughput 20 MB/s
Production (<=500 devices): CPUs 16 cores; Disk 1 TB; Disk throughput 40+ MB/s
More resources may be needed depending on the features in use. For example:

For CloudVision WiFi:
+4 CPU
+100 GB disk storage
+10 charisma

For Elasticsearch (MAC/IP search feature):
+4 CPU

For Production, the 16 cores can be provided as 8 CPUs x 2 cores or 16 CPUs x 1 core.
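The feature add-ons above stack on top of the production baseline. A minimal sketch of that arithmetic, assuming the extras simply add up and treating 1 TB as 1000 GB; cvp_sizing and its feature keywords are hypothetical, and real sizing may differ.

```sh
#!/bin/sh
# Hypothetical sizing helper: production baseline plus per-feature extras.
# Assumption: extras are additive; 1 TB is counted as 1000 GB.
# Usage: cvp_sizing [wifi] [elasticsearch]
cvp_sizing() {
    cpus=16
    disk_gb=1000
    for feature in "$@"; do
        case $feature in
            wifi)          cpus=$((cpus + 4)); disk_gb=$((disk_gb + 100)) ;;
            elasticsearch) cpus=$((cpus + 4)) ;;
            *) echo "unknown feature: $feature" >&2; return 1 ;;
        esac
    done
    echo "cpus=$cpus disk_gb=$disk_gb"
}
```

For example, 'cvp_sizing wifi elasticsearch' prints "cpus=24 disk_gb=1100".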

Where are the debug files?

Device/Interface Scale (multi-node cluster)

As customers approach these numbers, expect trade-offs with additional beta features, latency, etc., as resources reach capacity.

Where is it?

From root ==> su cvp ==> /cvpi
all scripts, packages, config files, logs
/cvpi/logs; /cvpi/hbase/logs; /cvpi/hadoop/logs; /cvpi/tomcat/logs
Shortcut to logs
Alternatively, run $ cvpi logs <component>, which shows the path to that component's logs.
Config Files
/cvpi/conf/components/; /cvpi/apps/turbine/configs/; /cvpi/apps/aeris/conf/; /cvpi/apps/cvp/conf/; /cvpi/apps/geiger/conf/; /cvpi/apps/wifimanager/conf
/data/cvpbackup/ on the primary: backups run nightly at 2am UTC by default (check via crontab -l as the root user); 5 backups are kept
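To confirm the nightly job is actually producing backups, a sketch like the following can be run on the primary. It assumes backup files in /data/cvpbackup/ match 'cvp*.tgz' (the exact file names are elided above) and treats anything modified in the last 24 hours as recent; check_backups is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical backup freshness check (not part of cvpi).
# Assumptions: files match cvp*.tgz; "recent" = modified within 24 hours.
# Usage: check_backups [dir]
check_backups() {
    dir=${1:-/data/cvpbackup}
    # $(( )) normalizes any padding that wc -l may add.
    total=$(( $(find "$dir" -maxdepth 1 -type f -name 'cvp*.tgz' | wc -l) ))
    recent=$(( $(find "$dir" -maxdepth 1 -type f -name 'cvp*.tgz' -mtime -1 | wc -l) ))
    echo "backups=$total recent=$recent"
    # Non-zero exit when no backup was written in the last day.
    [ "$recent" -gt 0 ]
}
```

A non-zero exit means no recent backup; 'crontab -l' as root then shows whether the job is scheduled.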

Minimum Configuration on EOS Device

Confirm the daemon is correctly installed.
daemon TerminAttr
   exec /usr/bin/TerminAttr -ingestgrpcurl= -cvcompression=gzip -ingestauth=key,cvp -smashexcludes=ale,flexCounter,hardware,kni,pulse,strata -ingestexclude=/Sysdb/cell/1/agent,/Sysdb/cell/2/agent -ingestvrf=default -taillogs
   no shutdown
ntpd must be enabled for single-node; NTP sync is essential for multi-node.
ntp server prefer iburst
ntp server iburst
Enable the HTTP API so eAPI works; enable the unix-socket protocol so TerminAttr can talk to ConfigAgent (nginx method).
management api http-commands
   protocol http
   protocol unix-socket
   no shutdown
TerminAttr has 2 mechanisms to talk to ConfigAgent:
Default VRF - via the unix socket directly; no additional config required
Non-default VRF - cannot talk directly (ConfigAgent only listens in the default VRF), so the connection has to go via nginx; protocol unix-socket is required under management api http-commands.
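A quick way to spot a missing stanza from the minimum config above is to grep the running config for it. This sketch assumes the config is piped in from EOS bash, for example via FastCli -p 15 -c "show running-config" (an assumption; adjust to your EOS release), and checks only the lines listed in the function. check_eos_minimum is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical check for the minimum EOS config; reads config on stdin.
# 'protocol unix-socket' is strictly needed only for a non-default VRF.
check_eos_minimum() {
    config=$(cat)
    missing=0
    for expected in 'daemon TerminAttr' 'management api http-commands' \
                    'protocol unix-socket' 'ntp server'; do
        if ! printf '%s\n' "$config" | grep -q "$expected"; then
            echo "missing: $expected"
            missing=1
        fi
    done
    return $missing
}
```

Usage on the switch: FastCli -p 15 -c "show running-config" | check_eos_minimum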

Enabling LANZ on EOS CLI

queue-monitor length
queue-monitor streaming
   no shutdown
TerminAttr runs in the default VRF, so queue-monitor streaming has to be in the default VRF as well!
Confirm from bash via curl localhost:6060/rest/LANZ/congestion
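To watch congestion over a short window rather than with a single request, the curl check can be wrapped in a loop. A sketch, assuming the endpoint at localhost:6060 behaves as described above and curl is available in EOS bash; poll_lanz and its count/interval arguments are hypothetical.

```sh
#!/bin/sh
# Hypothetical LANZ polling helper for EOS bash.
# Usage: poll_lanz COUNT INTERVAL_SECONDS
poll_lanz() {
    count=$1
    interval=$2
    i=1
    while [ "$i" -le "$count" ]; do
        # --fail makes curl exit non-zero on HTTP errors (e.g. 404).
        if ! curl -s --fail "http://localhost:6060/rest/LANZ/congestion"; then
            echo "LANZ endpoint not reachable (is queue-monitor streaming enabled?)" >&2
            return 1
        fi
        echo    # newline between responses, which may not end in one
        i=$((i + 1))
        if [ "$i" -le "$count" ]; then
            sleep "$interval"
        fi
    done
    return 0
}
```

For example, 'poll_lanz 10 1' takes ten samples one second apart.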