Large databases of crystalline materials and their associated properties, typically computed using density functional theory (DFT), have become widely available and are routinely used to train machine learning (ML) models. Thermodynamic stability, band gaps, and elastic mechanical properties are now available in large volumes with sufficient accuracy to train effective models. Magnetic properties, however, require special care to model with DFT and are currently available only in databases that are orders of magnitude smaller. ML models could therefore be used to screen candidate materials for magnetic properties if their effectiveness can be demonstrated in these lower-data regimes. In this work, we compare the ability of multiple model architectures to predict two types of DFT-computed magnetic properties: the saturation polarization, a measure of magnetic strength, and the magnetocrystalline anisotropy, a measure of magnetic hardness. Surprisingly, we observed a drastic difference: across multiple architectures, ML models learn to predict saturation polarization far more easily than anisotropy. We also found that saturation polarization is predicted equally well by a deep learning architecture that incorporates crystal structure (CGCNN) and by one that uses composition alone (RooSt). Perhaps most excitingly, the model predictions correlate reasonably well with experimental measurements of samples we fabricated by arc melting. This suggests that composition-based models may enable screening of novel candidate material spaces without prior knowledge of crystal structure, which would drastically accelerate the identification of materials with high saturation polarization.