How to model my data when it has subgroups with vastly different volumes