bigQM7w


A high-quality dataset of ground-state properties and excited-state spectra of 12880 molecules containing up to 7 heavy atoms (C, O, N, F)


moldis-group

bigQM7ω dataset

This dataset [Ref-1] contains structures, ground-state properties, and electronic spectra calculated with the range-separated hybrid DFT method ωB97XD. All results are provided for three basis sets: 3-21G, def2SVP, and def2TZVP. Results from the baseline methods PM6 and ZINDO are also provided.

Geometries at ωB97XD/def2SVP and ωB97XD/def2TZVP levels retain connectivities as encoded in the original SMILES.


SMILES, Geometries and Frequencies

Three uncharacterized molecules (indices: 7705, 7714, 7715) containing an -N=N-O- substructure in a ring have been excluded.

NOTE: We retain molecular indices as in GDB11 so the indices run from 1 to 12883.
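Because three GDB11 indices are absent, code that loops over molecules should not assume consecutive numbering. The following is a minimal Python sketch built only from the numbers stated above; the function name `valid_indices` is illustrative and not part of the dataset. How each file labels its entries should still be verified against the files themselves.

```python
# Indices follow the GDB11 numbering (1..12883); three molecules were removed.
EXCLUDED = {7705, 7714, 7715}
FIRST_INDEX, LAST_INDEX = 1, 12883


def valid_indices():
    """Return the GDB11 indices of the molecules present in the dataset."""
    return [i for i in range(FIRST_INDEX, LAST_INDEX + 1) if i not in EXCLUDED]


indices = valid_indices()
print(len(indices))  # 12880
```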

SMILES from GDB11 universe for 12880 molecules, SELFIES and UFF geometries

bigQM7w.smi.bz2 (52 kB)
bigQM7w.selfies.bz2 (32 kB)
bigQM7w_UFF.sdf.bz2 (1.5 MB)
bigQM7w_UFF.xyz.bz2 (1.4 MB)

Minimum energy geometries of 12880 molecules

bigQM7w_PM6.xyz.bz2 (2.0 MB)
bigQM7w_wB97XD_321G.xyz.bz2 (1.9 MB)
bigQM7w_wB97XD_def2SVP.xyz.bz2 (2.0 MB)
bigQM7w_wB97XD_def2TZVP.xyz.bz2 (2.0 MB)

Harmonic frequencies of 12880 molecules

bigQM7w_wB97XD_321G_freq.txt.bz2 (1.7 MB)
bigQM7w_wB97XD_def2SVP_freq.txt.bz2 (1.7 MB)
bigQM7w_wB97XD_def2TZVP_freq.txt.bz2 (1.7 MB)

NOTE: The purpose of the PM6 and ωB97XD/3-21G geometries was to generate rapid baseline geometries for structural descriptors in ML. Consequently, these optimizations were not run with Opt(CalcFC), the option that computes force constants only at the first step of the geometry optimization. Three molecules (indices: 8815, 8816, 10619) at the ωB97XD/def2TZVP level have small imaginary wavenumbers with magnitude < 10 cm⁻¹.

Decompress the files on Linux with:

bunzip2 -f bigQM7w.smi.bz2
bunzip2 -f bigQM7w.selfies.bz2
bunzip2 -f bigQM7w_UFF.xyz.bz2
bunzip2 -f bigQM7w_UFF.sdf.bz2
bunzip2 -f bigQM7w_PM6.xyz.bz2
bunzip2 -f bigQM7w_wB97XD_321G.xyz.bz2
bunzip2 -f bigQM7w_wB97XD_def2SVP.xyz.bz2
bunzip2 -f bigQM7w_wB97XD_def2TZVP.xyz.bz2
bunzip2 -f bigQM7w_wB97XD_321G_freq.txt.bz2
bunzip2 -f bigQM7w_wB97XD_def2SVP_freq.txt.bz2
bunzip2 -f bigQM7w_wB97XD_def2TZVP_freq.txt.bz2
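Where `bunzip2` is not available, the same step can be done with Python's standard library. This is a sketch rather than an official tool: the helper names `decompress_bz2` and `decompress_all` are hypothetical, and unlike `bunzip2 -f` the compressed originals are left in place.

```python
import bz2
import shutil
from pathlib import Path


def decompress_bz2(path):
    """Write a decompressed copy of `path` next to it and return the new path."""
    src = Path(path)
    if src.suffix != ".bz2":
        raise ValueError(f"expected a .bz2 file, got {src.name}")
    dst = src.with_suffix("")  # strips only the trailing ".bz2"
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def decompress_all(directory=".", pattern="bigQM7w*.bz2"):
    """Decompress every file in `directory` that matches `pattern`."""
    return [decompress_bz2(p) for p in sorted(Path(directory).glob(pattern))]
```

Calling `decompress_all()` in the download directory would then produce, for example, `bigQM7w.smi` from `bigQM7w.smi.bz2`, overwriting any existing file of that name.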

Ground state properties

PM6-level properties

Column 01 Molecule index
Column 02 HOMO energy (in au)
Column 03 LUMO energy (in au)
Column 04 HOMO-LUMO gap (in au)
Column 05 Total energy (in au)
Column 06 Atomization energy (in au)

bigQM7w_PM6_prop.txt.bz2 (282 kB)

ωB97XD-level properties

Column 01 Molecule index
Column 02 Dipole moment (in debye)
Column 03 Polarizability (in a₀³, bohr³)
Column 04 HOMO energy (in Eₕ, hartree)
Column 05 LUMO energy (in Eₕ, hartree)
Column 06 HOMO-LUMO gap (in Eₕ, hartree)
Column 07 Radial expectation value, <R²> (in a₀², bohr²)
Column 08 Zero-point vibrational energy (in kcal/mol)
Column 09 Sum of electronic and zero-point energies, U0 (in Eₕ, hartree)
Column 10 Sum of electronic and thermal energies, UT at 298.15 K (in Eₕ, hartree)
Column 11 Sum of electronic and thermal enthalpies, HT at 298.15 K (in Eₕ, hartree)
Column 12 Sum of electronic and thermal free energies, GT at 298.15 K (in Eₕ, hartree)
Column 13 Total heat capacity, Cv (in cal/mol/K)
Column 14 Atomization energy (in Eₕ, hartree)

bigQM7w_wB97XD_321G_prop.txt.bz2 (619 kB)
bigQM7w_wB97XD_def2SVP_prop.txt.bz2 (619 kB)
bigQM7w_wB97XD_def2TZVP_prop.txt.bz2 (617 kB)

Decompress the files on Linux with:

bunzip2 -f bigQM7w_PM6_prop.txt.bz2
bunzip2 -f bigQM7w_wB97XD_321G_prop.txt.bz2
bunzip2 -f bigQM7w_wB97XD_def2SVP_prop.txt.bz2
bunzip2 -f bigQM7w_wB97XD_def2TZVP_prop.txt.bz2
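Once decompressed, a ωB97XD property file can be mapped onto the 14 columns listed above. The sketch below assumes whitespace-separated rows with an integer index in the first field and no header line; that layout is an assumption and should be checked against the actual files. The short column labels and the helper names are illustrative, not defined by the dataset.

```python
HARTREE_TO_EV = 27.211386  # 1 hartree in eV (CODATA value, rounded)

# Illustrative labels for the 14 ωB97XD property columns, in file order.
COLUMNS = (
    "index", "dipole", "polarizability", "homo", "lumo", "gap", "r2",
    "zpve", "u0", "ut", "ht", "gt", "cv", "atomization",
)


def parse_property_line(line):
    """Turn one whitespace-separated row into a dict keyed by COLUMNS."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = {"index": int(fields[0])}
    row.update((name, float(value)) for name, value in zip(COLUMNS[1:], fields[1:]))
    return row


def gap_in_ev(row):
    """Convert the HOMO-LUMO gap (column 06) from hartree to eV."""
    return row["gap"] * HARTREE_TO_EV
```

The PM6 property file has only six columns, so it would need its own label tuple.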

Mulliken charges

bigQM7w_wB97XD_321G_Mulliken.txt.bz2
bigQM7w_wB97XD_def2SVP_Mulliken.txt.bz2
bigQM7w_wB97XD_def2TZVP_Mulliken.txt.bz2

APT charges

bigQM7w_wB97XD_321G_APT.txt.bz2
bigQM7w_wB97XD_def2SVP_APT.txt.bz2
bigQM7w_wB97XD_def2TZVP_APT.txt.bz2


Excited state properties at ZINDO and TD-ωB97XD levels

Column 01 Excitation index, k, with respect to ground state
Column 02 Excitation energy, E (S0 → Sk) (in eV)
Column 03 Excitation wavelength, λ (S0 → Sk) (in nm)
Column 04 Oscillator strength, f, of the S0 → Sk excitation
Column 05 Tx (in au), x-component of transition dipole moment vector for S0 → Sk excitation
Column 06 Ty (in au), y-component of transition dipole moment vector for S0 → Sk excitation
Column 07 Tz (in au), z-component of transition dipole moment vector for S0 → Sk excitation
Column 08 T² (in au²), square of transition dipole moment for S0 → Sk excitation

bigQM7w_ZINDO_spectra.tar.gz (76 MB)
bigQM7w_TDwB97XD_321G_spectra.tar.gz (207 MB)
bigQM7w_TDwB97XD_def2SVP_spectra.tar.gz (409 MB)
bigQM7w_TDwB97XD_def2TZVP_spectra.tar.gz (791 MB)
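Several of the spectral columns are linked by standard relations, which can serve as a sanity check after loading the data: the wavelength in nm is hc/E, the squared transition dipole is the sum of the squared components, and in the dipole-length form the oscillator strength is (2/3) × E (in hartree) × the squared transition dipole (in au²). The sketch below encodes these textbook formulas; whether the stored values were generated exactly this way is not stated in this README, so small rounding differences should be expected. The function names are illustrative.

```python
EV_NM = 1239.841984        # h*c in eV*nm
HARTREE_TO_EV = 27.211386  # 1 hartree in eV (rounded)


def wavelength_nm(energy_ev):
    """Wavelength (column 03) from excitation energy in eV (column 02)."""
    return EV_NM / energy_ev


def squared_transition_dipole(tx, ty, tz):
    """Squared transition dipole (column 08) from columns 05-07, in au^2."""
    return tx * tx + ty * ty + tz * tz


def oscillator_strength(energy_ev, t2_au):
    """Dipole-length oscillator strength (column 04): f = (2/3) * E[hartree] * T^2."""
    return (2.0 / 3.0) * (energy_ev / HARTREE_TO_EV) * t2_au


print(round(wavelength_nm(4.0), 2))  # 309.96
```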

Extract the archives on Linux with:

tar -xzf bigQM7w_ZINDO_spectra.tar.gz
tar -xzf bigQM7w_TDwB97XD_321G_spectra.tar.gz
tar -xzf bigQM7w_TDwB97XD_def2SVP_spectra.tar.gz
tar -xzf bigQM7w_TDwB97XD_def2TZVP_spectra.tar.gz

Each folder contains 12880 bzip2-compressed files. Decompress them, for example, as follows:

cd bigQM7w_ZINDO_spectra
bunzip2 -f bigQM7w_ZINDO_000001.dat.bz2
...
bunzip2 -f bigQM7w_ZINDO_012880.dat.bz2
cd ../bigQM7w_TDwB97XD_def2TZVP_spectra
bunzip2 -f bigQM7w_TDwB97XD_def2TZVP_000001.dat.bz2
...
bunzip2 -f bigQM7w_TDwB97XD_def2TZVP_012880.dat.bz2
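As an alternative to decompressing tens of thousands of small files on disk, a per-molecule spectrum can be read straight from its `.dat.bz2` file. The sketch below assumes eight whitespace-separated numeric columns per line, matching the column list for the excited-state files; the exact file layout is an assumption, so lines that do not fit that shape (such as a possible header) are simply skipped. The function name `read_spectrum` is hypothetical.

```python
import bz2


def read_spectrum(path):
    """Read one compressed spectrum file without writing a decompressed copy.

    Returns a list of tuples
    (k, energy_ev, wavelength_nm, f, tx, ty, tz, t2).
    """
    rows = []
    with bz2.open(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) != 8:
                continue  # not a data row under the assumed layout
            try:
                values = [float(x) for x in fields]
            except ValueError:
                continue  # non-numeric content, e.g. a header
            rows.append((int(values[0]), *values[1:]))
    return rows
```

For instance, `read_spectrum("bigQM7w_ZINDO_spectra/bigQM7w_ZINDO_000001.dat.bz2")` would return the excitations stored for the first file in the ZINDO folder.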

Data-mining platform

You can access a data-mining platform to query this dataset at https://moldis.tifrh.res.in/datasets.html.


Raw input/output files on NOMAD

https://dx.doi.org/10.17172/NOMAD/2021.09.30-1


Machine learning model for electronic spectra

To learn about the machine learning model for reconstruction of TDωB97XD/def2SVPD@ωB97XD/def2SVP-level electronic spectra, please see the material collected at https://github.com/moldis-group/bigQM7w/tree/main/ML_spectrum


Revision notes

15 September 2021: First upload
27 October 2021: Mulliken and APT charges uploaded
30 March 2022: Data-mining platform at MolDis announced
03 April 2022: ML model for full-spectrum reconstruction made available


References

[Ref-1] The Resolution-vs.-Accuracy Dilemma in Machine Learning Modeling of Electronic Excitation Spectra
Prakriti Kayastha, Sabyasachi Chakraborty, Raghunathan Ramakrishnan
Digital Discovery, 1 (2022) 689-702.
DOI: https://doi.org/10.1039/D1DD00031D