A high-quality dataset of ground-state properties and excited-state spectra of 12880 molecules containing up to 7 atoms of C, O, N, and F
This dataset [Ref-1] contains structures, ground-state properties, and electronic spectra calculated with the range-separated hybrid DFT method ωB97XD. All results are provided for three basis sets: 3-21G, def2SVP, and def2TZVP. Results from the baseline models PM6 and ZINDO are also provided.
Geometries at ωB97XD/def2SVP and ωB97XD/def2TZVP levels retain connectivities as encoded in the original SMILES.
Three uncharacterized molecules (indices: 7705, 7714, 7715) containing an -N=N-O- substructure in a ring were excluded.
NOTE: Molecular indices are retained as in GDB11, so they run from 1 to 12883, with the three excluded indices unused.
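Because the indices keep the GDB11 numbering, a loop over 1 to 12880 would not match the molecules in the dataset. The following Python sketch builds the list of indices that are present; the names EXCLUDED, LAST_INDEX and valid_indices are illustrative and not part of the dataset.

```python
# Indices follow the GDB11 numbering, so three values in 1..12883 are unused.
EXCLUDED = {7705, 7714, 7715}  # molecules removed from the dataset
LAST_INDEX = 12883             # largest molecule index


def valid_indices():
    """Return the molecule indices present in the dataset, in ascending order."""
    return [i for i in range(1, LAST_INDEX + 1) if i not in EXCLUDED]


indices = valid_indices()
print(len(indices))  # number of molecules in the dataset
```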
bigQM7w.smi.bz2 (52 kB)
bigQM7w.selfies.bz2 (32 kB)
bigQM7w_UFF.sdf.bz2 (1.5 MB)
bigQM7w_UFF.xyz.bz2 (1.4 MB)
bigQM7w_PM6.xyz.bz2 (2.0 MB)
bigQM7w_wB97XD_321G.xyz.bz2 (1.9 MB)
bigQM7w_wB97XD_def2SVP.xyz.bz2 (2.0 MB)
bigQM7w_wB97XD_def2TZVP.xyz.bz2 (2.0 MB)
bigQM7w_wB97XD_321G_freq.txt.bz2 (1.7 MB)
bigQM7w_wB97XD_def2SVP_freq.txt.bz2 (1.7 MB)
bigQM7w_wB97XD_def2TZVP_freq.txt.bz2 (1.7 MB)
NOTE: The PM6 and ωB97XD/3-21G geometries were generated as rapid baseline geometries for structural descriptors in ML. Accordingly, these optimizations were not performed with Opt(CalcFC), i.e., force constants were not computed at the first step of the geometry optimization. Three molecules (indices: 8815, 8816, 10619) at the ωB97XD/def2TZVP level have small imaginary wavenumbers with magnitude < 10 cm⁻¹.
Decompress the files on Linux as follows:
bunzip2 -f bigQM7w.smi.bz2
bunzip2 -f bigQM7w.selfies.bz2
bunzip2 -f bigQM7w_UFF.xyz.bz2
bunzip2 -f bigQM7w_UFF.sdf.bz2
bunzip2 -f bigQM7w_PM6.xyz.bz2
bunzip2 -f bigQM7w_wB97XD_321G.xyz.bz2
bunzip2 -f bigQM7w_wB97XD_def2SVP.xyz.bz2
bunzip2 -f bigQM7w_wB97XD_def2TZVP.xyz.bz2
bunzip2 -f bigQM7w_wB97XD_321G_freq.txt.bz2
bunzip2 -f bigQM7w_wB97XD_def2SVP_freq.txt.bz2
bunzip2 -f bigQM7w_wB97XD_def2TZVP_freq.txt.bz2
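Where bunzip2 is not available, the same files can be decompressed with the Python standard library. This is a minimal sketch; the function name decompress_bz2 and its keep_original flag are illustrative. Unlike bunzip2, it keeps the compressed file by default.

```python
import bz2
import shutil
from pathlib import Path


def decompress_bz2(src, keep_original=True):
    """Decompress a .bz2 file next to itself and return the output path."""
    src = Path(src)
    if src.suffix != ".bz2":
        raise ValueError(f"expected a .bz2 file, got {src.name}")
    dst = src.with_suffix("")  # drops only the trailing ".bz2"
    # Stream the data so that large files are never held fully in memory.
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    if not keep_original:
        src.unlink()
    return dst
```

For example, decompress_bz2("bigQM7w.smi.bz2") would write bigQM7w.smi in the same directory.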
Column 01 Molecule index
Column 02 HOMO energy (in au)
Column 03 LUMO energy (in au)
Column 04 HOMO-LUMO gap (in au)
Column 05 Total energy (in au)
Column 06 Atomization energy (in au)
bigQM7w_PM6_prop.txt.bz2 (282 kB)
Column 01 Molecule index
Column 02 Dipole moment (in debye)
Column 03 Polarizability (in a₀³, bohr³)
Column 04 HOMO energy (in Eₕ, hartree)
Column 05 LUMO energy (in Eₕ, hartree)
Column 06 HOMO-LUMO gap (in Eₕ, hartree)
Column 07 Radial expectation value, ⟨R²⟩ (in a₀², bohr²)
Column 08 Zero-point vibrational energy (in kcal/mol)
Column 09 Sum of electronic and zero-point energies, U₀ (in Eₕ, hartree)
Column 10 Sum of electronic and thermal energies, U_T at 298.15 K (in Eₕ, hartree)
Column 11 Sum of electronic and thermal enthalpies, H_T at 298.15 K (in Eₕ, hartree)
Column 12 Sum of electronic and thermal free energies, G_T at 298.15 K (in Eₕ, hartree)
Column 13 Total heat capacity, C_v (in cal/mol/K)
Column 14 Atomization energy (in Eₕ, hartree)
bigQM7w_wB97XD_321G_prop.txt.bz2 (619 kB)
bigQM7w_wB97XD_def2SVP_prop.txt.bz2 (619 kB)
bigQM7w_wB97XD_def2TZVP_prop.txt.bz2 (617 kB)
Decompress the files on Linux as follows:
bunzip2 -f bigQM7w_PM6_prop.txt.bz2
bunzip2 -f bigQM7w_wB97XD_321G_prop.txt.bz2
bunzip2 -f bigQM7w_wB97XD_def2SVP_prop.txt.bz2
bunzip2 -f bigQM7w_wB97XD_def2TZVP_prop.txt.bz2
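Once decompressed, the 14-column ωB97XD property files can be read row by row. The Python sketch below assumes that each row is whitespace-delimited and follows the column order listed above; this layout has not been checked against the files. The short column names are illustrative labels, not headers taken from the data.

```python
HARTREE_TO_EV = 27.211386  # 1 hartree expressed in eV (rounded)

# Illustrative short names for the 14 documented columns, in order.
COLUMNS = [
    "index", "dipole", "polarizability", "homo", "lumo", "gap",
    "r2", "zpve", "u0", "ut", "ht", "gt", "cv", "atomization",
]


def parse_property_line(line):
    """Parse one whitespace-delimited row into a dict keyed by COLUMNS."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = {"index": int(fields[0])}  # the molecule index is an integer
    for name, value in zip(COLUMNS[1:], fields[1:]):
        row[name] = float(value)
    return row


def gap_in_ev(row):
    """Convert the HOMO-LUMO gap of a parsed row from hartree to eV."""
    return row["gap"] * HARTREE_TO_EV
```

The PM6 property file uses the six-column layout instead, so it would need a shorter name list.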
bigQM7w_wB97XD_321G_Mulliken.txt.bz2
bigQM7w_wB97XD_def2SVP_Mulliken.txt.bz2
bigQM7w_wB97XD_def2TZVP_Mulliken.txt.bz2
bigQM7w_wB97XD_321G_APT.txt.bz2
bigQM7w_wB97XD_def2SVP_APT.txt.bz2
bigQM7w_wB97XD_def2TZVP_APT.txt.bz2
Column 01 Excitation index, k, with respect to ground state
Column 02 Excitation energy, E (S₀ → S_k) (in eV)
Column 03 Excitation wavelength, λ (S₀ → S_k) (in nm)
Column 04 Oscillator strength, f, of the S₀ → S_k excitation
Column 05 Tx (in au), x-component of transition dipole moment vector for S₀ → S_k excitation
Column 06 Ty (in au), y-component of transition dipole moment vector for S₀ → S_k excitation
Column 07 Tz (in au), z-component of transition dipole moment vector for S₀ → S_k excitation
Column 08 T² (in au²), square of transition dipole moment for S₀ → S_k excitation
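The columns of a spectrum file are related by standard formulas: λ = hc/E, and, in the length gauge, f = (2/3) E T² with E in hartree and T² in atomic units. The Python sketch below applies these relations as a consistency check on a parsed row. Whether the tabulated values satisfy them to the printed precision has not been verified here. The function names and the whitespace-delimited row layout are assumptions.

```python
HARTREE_TO_EV = 27.211386  # 1 hartree expressed in eV (rounded)
HC_EV_NM = 1239.841984     # Planck constant times speed of light, in eV*nm


def wavelength_nm(energy_ev):
    """Excitation wavelength in nm from the excitation energy in eV."""
    return HC_EV_NM / energy_ev


def oscillator_strength(energy_ev, t2_au):
    """Length-gauge oscillator strength, f = (2/3) * E[hartree] * T^2[au^2]."""
    return (2.0 / 3.0) * (energy_ev / HARTREE_TO_EV) * t2_au


def parse_excitation(line):
    """Parse one 8-column row: k, E, lambda, f, Tx, Ty, Tz, T^2."""
    k, e, lam, f, tx, ty, tz, t2 = line.split()
    return {
        "k": int(k), "E_eV": float(e), "lambda_nm": float(lam),
        "f": float(f), "T": (float(tx), float(ty), float(tz)),
        "T2": float(t2),
    }
```

Comparing oscillator_strength(row["E_eV"], row["T2"]) with row["f"] gives a quick check that a file was parsed in the intended column order.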
bigQM7w_ZINDO_spectra.tar.gz (76 MB)
bigQM7w_TDwB97XD_321G_spectra.tar.gz (207 MB)
bigQM7w_TDwB97XD_def2SVP_spectra.tar.gz (409 MB)
bigQM7w_TDwB97XD_def2TZVP_spectra.tar.gz (791 MB)
Extract the folders on Linux as follows:
tar -xzf bigQM7w_ZINDO_spectra.tar.gz
tar -xzf bigQM7w_TDwB97XD_321G_spectra.tar.gz
tar -xzf bigQM7w_TDwB97XD_def2SVP_spectra.tar.gz
tar -xzf bigQM7w_TDwB97XD_def2TZVP_spectra.tar.gz
Each folder contains 12880 bzip2-compressed files. Decompress them, for example, as follows:
cd bigQM7w_ZINDO_spectra
bunzip2 -f bigQM7w_ZINDO_000001.dat.bz2
...
bunzip2 -f bigQM7w_ZINDO_012880.dat.bz2
cd bigQM7w_TDwB97XD_def2TZVP_spectra
bunzip2 -f bigQM7w_TDwB97XD_def2TZVP_000001.dat.bz2
...
bunzip2 -f bigQM7w_TDwB97XD_def2TZVP_012880.dat.bz2
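The per-molecule files can also be read without decompressing them on disk. The Python sketch below builds a file name from the six-digit, zero-padded pattern shown above and reads the eight numeric columns directly from the .bz2 file. The helper names are illustrative, and the whitespace-delimited layout is an assumption. This README does not state how the six-digit number in the file name maps to the molecule index, so that mapping is left to the caller.

```python
import bz2
from pathlib import Path


def spectrum_path(folder, prefix, number):
    """Build a path such as <folder>/bigQM7w_ZINDO_000001.dat.bz2."""
    return Path(folder) / f"{prefix}_{number:06d}.dat.bz2"


def read_spectrum(path):
    """Return the rows of a compressed spectrum file as lists of 8 floats."""
    rows = []
    with bz2.open(path, "rt") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) != 8:
                continue  # skip blank or non-data lines, if any
            try:
                rows.append([float(x) for x in fields])
            except ValueError:
                continue  # skip lines that are not purely numeric
    return rows
```

For example, read_spectrum(spectrum_path("bigQM7w_ZINDO_spectra", "bigQM7w_ZINDO", 1)) would load the first ZINDO file after the archive has been extracted.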
You can access a data-mining platform to query this dataset at https://moldis.tifrh.res.in/datasets.html.
https://dx.doi.org/10.17172/NOMAD/2021.09.30-1
To learn about the machine learning model for reconstruction of TDωB97XD/def2SVPD@ωB97XD/def2SVP-level electronic spectra, please see the material collected at https://github.com/moldis-group/bigQM7w/tree/main/ML_spectrum
15 September 2021: First upload
27 October 2021: Mulliken and APT charges uploaded
30 March 2022: Data-mining platform at MolDis announced
03 April 2022: ML model for full-spectrum reconstruction made available
[Ref-1] The Resolution-vs.-Accuracy Dilemma in Machine Learning Modeling of Electronic Excitation Spectra
Prakriti Kayastha, Sabyasachi Chakraborty, Raghunathan Ramakrishnan
Digital Discovery, 1 (2022) 689-702.
DOI: https://doi.org/10.1039/D1DD00031D