---
name: scientific-paper-writer
description: Write computational materials science research papers in the exact voice of ML-driven materials discovery authors. Use when drafting or editing the paper. Covers title construction, abstract architecture (7-component), introduction paragraphs, results/discussion patterns, methods conventions, and full section templates for venues like npj Computational Materials and Digital Discovery.
---

# SKILL: Computational Materials Science Research Paper Writing
## Exact Replication of Author Voice — ML-Driven Materials Discovery

---

## OVERVIEW

This skill encodes the **exact** writing system of a specific cluster of computational materials science researchers who publish in venues such as *Journal of Applied Physics*, *Digital Discovery (RSC)*, *npj Computational Materials*, and *MGE Advances*. The scope covers ML-driven materials discovery papers — topological materials, synthesizability prediction, spin Hall conductivity, multiferroics, etc.

**You are not summarizing their style. You are becoming their voice. Every structural choice, transition word, hedging phrase, figure reference pattern, sentence opening, and quantitative framing convention in this document is derived directly from their published output.**

---

## PART 1: MACRO STRUCTURE — PAPER ARCHITECTURE

### 1.1 Title Construction

Titles follow one of four formulae. Do not deviate.

**Formula A — Gerund + Property + Preposition + Method:**
> "Predicting 3D magnetic topological insulators and semimetals with machine learning"
> "Accelerating spin Hall conductivity predictions via machine learning"

**Formula B — Method-Guided + Discovery Type + Material:**
> "PU-learning-guided discovery of synthesizable multiferroic nitride perovskites with altermagnetic order"

**Formula C — Dual Noun Phrase:**
> "SynCoTrain: a dual classifier PU-learning framework for synthesizability prediction"

**Formula D — Property + Via/Through + Method (implicit):**
> "[Property] predictions via machine learning"

**Rules:**
- Titles are sentence-case except for proper nouns and acronyms
- Never use a question as a title
- Keep under 15 words when possible
- Include the material class AND the method in the title
- Colon-subtitle format only when introducing a system/framework name (SynCoTrain, Res-CGCNN)
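
The title rules above lend themselves to a quick mechanical check. The following is a minimal sketch under stated assumptions, not part of the source style system: `check_title` and the `ALLOWED_CAPS` allow-list are hypothetical names, acronyms are detected with a crude heuristic (two or more capitals or digits in a token), and the requirement to name both the material class and the method is left to the writer.

```python
# Hypothetical allow-list of single-capital proper nouns that may stay capitalized.
ALLOWED_CAPS = {"Hall"}


def check_title(title: str, max_words: int = 15) -> list[str]:
    """Return a list of rule violations for a candidate title (empty if none)."""
    problems = []
    words = title.split()

    # Rule: never use a question as a title.
    if title.rstrip().endswith("?"):
        problems.append("title is a question")

    # Rule: keep under 15 words when possible.
    if len(words) >= max_words:
        problems.append(f"title has {len(words)} words; aim for fewer than {max_words}")

    # Rule: sentence case except for proper nouns and acronyms.
    offenders = []
    for i, word in enumerate(words):
        token = word.strip(",:;()?")
        # The first word, and the first word after a colon, may be capitalized.
        starts_segment = i == 0 or words[i - 1].endswith(":")
        # Crude acronym heuristic (assumption): two or more capitals or digits.
        is_acronym = sum(ch.isupper() or ch.isdigit() for ch in token) >= 2
        if token[:1].isupper() and not (starts_segment or is_acronym or token in ALLOWED_CAPS):
            offenders.append(token)
    if offenders:
        problems.append("not sentence case: " + ", ".join(offenders))

    return problems


print(check_title("Accelerating spin Hall conductivity predictions via machine learning"))
print(check_title("Can Machine Learning Predict Topological Insulators?"))
```

The first call returns an empty list; the second reports both the question form and the title-case words. Extend `ALLOWED_CAPS` for other proper nouns as needed.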

---

### 1.2 Abstract Architecture (200–280 words, 5–8 sentences)

The abstract follows a rigid 7-component structure. Each component maps to roughly one sentence.

```
[COMPONENT 1] — MOTIVATION STATEMENT
State the target property/quantity and its importance to a technological domain.
→ "Accurately predicting [X] is crucial for designing novel [devices/materials] that 
   leverage [phenomenon]."
→ "[Property] is a key parameter [for/in] [application domain]."

[COMPONENT 2] — CURRENT LIMITATION
State why existing approaches fail (computational cost, data scarcity, etc.)
→ "First-principles calculations of [X] are computationally intensive and unsuitable 
   for quick high-throughput screening."
→ "Predicting [X], a critical factor in realizing novel materials, remains a complex 
   challenge due to the limitations of [traditional heuristics / thermodynamic proxies]."

[COMPONENT 3] — THIS WORK STATEMENT
Open with "Here, we" — always. State what was built/developed.
→ "Here, we have developed a [adjective] [model name/type] to [classify/predict/screen] 
   [property/material class] solely based on [descriptor]."
→ "Here, we develop a machine learning (ML)-guided framework to [expand/identify/predict] 
   [X]."

[COMPONENT 4] — TECHNICAL APPROACH DETAIL
How the method works. Use "By [verb]-ing X with Y" construction.
→ "By integrating [method A] with [method B], we screen [N] [compositions/materials] 
   and predict [N] [results]."
→ "This is enabled by having access to [N] instances of [data type] and incorporating 
   [architectural innovation] into the [base framework]."

[COMPONENT 5] — PERFORMANCE METRICS
Specific quantitative outcomes.
→ "We found that [model] surpasses [baseline], achieving a [metric] of [value] for 
   [task A] and [metric] of [value] for [task B]."
→ "The model demonstrates robust performance, achieving high recall on internal and 
   leave-out test sets."

[COMPONENT 6] — DISCOVERY/APPLICATION RESULT
What was found when applied to new materials.
→ "Additionally, we utilized [model] to conduct high-throughput screenings of materials 
   in [database], leading to the prediction of [N] previously unreported materials 
   displaying [property > threshold]."
→ "Further [symmetry/magnetic/stability] filtering yield[s] [N] [material type] 
   candidates."

[COMPONENT 7] — SIGNIFICANCE STATEMENT
End with a "not only...but also" or "This study represents" construction.
→ "This study represents the inaugural endeavor to construct a machine learning model 
   capable of effectively capturing the intricate nonlinear relationship between [X] 
   and [structural descriptors], serving as a useful tool for [application]."
→ "These findings not only advance the understanding of [material class] but also 
   provide a validated ML-DFT framework to guide experimental efforts in realizing 
   novel functional materials."
```
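
The length targets in the heading (200–280 words, 5–8 sentences) and the mandatory "Here, we" opener can be verified on a draft. This is a rough sketch, assuming a naive sentence splitter (terminal punctuation followed by whitespace and a capital letter); `abstract_stats` is a hypothetical helper name, and it checks only the form of the abstract, not whether each of the seven components is present.

```python
import re


def abstract_stats(text: str) -> dict:
    """Count words and sentences and check for the 'Here, we' opener."""
    words = text.split()
    # Naive split: ., !, or ? followed by whitespace and an uppercase letter.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip()) if s]
    return {
        "n_words": len(words),
        "n_sentences": len(sentences),
        "words_ok": 200 <= len(words) <= 280,
        "sentences_ok": 5 <= len(sentences) <= 8,
        "has_here_we": any(s.startswith("Here, we") for s in sentences),
    }


draft = (
    "Accurately predicting X is crucial. "
    "First-principles calculations are costly. "
    "Here, we develop a model."
)
print(abstract_stats(draft))
```

The short draft fails both length checks but passes the opener check, which is the expected outcome for a three-sentence stub.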

---

### 1.3 Introduction Architecture (4–6 paragraphs)

**Paragraph 1 — Broad Field Motivation**
Open with the broad technological/scientific importance. Introduce the material class or phenomenon. Give 2–3 sentences of context before narrowing.

*Opening patterns:*
- "[Material class] can provide unique advantages in a variety of applications ranging from [A] to [B]."
- "[Material class] represent[s] an emerging frontier for multifunctional materials [but/and]..."
- "Spintronics is a cutting-edge field that explores [X] and their applications in materials."
- "[Property] is a foundational pillar of modern science and perhaps the driving motivation behind [field]."

*Narrowing pattern:*
- "A subset of [broad class] that have received much attention recently for [application] are [specific class]."
- "To quantify a material's capability to generate [X], [quantity] is a key parameter."
- "Well-known examples such as [material] consist of [structural description]."

**Paragraph 2 — Conventional Approaches and Their Limitations**
Describe how the field currently handles the problem, then pivot to limitations.

*Standard opening:*
- "Identifying [target] has conventionally been a computationally and labor intensive process relying heavily on first-principles approaches such as density functional theory (DFT) calculations."
- "Historically, [heuristic-based methods] have been used to assess [property]. Nevertheless, these simplified approaches have been shown to be insufficient, as [quantitative evidence]."
- "Traditional high-throughput screening typically filters candidates by thermodynamic descriptors, such as [Ehull] or [formation energy]. However, this approach fails to capture [broader synthesizability window / metastable phases / kinetic factors]."

**Paragraph 3 — Literature Survey of ML Approaches**
Survey related ML work in 3–5 sentences. Use the "For instance,", "By comparison,", and "A recent work by X et al. adopted..." constructions. End by noting what is missing for your specific case.

*Structure:*
```
Sentence 1: ML has shown promise → cite 3–6 papers as [N–M]
Sentence 2: "For instance, [model A] was applied successfully...with an [metric] of [value]..."
Sentence 3: "By comparison, a recent work by [Author] et al. adopted [model B]..."
Sentence 4: Other approaches mentioned briefly with citations
Sentence 5: "By contrast, attempts to do the same for [magnetic/specific class] have not 
             received much attention thus far."
```

**Paragraph 4 — Gap Statement and Challenge Articulation**
Explain specifically why your case is harder than prior work.

*Patterns:*
- "In general, applying machine learning to [magnetic/nitride/specific class] materials is more challenging."
- "This challenge calls for predictive strategies beyond [stability-based filters / atomic descriptors]."
- "Unlike conventional supervised models, [your approach] directly addresses [the lack of reliable negative synthesis data / the closed-world assumption / the Fermi level sensitivity]."

**Paragraph 5 — This Work**
Open with "In this study, we extend/develop/present..." Describe the full scope.

*Template:*
```
"In this study, we extend/develop/present [method] to [goal] in [material class/system] 
(with [A = X; B = Y; C = Z]). Specifically, [detailed approach in 2–3 sentences]. 
[N] [model types] are [trained/compared/evaluated] for [N] relevant tasks 
(e.g., [task A]; [task B]; [task C]), in which [model] provides consistently better 
performance [over/than] the other[s] in validation."
```

**Paragraph 6 (Optional) — Paper Organization**
*Standard formula:*
> "The rest of this paper is organized as follows. Section II describes [methods]. Then, the results from [models] are discussed in Section III including [validation process]. Finally, a brief conclusion is provided in Section IV."

---

### 1.4 Results and Discussion Architecture

Each results subsection follows the same internal structure:

```
[Subsection heading: numbered or named]

Sentence 1: State what was done/examined in this subsection.
Sentence 2-3: Describe the data or setup briefly.
Sentence 4-5: Present the quantitative result. Reference figure/table.
Sentence 6-7: Interpret / explain the result mechanistically.
Sentence 8 (optional): Note exceptions, limitations, or surprising findings with "It is 
                        worth noting that..." or "One exception is..."
Sentence 9 (optional): Connect to broader implications or validate against prior work.
```

---

### 1.5 Conclusion/Summary Architecture (1–2 paragraphs)

**Paragraph 1 — Summary:**
```
Opening: "In conclusion/summary, [model/method/framework] [has been developed/are developed/
          is presented] [for/in] [purpose]."
Middle: Restate the 2–3 most important quantitative results.
Validation: "These [model forecasts/predictions/results] have been validated through 
             first-principles calculations."
Highlight: "Notably, we predict [X] to exhibit [the highest/a [property] of [value]] 
            [within/among] [set]."
Close: "Our work demonstrates that [model] is [capable of capturing / a useful tool for] 
        [complex relationship / efficient screening of] [materials/properties]."
```

**Optional Paragraph 2 — Outlook:**
```
"The [methodologies/framework] outlined in this work can be readily expanded to other 
material systems."
"Potential strategies to improve the model could include [incorporating X] and [prediction 
of Y]."
"The limited number of [data] also highlights the need to [retrain/extend] the [model] 
through [iterative process / active learning]."
```

---

## PART 2: SENTENCE-LEVEL PATTERNS

### 2.1 Standard Opening Constructions

Memorize these. Use them. Do not invent alternatives.

**For stating contributions:**
- `"Here, we [verb] a [adjective] [system name] [to/for] [purpose]."`
- `"Here, we have developed a residual crystal graph convolutional neural network (Res-CGCNN) deep learning model to classify and predict [X] solely based on [Y]."`
- `"In this work, we [propose/introduce/extend] [method] for [task] using [descriptor]."`

**For stating importance:**
- `"Accurately/Efficiently predicting [X] is crucial for [designing/developing] [application]."`
- `"[X] is critical in the development of [application] due to [mechanism]."`
- `"[Material class] can provide unique advantages in a variety of applications ranging from [A] to [B]."`

**For introducing material families:**
- `"[Material class] represent[s] an emerging frontier for multifunctional materials but remain[s] synthetically elusive."`
- `"A subset of [broad class] that have received much attention recently for [application] are [specific class]."`
- `"Well-known examples such as [compound] consist of [structural/compositional description]."`

**For computational bottleneck statements:**
- `"First-principles calculations of [X] are computationally intensive and unsuitable for quick high-throughput screening."`
- `"This poses a significant challenge for [experimental measurement / conventional computation]."`
- `"Such a task usually requires an extensive use of time-intensive [method] calculations."`

**For gap statements:**
- `"By contrast, attempts to do the same for [magnetic/specialized] [materials] have not received much attention thus far."`
- `"[Specific challenge] has [conventionally/often] been a computationally and labor intensive process."`
- `"These challenges are further compounded by [secondary challenge]."`

**For presenting findings:**
- `"The [model] achieves [the highest/a] [metric] of [value]% in [classifying/predicting] [task]."`
- `"More specifically, [model] achieves [metric] in [task A] and [metric] in [task B]."`
- `"[Material] emerges as a promising candidate, exhibiting [property A], [property B], and [property C]."`
- `"Among [N candidates/materials identified], [specific example] is [found/confirmed/identified] [to have/as] [property]."`

**For validation:**
- `"[N] [materials/candidates] are [further] examined by [DFT/MLWFs/phonon calculations] [as] the total number is too large for first-principles calculations."`
- `"The agreement between [model] and [DFT] predictions is reasonably good, with a [metric] of [value] [on par with/comparable to] the [training/test] [metric]."`

---

### 2.2 Transition Phrase Inventory

**Additive (same direction):**
- `Moreover,` — use when adding a significant point that strengthens the argument
- `Furthermore,` — use when extending a claim with a new dimension
- `In addition,` — use for supplementary, parallel information
- `Additionally,` — use in abstract/conclusion for second major finding
- `Likewise,` — use for parallel structure: "Likewise, the [Y] compounds have..."
- `Similarly,` — use for structural/procedural parallels

**Focusing/Specifying:**
- `Specifically,` — introduces technical detail after a general claim
- `In particular,` — highlights one element from a set
- `More specifically,` — narrows further after `Specifically`
- `As a specific example,` — introduces a case study

**Contrasting:**
- `However,` — most common contrast marker; use freely
- `By contrast,` — for literature comparisons; "By contrast, attempts to..."
- `On the other hand,` — for within-sentence or within-paragraph contrast
- `Nevertheless,` — for concessions: "Nevertheless, the results illustrate that..."
- `Yet,` — brief, for sharp contrast
- `While` — for simultaneous contrast within a sentence

**Sequential/Causal:**
- `Subsequently,` — for temporal sequence in methodology
- `Following [X],` — "Following the identification of promising candidates, first-principles calculations..."
- `As a result,` — direct consequence
- `As such,` — logical consequence of a definition or constraint
- `Thus,` — tight logical implication
- `Therefore,` — slightly more formal implication
- `Consequently,` — for expected outcomes

**Purpose-Method:**
- `To [purpose], we [method].` — standard formula; use throughout methods
- `To this end,` — for connecting stated goal to approach
- `In pursuit of this goal,` — slightly more formal version

**Concession:**
- `Although [concession], [main claim].` — standard concession structure
- `Importantly, although [concession], [significance].`
- `While [limitation], [result/implication].`
- `Despite [limitation], [finding].`

**Temporal:**
- `In the past decade or so,` — for recent-history framing
- `In recent years,` — for contemporary context
- `In more recent attempts,` — for literature contrast with older approaches
- `Historically,` — for traditional/older methods

**Evidential:**
- `As shown in Fig. X,` — leading a sentence with figure evidence
- `As listed/summarized in Table X,` — for tabular data
- `As depicted/illustrated in Fig. X,` — slight variation
- `One can observe that` — for pointing out visible patterns
- `It can be seen that` — variation of above

---

### 2.3 Hedging and Epistemic Markers

Use these to modulate certainty appropriately. Materials science papers never overclaim.

**Moderate to strong certainty:**
- `indicates that`
- `suggests that`
- `implies that`
- `reveals that`
- `demonstrates that`

**Apparent/visual:**
- `appears to be`
- `seems to`
- `essentially appears to be`

**Causal uncertainty:**
- `likely due to`
- `appears to be due to`
- `This challenge appears to be due to the sensitive dependence on [X]`
- `This imbalance is a result of [X]`
- `possibly indicating`
- `possibly due to`

**Limitation framing:**
- `It is not clear whether [X] since [magnitude/accuracy] is beyond the accuracy of our numerical calculation.`
- `While there is no guarantee that [X], [observation/evidence].`
- `Although certain discrepancies between [model] and [DFT] predictions exist, [model] is still capable of [use].`
- `Note that [confirming/establishing] [X] is outside the accuracy of our numerical calculation.`

---

### 2.4 "It is worth noting" and "Notably" Patterns

These are signature constructions. Use them to flag non-obvious or important results.

- `It is worth noting that [FN outnumbers FP / the model tends to overestimate / the [dataset] was skewed toward X].`
- `Notably, [specific quantitative finding or unexpected result].`
- `Note that it is important to [include/consider/account for] [factor] in the [DFT/ML] calculation.`
- `Notably, the average AUC is [value], demonstrating strong predictive capability.`

---

### 2.5 Sentence Length and Complexity

**Long sentences (3+ clauses):**
Use for methodology and detailed results. Clauses connected by commas, semicolons, relative pronouns. Always grammatically clean — never run-ons.

Example pattern:
> "As depicted in Figure 1b, crystal graph information is first encoded and then fed into R1 convolutional layers, after which the information from the convolutions is fed into R2 residual blocks which are constructed based on convolutional layers."

**Short sentences (1 clause):**
Use for punchlines, summary statements, and conclusions. Drop after a long explanatory sequence.

Example pattern:
> "[Long explanation of training procedure.] [Long description of validation.] All seven crystal structure types are represented."

**The "Sentence + Short Follow-up" pattern:**
> "XGBoost performs the best for both accuracy and F1 score in all three classification tasks. More specifically, it achieves the highest accuracy of 95.7% in classifying nonmetals into the topologically trivial and nontrivial categories."

---

## PART 3: VOCABULARY AND DICTION

### 3.1 Preferred Verbs

| Context | Preferred Verbs |
|---------|----------------|
| Introducing contributions | develop, propose, present, demonstrate, establish |
| Discovering/finding | identify, uncover, predict, reveal, disclose |
| Applying/using | employ, utilize, leverage, exploit, adopt |
| Performing calculations | conduct, perform, carry out, execute |
| Confirming results | validate, confirm, verify, corroborate |
| Screening candidates | screen, filter, select, examine |
| Showing/displaying | show, exhibit, display, manifest, demonstrate |
| Computing | calculate, compute, evaluate, determine, estimate |
| Improving | enhance, improve, surpass, outperform |
| Constructing models | build, construct, train, fit, develop |

**Avoid:** "utilize" where "employ" suffices; overuse of "leverage"; "get" (use obtain, yield, produce, or achieve); bare "use" for methods (prefer employ or adopt); "look at" (use examine, investigate, or analyze)

### 3.2 Preferred Adjectives

**For methods:**
- computationally intensive, time-intensive
- data-driven, high-throughput, first-principles
- scalable, generalizable, transferable, reliable
- semi-supervised, iterative, collaborative

**For materials:**
- topologically non-trivial, topologically trivial
- magnetically ordered, intrinsic, layered
- synthetically elusive, metastable, synthesizable
- multifunctional, multiferroic, spintronic

**For results:**
- remarkable, notable, pronounced, significant, substantial
- promising, competitive, robust, accurate
- previously unreported, hitherto-unreported
- unprecedented

### 3.3 Preferred Nouns

**Framework vocabulary:**
- framework, workflow, methodology, pipeline
- descriptor, fingerprint, feature
- candidate, compound, composition
- screening, discovery, identification

**Performance vocabulary:**
- accuracy, F1 score, recall, precision
- area under the curve (AUC), mean absolute error (MAE)
- ROC curve, confusion matrix
- training set, validation set, test set, leave-out test set

**Materials vocabulary:**
- crystal structure, space group, unit cell
- band structure, band gap, band inversion
- Fermi level, Fermi energy, chemical potential
- spin-orbit coupling, magnetic moment, exchange coupling
- formation energy, energy above hull, convex hull
- density of states (DOS), projected DOS (PDOS)

### 3.4 Constructions to ALWAYS Use

**Quantitative framing:**
- "[Value]% of the [X] values are below [threshold]" — always use "of the"
- "with an [metric] of [value]" — not "with [value] [metric]"
- "a [metric] of [value] [units]" — always spell out metric before value
- "[N] independent training sessions were carried out" — not "we trained N times"

**Parenthetical abbreviation (always on first use):**
- "density functional theory (DFT)"
- "crystal graph convolutional neural network (CGCNN)"
- "positive-unlabeled (PU) learning"
- "mean absolute error (MAE)"
- "area under the curve (AUC)"
- "spin Hall conductivity (SHC)"
Never redefine. Use abbreviation consistently after first definition.
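
The first-use rule can be checked automatically on a draft. The sketch below is illustrative only: `check_abbreviation` is a hypothetical helper, and it assumes the definition always takes the literal form "full name (ABBR)".

```python
import re


def check_abbreviation(text: str, full: str, abbr: str) -> list[str]:
    """Check that `abbr` is defined once, as 'full (abbr)', before any bare use."""
    problems = []
    uses = list(re.finditer(rf"\b{re.escape(abbr)}\b", text))
    if not uses:
        return problems  # abbreviation never appears, so nothing to check

    # The full name is matched case-insensitively so that a sentence-initial
    # capital ("Density functional theory (DFT)") still counts as a definition.
    definitions = list(
        re.finditer(rf"(?i:{re.escape(full)})\s*\({re.escape(abbr)}\)", text)
    )
    if not definitions:
        problems.append(f"{abbr} is used but never defined")
        return problems
    if uses[0].start() < definitions[0].start():
        problems.append(f"{abbr} is used before it is defined")
    if len(definitions) > 1:
        problems.append(f"{abbr} is defined more than once")
    return problems


print(check_abbreviation(
    "DFT is accurate. We use density functional theory (DFT).",
    "density functional theory",
    "DFT",
))
```

Run it once per abbreviation in the list above; an empty list means the draft follows the rule for that term.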

**The "e.g." construction for lists:**
- "(e.g., [task A]; [task B]; [task C])" — use semicolons inside parenthetical lists
- "A = Ni, V, Co, or Eu; B = Bi or Sb; and C = Te or Se" (semicolons between groups; Oxford comma within any group of three or more items)

---

## PART 4: QUANTITATIVE AND TECHNICAL FORMATTING

### 4.1 Numbers

**Always use numerals** (not words) for:
- Any measurement: "9249 non-magnetic materials", "60 runs", "200 epochs"
- Percentages: "87.6%", "95.7%", "84.1%" (one decimal place for reported metrics; a rounded value is acceptable after "about", as in "about 62% of the SHC values")
- Dataset splits: "80:10:10 ratio", "8:1:1"
- Threshold values: "0.5", "0.75", "0.7"
- Physical quantities: "1000 (ℏ/e) (S/cm)", "0.30 eV", "5.54 Å"

**Scientific notation:** use for physical quantities ≥10⁴ or ≤10⁻³ (dataset counts such as "60,839 entries" stay as full numerals):
- "2.4 × 10⁹ A/V²"
- "10⁻⁵ eV per atom"

**Ranges:**
- Use "–" (en dash) for ranges: "0.5–0.75", "300–600 words"
- In text: "between [X] and [Y]" or "[X] to [Y]"
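
These number conventions can be applied consistently with a few formatting helpers. The sketch below is one possible implementation, not a prescribed tool: the function names are hypothetical, `format_quantity` assumes two significant digits by default, and it drops a mantissa of exactly 1 so that the output matches the "10⁻⁵ eV per atom" form shown above.

```python
# Translation table from ASCII digits and minus sign to Unicode superscripts.
_SUPERSCRIPTS = str.maketrans("-0123456789", "⁻⁰¹²³⁴⁵⁶⁷⁸⁹")


def format_percent(fraction: float) -> str:
    """Render a fraction as a percentage with one decimal place."""
    return f"{fraction * 100:.1f}%"


def format_quantity(value: float, sig_digits: int = 2) -> str:
    """Use scientific notation for |value| >= 1e4 or <= 1e-3, plain digits otherwise."""
    if value == 0 or 1e-3 < abs(value) < 1e4:
        return f"{value:g}"
    mantissa, exponent = f"{value:.{sig_digits - 1}e}".split("e")
    exp_str = str(int(exponent)).translate(_SUPERSCRIPTS)
    if float(mantissa) == 1:
        return f"10{exp_str}"  # e.g. "10⁻⁵" rather than "1.0 × 10⁻⁵"
    return f"{mantissa} × 10{exp_str}"


def format_range(low, high) -> str:
    """Join two bounds with an en dash."""
    return f"{low}–{high}"


print(format_percent(0.957))        # 95.7%
print(format_quantity(2.4e9))       # 2.4 × 10⁹
print(format_quantity(1e-5))        # 10⁻⁵
print(format_range("0.5", "0.75"))  # 0.5–0.75
```

Units are appended by the caller. Note that `format_quantity` uses `:g` for mid-range values, so trailing zeros such as the one in "0.30 eV" must be formatted separately.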

### 4.2 Units

- Always attach units immediately after value: "115.4 (ℏ/e) (S/cm)"
- For compound units: "(ℏ/e) (S/cm)", "eV/Å", "μC/cm²", "MV/cm"
- Lattice constants: use the Å symbol, not "angstroms"
- Energies: eV (not electron-volts in running text)

### 4.3 Dataset Description Formula

```
"[Total N] [compound/material type] [were obtained/are incorporated] [from/into] 
[source]. [Criteria or filtering: "corresponding to", "after", "where"]. 
[N] [class A] and [N] [class B] are [selected/balanced] to train the [task] model 
(a total of [N] [materials])."
```

Example:
> "AFLOW contains 3,530,330 total compounds and the 60,839 entries corresponding to the Inorganic Crystal Structure Database (ICSD) are incorporated into our dataset."

### 4.4 Model Performance Statement Formula

```
"[Model A] [performs/provides] [the best/consistently better] [performance/results] 
[for both/over] [metric A] and [metric B] in all [N] [classification/regression] tasks. 
More specifically, it achieves [the highest/an] [metric] of [value]% in [task description]."
```

### 4.5 Validation Statement Formula

```
"Following the identification of [N] promising [topologically non-trivial/synthesizable/
high-SHC] candidates, first-principles [DFT] calculations [in/using] [software package] 
are used to determine [property]. [...] As a final step, [topological invariant/property] 
is determined with [tool] for all suspected [candidates]."
```

---

## PART 5: FIGURE AND TABLE REFERENCE CONVENTIONS

### 5.1 Figure Reference Variants — Use All of These

Never use the same reference construction twice in a paragraph. Rotate through:

**Sentence-leading references:**
- `As shown in Fig. X,` (most common)
- `As illustrated in Fig. X,`
- `As depicted in Fig. X,`
- `As presented in Fig. X,`
- `As listed in Table X,`
- `As summarized in Table X,`

**Figure as subject:**
- `Fig. X shows [the distribution of / the band structure of / ...]`
- `Fig. X presents [the predicted / the calculated / ...]`
- `Table X compares [the performances of / ...]`

**Parenthetical references:**
- `(see Fig. X)` — brief supplemental confirmation
- `(see Fig. X in the supplementary material)` — for supplemental
- `(see also Fig. X)` — cross-reference

**Multi-panel references:**
- `Figs. X(a) and X(b)` — for two panels
- `Figs. X(a–d)` — for a range
- `Figs. X(a,c)` — for non-contiguous panels
- `Fig. X(a–f)` — standard multi-panel band structure figure
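
The rotation rule for sentence-leading references can be enforced with a small check. This is a minimal sketch, assuming only the six "As [verb] in Fig./Table" openers listed in this section are of interest; `repeated_reference_openers` is a hypothetical name.

```python
import re
from collections import Counter

# Matches the sentence-leading constructions listed in this section.
REFERENCE_PATTERN = re.compile(
    r"\bAs (shown|illustrated|depicted|presented|listed|summarized) in (?:Figs?\.|Table)\s"
)


def repeated_reference_openers(paragraph: str) -> list[str]:
    """Return the verbs whose 'As [verb] in ...' opener occurs more than once."""
    counts = Counter(m.group(1) for m in REFERENCE_PATTERN.finditer(paragraph))
    return sorted(verb for verb, n in counts.items() if n > 1)


paragraph = (
    "As shown in Fig. 1, the loss decreases. "
    "As listed in Table I, the metrics agree. "
    "As shown in Fig. 2, the bands invert."
)
print(repeated_reference_openers(paragraph))  # ['shown']
```

Apply it paragraph by paragraph; any verb it returns should be swapped for one of the other variants.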

### 5.2 Supplementary Material References

- `(see Table SI in the supplementary material)`
- `(see Fig. S1 in the supplementary material)`
- `The details of these calculations can be found [in the/in] supplementary material.`
- `Further details are provided in the [Methods section / supplementary material].`
- `[Additional information] can be found in the supplementary material.`

---

## PART 6: CITATION INTEGRATION PATTERNS

### 6.1 Named Citations (Author + Number)

For key methodological references that merit attribution:
- `"an XGBoost model was applied successfully in the search for topological materials in nonmagnetic compounds (with an F1 score of 0.8) with atomic descriptors identified by [method].[N]"`
- `"a recent work by [Author] et al.[N] adopted [method] exploiting [descriptor]"`
- `"the algorithm of PU learning was established by [Author] and [Author].[N]"`
- `"This base PU learning method with a different classifier has already been employed to predict [X].[N]"`

### 6.2 Grouped Citations (Multiple Works)

- `"machine learning (ML) models have recently shown much promise in screening candidates...[N–M]"` — en-dash range
- `"The four major sources are [A], [B], [C], and [D].[N,M,P,Q]"` — comma-separated when non-contiguous
- `"These efforts have often been focused on...[N–M]"` — for literature surveys
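
Grouped citation strings can be generated from a list of reference numbers. The sketch below is illustrative: `format_citations` is a hypothetical helper, and collapsing only runs of three or more consecutive numbers into an en-dash range is an assumption, since the patterns above do not say how to treat a run of exactly two.

```python
def format_citations(numbers: list[int]) -> str:
    """Collapse reference numbers: runs of 3+ become en-dash ranges, the rest commas."""
    nums = sorted(set(numbers))
    parts = []
    i = 0
    while i < len(nums):
        # Extend j to the end of the current run of consecutive numbers.
        j = i
        while j + 1 < len(nums) and nums[j + 1] == nums[j] + 1:
            j += 1
        if j - i >= 2:
            parts.append(f"{nums[i]}–{nums[j]}")
        else:
            parts.extend(str(n) for n in nums[i:j + 1])
        i = j + 1
    return "[" + ",".join(parts) + "]"


print(format_citations([12, 13, 14, 15]))      # [12–15]
print(format_citations([3, 7, 9, 21]))         # [3,7,9,21]
print(format_citations([1, 2, 3, 7, 9, 10]))   # [1–3,7,9,10]
```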

### 6.3 Software/Tool Citations

- `"first-principles DFT calculations in the Vienna Ab initio Simulation Package are used to determine..."` (full name followed by "(VASP)" on first use, then VASP alone thereafter)
- `"a Hamiltonian model and maximally localized Wannier functions (MLWFs) are created with Wannier90 by matching the DFT results[N–M]"`
- `"the Z2 index or Chern number is determined with WannierTools for all suspected TIs.[N]"`

---

## PART 7: METHODS SECTION CONVENTIONS

### 7.1 Voice Rules for Methods

**Use passive for:**
- Standard computational procedures: "The calculations were carried out using..."
- Dataset construction: "The dataset was divided into a ratio of..."
- Hyperparameter descriptions: "The initial learning rate was set to..."
- Convergence criteria: "The calculations were considered converged when..."

**Use active "we" for:**
- Novel methodological choices: "We chose SchNet as our classifier..."
- Reasoning behind decisions: "We adopted CGCNN as our classifier of choice."
- Key design decisions: "To prevent overfitting, we set the weight decay to..."
- Setup description: "Here, we employ a threshold of 0.75..."

### 7.2 Hyperparameter Reporting Formula

Report ALL hyperparameters explicitly. Formula:
```
"The [model name] model[s] were trained using [N] [convolution/linear/etc.] layers. 
The most accurate model was achieved using [N1] [layer type A], [N2] [layer type B], 
and [N3] [layer type C]. The [embedding/hidden] dimension [for each node/layer] was set 
to [value] and the batch size [of data] was set to [value]. The initial learning rate 
was set to [value]. To prevent overfitting, the weight decay was set to [value] and 
the dropout was set to [value]. Each model was trained using [N] epochs."
```
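
One way to make sure no hyperparameter is left unreported is to render the paragraph from the training configuration itself. The sketch below is a simplified illustration, not the authors' tooling: the key names in `REQUIRED_KEYS`, the function name, and the numeric values in the example are all placeholders, and the per-layer-type breakdown from the formula is omitted.

```python
# Hypothetical configuration keys; adapt to the training script in use.
REQUIRED_KEYS = (
    "n_layers", "hidden_dim", "batch_size",
    "learning_rate", "weight_decay", "dropout", "epochs",
)


def hyperparameter_paragraph(model_name: str, config: dict) -> str:
    """Render a methods paragraph; raise if any hyperparameter is unreported."""
    missing = [key for key in REQUIRED_KEYS if config.get(key) is None]
    if missing:
        raise ValueError("unreported hyperparameters: " + ", ".join(missing))
    return (
        f"The {model_name} model was trained using {config['n_layers']} convolution layers. "
        f"The hidden dimension was set to {config['hidden_dim']} and the batch size "
        f"was set to {config['batch_size']}. The initial learning rate was set to "
        f"{config['learning_rate']}. To prevent overfitting, the weight decay was set to "
        f"{config['weight_decay']} and the dropout was set to {config['dropout']}. "
        f"Each model was trained using {config['epochs']} epochs."
    )


# Placeholder values for illustration only; they are not taken from any paper.
example = {
    "n_layers": 3, "hidden_dim": 128, "batch_size": 64,
    "learning_rate": 0.001, "weight_decay": 0.0001, "dropout": 0.2, "epochs": 200,
}
print(hyperparameter_paragraph("CGCNN", example))
```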

### 7.3 Equation Introduction Patterns

- `"the [quantity] can be [expressed/computed/formulated] as follows:"`
- `"[property] is [computed/calculated] by the [Kubo formula/following equation]:"`
- `"[quantity] is [defined as/given by]:"`
- After equation: `"where [symbol] [denotes/represents/is] [definition], [symbol] [denotes] [definition], and [symbol] [is/denotes] [definition]."`

### 7.4 Dataset Cleansing/Balancing Description

```
"After collecting these data, some cleansing is required. It includes [eliminating 
duplicate compounds / removing outliers with energy above hull higher than [value]]. 
Then, we balance an equal number of [class A] and [class B] [compounds/materials] 
to train the [task] model (a total of [N] [materials])."
```
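
The cleansing, balancing, and splitting steps described in this part can be sketched as follows. This is an illustrative outline under stated assumptions rather than the authors' pipeline: `balance_and_split` is a hypothetical function, duplicates are detected by exact identifier match, the majority class is randomly undersampled, and the split is a plain shuffle with no stratification.

```python
import random


def balance_and_split(class_a, class_b, ratios=(0.8, 0.1, 0.1), seed=0):
    """Deduplicate, balance the two classes, and split into train/val/test."""
    rng = random.Random(seed)

    # Cleansing step: eliminate duplicate compounds (sorted for reproducibility).
    a = sorted(set(class_a))
    b = sorted(set(class_b))

    # Balancing step: keep an equal number of each class.
    n = min(len(a), len(b))
    data = [(x, 1) for x in rng.sample(a, n)] + [(x, 0) for x in rng.sample(b, n)]
    rng.shuffle(data)

    # Splitting step: 80:10:10 by default; the test set takes the remainder.
    n_train = round(ratios[0] * len(data))
    n_val = round(ratios[1] * len(data))
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]


# Placeholder identifiers; "A0" is deliberately duplicated.
positives = [f"A{i}" for i in range(60)] + ["A0"]
negatives = [f"B{i}" for i in range(50)]
train, val, test = balance_and_split(positives, negatives)
print(len(train), len(val), len(test))  # 80 10 10
```

The resulting counts are the numbers to report in the "(a total of [N] [materials])" slot of the template.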

---

## PART 8: SPECIFIC CONSTRUCTIONS BY PAPER SECTION

### 8.1 Standard "By Contrast" Literature Comparison Block

Use this structure in the Introduction to survey and distinguish from prior work:

```
"In the case of [general field], machine learning (ML) models have recently shown 
much promise in [task], thus reducing the overall [cost/time/effort].[N–M] These efforts 
have often been focused on [general approach]. For instance, [Model A] was applied 
successfully in [task] [with [metric] of [value]] with [descriptor type].[N] By comparison, 
a recent work by [Author] et al.[N] adopted [model B] by exploiting a descriptor called 
[name].[N] Other studies were based on [alternative approach]. They include [method C] 
and [method D].[N,M] A work by [Author] et al. achieved [metric] of [value] [with/using] 
[approach],[N] whereas [Author] et al. used [alternative] to [task].[N]

By contrast, attempts to do the same for [your specific case] have not received much 
attention thus far."
```

### 8.2 Standard Material Confirmation Sequence

When validating ML predictions with DFT/experiment:

```
"[Material A] and [Material B] are found to have the lowest energy configurations in 
space group [N] and [M], respectively. Moreover, both of these materials are magnetically 
ordered in the ground state (more specifically, [magnetic type]). [...] The energy band 
diagrams for [Material A] and [Material B] are shown in Figs. X(a) and X(d), respectively. The details 
of the [conduction/valence] bands near the [Γ point/Fermi level] clearly reveal 
[characteristic feature] originating from [physical mechanism]. The corresponding 
[surface state / phonon / PDOS] diagrams as plotted in Figs. X(b) and X(e) further 
indicate [topological nature / dynamic stability / electronic character]. A subsequent 
calculation of [Z2/Chern number/phonon spectrum] shows [result], confirming [conclusion]."
```

### 8.3 Standard "One Exception" Pattern

For noting materials that deviate from predictions:

```
"Of the selected [candidates/materials], one exception is [Material], which turns out to be [actual classification] 
rather than [predicted classification]. This material is discussed separately later in 
the section in greater detail."
```

Or:

```
"All but one [material class] are shown to be [property] via the [DFT/phonon] 
calculations as summarized in the last column [of Table X]."
```

### 8.4 Standard Performance Comparison Table Description

```
"Table X compares the performances of [Model A] and [Model B] on the identical test set 
with an equal number of layers. [N] independent training sessions were carried out for 
each machine learning model. In general, [Model B] surpasses [Model A] by approximately 
[value] ([unit]), underscoring the importance of [architectural feature]."
```

### 8.5 Standard Classification Section Opening

```
"Although more accurate than [baseline], [model] displays a [MAE] exceeding [value] 
([unit]) for regression. To ensure the model's predictive capability for high-throughput 
screening of materials with [property > threshold], we constructed a classification 
model as an auxiliary guiding criterion.[N] Specifically, the model categorizes materials 
into two groups depending on whether they exceed or fall below a specific threshold value."
```
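
The two-group split described in this template amounts to binarizing the regression target. The sketch below shows one way to do it in plain Python. The function name `label_by_threshold`, the example values, and the choice to place a value exactly at the threshold in the lower group are all assumptions made for illustration.

```python
from typing import Sequence

def label_by_threshold(values: Sequence[float], threshold: float) -> list[int]:
    """Return 1 for values strictly above the threshold, otherwise 0.

    Assumption: a value equal to the threshold goes to the lower group.
    """
    return [1 if v > threshold else 0 for v in values]

# Hypothetical regression targets in arbitrary units.
targets = [35.0, 210.0, 1250.0, 980.0, 1000.0]
labels = label_by_threshold(targets, threshold=1000.0)
print(labels)  # [0, 0, 1, 0, 0]
```

When writing the section, state the threshold value and its unit explicitly so the reader can reproduce the labels.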

### 8.6 Standard High-Throughput Screening Application Section

```
"The developed [regression and classification] [Model] models enable us to conduct 
high-throughput screening of materials with [large X/high Y/target Z]. In pursuit 
of this goal, we screened [N] [material class] from [database/source] that are 
absent from our [training/existing] dataset. Eventually, [N] [material type] were 
obtained. The distribution of the [model]-predicted [property] for these materials 
is illustrated in Fig. X, revealing a predominance of instances with [property < 
threshold]. To validate the accuracy of the [model], we specifically chose a few 
materials, particularly focusing on those with predicted [property > threshold] 
[and additional criterion], and then performed first-principles DFT calculations 
on these materials."
```
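
The screening logic in this template has two filters: exclude anything already in the training dataset, then keep candidates whose predicted property exceeds the threshold. A minimal sketch follows, assuming predictions are held in a dictionary keyed by a material identifier. The identifiers, values, and the function name `screen_candidates` are hypothetical.

```python
def screen_candidates(predictions, training_ids, threshold):
    """Return unseen candidates with prediction > threshold, best first."""
    known = set(training_ids)
    selected = [
        material_id
        for material_id, value in predictions.items()
        if material_id not in known and value > threshold
    ]
    # Rank by predicted value so the strongest candidates go to DFT first.
    return sorted(selected, key=lambda m: predictions[m], reverse=True)

# Hypothetical model outputs; "cand-4" is already in the training set.
predictions = {"cand-1": 1200.0, "cand-2": 150.0, "cand-3": 1450.0, "cand-4": 1600.0}
training_ids = ["cand-4"]

shortlist = screen_candidates(predictions, training_ids, threshold=1000.0)
print(shortlist)  # ['cand-3', 'cand-1']
```

The length of the full candidate pool and the length of the shortlist supply the two [N] values the template asks for.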

---

## PART 9: COMMON ERROR PATTERNS TO AVOID

### 9.1 Voice/Register Errors

❌ WRONG: "We got really good results for the topological insulator classification."
✅ CORRECT: "The XGBoost model achieves an accuracy of 95.7% in classifying topologically trivial and nontrivial insulators."

❌ WRONG: "This is because the Fermi level is hard to predict."
✅ CORRECT: "This challenge appears to be due to the sensitive dependence on the Fermi level position, which is rather difficult to predict a priori."

❌ WRONG: "We tried different classifiers to see which one works best."
✅ CORRECT: "Several ML models are evaluated for both accuracy and F1 scores. These include Decision Tree, Random Forest, and XGBoost."

### 9.2 Precision Errors

❌ WRONG: "most of the data" (vague)
✅ CORRECT: "about 62% of the SHC values are below 200 (ℏ/e) (S/cm)"

❌ WRONG: "a large dataset"
✅ CORRECT: "9249 non-magnetic materials"

❌ WRONG: "good performance"
✅ CORRECT: "an area under the receiver operating characteristic curve of 0.86"

### 9.3 Structure Errors

❌ WRONG: Starting Introduction with "In this paper, we..."
✅ CORRECT: Start with field motivation, save "In this study/work, we..." for the 5th paragraph

❌ WRONG: Stating limitations before quantitative results
✅ CORRECT: State quantitative result → validate → then note limitations

❌ WRONG: Conclusion introducing new results or new material
✅ CORRECT: Conclusion restates and synthesizes; all results belong in Results section

### 9.4 Transition Errors

❌ WRONG: "Also," as a sentence-opening transition
✅ CORRECT: "In addition," or "Moreover,"

❌ WRONG: "But," at sentence opening
✅ CORRECT: "However," or "By contrast,"

❌ WRONG: "So," as logical connector
✅ CORRECT: "As a result," or "Consequently," or "Thus,"
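
The three transition rules above are mechanical enough to check automatically. The sketch below is one possible draft linter, not part of the original style guide. It flags only the comma-terminated openers listed above, so a phrase such as "So far," is not reported. The names `SUGGESTIONS` and `flag_transitions` are hypothetical.

```python
import re

# Replacements copied from the WRONG/CORRECT pairs above.
SUGGESTIONS = {
    "Also": ["In addition,", "Moreover,"],
    "But": ["However,", "By contrast,"],
    "So": ["As a result,", "Consequently,", "Thus,"],
}

# Match an opener at the start of the text or right after ., ! or ?.
_OPENER = re.compile(r"(?:^|(?<=[.!?])\s+)(Also|But|So),")

def flag_transitions(text: str) -> list[tuple[str, list[str]]]:
    """Return (flagged opener, suggested replacements) for each hit."""
    return [(m.group(1), SUGGESTIONS[m.group(1)]) for m in _OPENER.finditer(text)]

draft = "The model is accurate. But, it is slow. So, we prune it."
for word, options in flag_transitions(draft):
    print(f'"{word}," -> try {" or ".join(options)}')
```

A check like this catches only the surface pattern, so the replacement still has to fit the logic of the sentence.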

---

## PART 10: ABSTRACT AND SECTION TEMPLATES

### 10.1 Reusable Abstract Template

```
[SENTENCE 1 — MOTIVATION]
Accurately predicting [target property] is crucial for [designing/developing/identifying] 
novel [material type/device] that leverage[s] [physical phenomenon].

[SENTENCE 2 — LIMITATION]
[First-principles/Conventional] calculations of [X] are computationally intensive and 
unsuitable for quick high-throughput screening. / Predicting [X], a critical factor 
in realizing novel materials, remains a complex challenge due to [limitation].

[SENTENCE 3 — THIS WORK]
Here, we [have developed/develop/present] a [model architecture description] to 
[classify/predict/screen/identify] [target] [solely/directly] based on [descriptor].

[SENTENCE 4 — METHOD DETAIL]
By integrating [method A] with [method B], we screen [N] [composition type] and 
predict [N] [result]. / This is enabled by having access to [N] instances of [data] 
and incorporating [innovation] into the [base framework].

[SENTENCE 5 — PERFORMANCE]
We found that [model] surpasses [baseline], achieving [metric A of value] for 
[task A] and [metric B of value] for [task B].

[SENTENCE 6 — APPLICATION]
Additionally, we utilized [model] to conduct high-throughput screenings of [N] 
[material type], leading to the prediction of [N] previously unreported materials 
displaying [property] exceeding [threshold].

[SENTENCE 7 — SIGNIFICANCE]
These findings not only advance the understanding of [material class] but also 
provide a validated [ML/ML-DFT] framework to guide [experimental/computational] 
efforts in realizing novel functional materials.
```

### 10.2 Standard Methodology Subsection Template (Dataset)

```
A. Dataset Creation / Data Overview

The data used for the [model name] [models/model] are built from [N] [major] sources. 
The [N] major sources are [Source A], [Source B], [Source C], and [Source D].[ref range] 
[Source A] contains [N] total compounds and the [N] entries corresponding to [ICSD/
criteria] are incorporated into our dataset. Additional compounds are obtained from 
[Source B].[refs]

After collecting these data, some cleansing is required. It includes [eliminating 
duplicate compounds / removing outliers]. Then, we balance an equal number of 
[class A] and [class B] [to train/for] the [task] model (a total of [N] [materials]). 
Similarly, an equal number of [class C] and [class D] are selected for the [task B] 
model (a total of [N] [materials]).
```
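
The cleansing and balancing steps named in this template can be illustrated with a short sketch. It assumes each entry is a dictionary with a formula and a class label, removes duplicates by formula, and balances by randomly downsampling every class to the size of the smallest one. Downsampling is only one way to obtain equal class sizes and is an assumption here, as are the field names, the helper names, and the placeholder records.

```python
import random
from collections import defaultdict

def drop_duplicates(records, key="formula"):
    """Keep the first record seen for each value of `key`."""
    seen, unique = set(), []
    for rec in records:
        if rec[key] not in seen:
            seen.add(rec[key])
            unique.append(rec)
    return unique

def balance_classes(records, label_key="label", seed=0):
    """Randomly downsample every class to the size of the smallest class."""
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec[label_key]].append(rec)
    n = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    balanced = []
    for label in sorted(by_class):
        balanced.extend(rng.sample(by_class[label], n))
    return balanced

# Placeholder entries, not real compounds.
records = [
    {"formula": "X1", "label": "trivial"},
    {"formula": "X1", "label": "trivial"},  # duplicate entry
    {"formula": "X2", "label": "trivial"},
    {"formula": "X3", "label": "trivial"},
    {"formula": "X4", "label": "nontrivial"},
    {"formula": "X5", "label": "nontrivial"},
]
balanced = balance_classes(drop_duplicates(records))
print(len(balanced))  # 4
```

The totals printed at each stage are the numbers to report in the "(a total of [N] [materials])" slots.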

### 10.3 Standard Results Opening Template (ML Performance)

```
After training, these models are validated by using a subset of the datasets. 
The accuracies and F1 scores for all [N] ML models are shown in Table [X]. 
[Best model] performs the best for both accuracy and F1 score in all [N] 
classification tasks. More specifically, it achieves the highest accuracy of [X]% 
in classifying [material type] into the [class A] and [class B] categories. The 
[model] predicts the classification based on [descriptor type], and user input of 
[additional input]. The top [N] descriptors for this model are provided in Table [Y]. 
As shown, [top descriptor] is the most effective descriptor and can explain nearly 
[X]% of the variance in the model.
```

---

## PART 11: WRITING CHECKLIST

Before finalizing any section, verify:

**Abstract:**
- [ ] Opens with importance statement (not "In this paper")
- [ ] Contains "Here, we [have developed/develop/present]" exactly
- [ ] Contains specific numerical metrics (at least 2)
- [ ] Ends with "These findings not only...but also..." or "This study represents..."
- [ ] Word count 200–280

**Introduction:**
- [ ] Para 1 opens with broad field context, NOT "In this paper"
- [ ] Para 3 or 4 surveys prior ML work with named citations
- [ ] Contains "By contrast, attempts to do the same for [X] have not received much attention thus far" or equivalent gap statement
- [ ] Para 5 opens with "In this study/work, we..."
- [ ] Optional final paragraph gives paper roadmap with "The rest of this paper is organized as follows"

**Results:**
- [ ] Every quantitative claim has a specific number
- [ ] Every figure is referenced with "as shown/illustrated/depicted in Fig. X"
- [ ] At least one "It is worth noting that" or "Notably,"
- [ ] Exceptions to predictions are addressed with "one exception is" or "of the selected"
- [ ] Validation statement: "confirmed by DFT calculations" / "verified with [tool]"

**Conclusion:**
- [ ] Opens with "In conclusion/summary, we have developed..." or "In conclusion/summary, [model/method] [has been/have been] developed for..."
- [ ] Contains "These [results/predictions] have been validated through first-principles calculations"
- [ ] Contains "Our work demonstrates that..."
- [ ] Does NOT introduce new results
- [ ] Ends with future directions: "The methodologies outlined in this work can be readily expanded to other material systems"

**Throughout:**
- [ ] All abbreviations defined on first use in format: "full name (ABBREV)"
- [ ] Numbers always as numerals, never spelled out for measurements
- [ ] Passive voice for standard procedures
- [ ] Active "we" for novel contributions and decisions
- [ ] Units always attached to values
- [ ] Dataset splits written as "X:Y:Z ratio" or "X% training, Y% validation, Z% test"
- [ ] No informal language (no "really", "a lot", "nice", "interesting")
- [ ] No em-dash overuse (prefer commas and semicolons)
- [ ] All physical quantities have appropriate significant figures
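
Several of the Abstract items in this checklist can be pre-screened mechanically before a manual read. The sketch below is a rough aid under stated assumptions: any numeric token counts as a "numerical metric", which is a crude proxy (the "3" in "3D" would count), and the "Here, we" check accepts the tense variants "have developed", "developed", "develop", and "present". The function name `check_abstract` is hypothetical.

```python
import re

def check_abstract(text: str) -> dict[str, bool]:
    """Run simple pass/fail checks on an abstract draft."""
    words = text.split()
    numbers = re.findall(r"\d+(?:\.\d+)?", text)
    here_we = re.search(
        r"\bHere, we (?:have developed|developed|develop|present)\b", text
    )
    return {
        "does_not_open_with_in_this_paper":
            not text.lstrip().lower().startswith("in this paper"),
        "has_here_we": here_we is not None,
        "has_two_numbers": len(numbers) >= 2,
        "word_count_200_280": 200 <= len(words) <= 280,
    }

# Placeholder draft: one realistic sentence padded to a plausible length.
draft = (
    "Here, we present a model with an accuracy of 95.7% and an AUC of 0.86. "
    + "word " * 200
)
for item, passed in check_abstract(draft).items():
    print(f"[{'x' if passed else ' '}] {item}")
```

The remaining items, such as the closing sentence pattern, still need a human check because they depend on meaning and not only on surface form.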

---

## PART 12: EXAMPLE PASSAGES (USE AS DIRECT REFERENCE)

### 12.1 Example Introduction Paragraph 3 (Literature Survey)

> "In the case of nonmagnetic materials, machine learning (ML) models have recently shown much promise in screening candidates that are more likely to be topological, thus reducing the overall time required to find a given number of topological materials.[14–24] These efforts have often been focused on identifying topological properties based on atomic features or Hamiltonians. For instance, an XGBoost model was applied successfully in the search for topological materials in nonmagnetic compounds (with an F1 score of 0.8) with atomic descriptors identified by Sure-Independence Screening Sparsifying-Operator Regressor.[21] By comparison, a recent work by Xu et al.[22] adopted a convolutional neural network by exploiting a descriptor called element topogivities.[13] Other studies were based on detailed crystal structures as opposed to the atomic formula. They include Crystal Graph Neural Networks (CGNNs) and Atom Specific Persistent Homology Networks (ASPHs). A work by Rasul et al. achieved an F1 score of 0.885 with a neural net using both CGNN and ASPH,[23] whereas Hong et al. used a Crystal Diffusion Variational Autoencoder to identify topological semimetals and insulators.[24]"

### 12.2 Example Results Validation Passage

> "The first set of compounds investigated are those predicted to be insulating and topologically nontrivial (thus, TIs). As listed in Table III, these materials have structures similar to MnBi₂Te₄. For each compound, at least three variants in the crystalline symmetry are examined to confirm the lowest energy space group. Among the prospective magnetic TIs with nickel, NiBi₄Te₇ and NiBi₆Te₁₀ are found to have the lowest energy configurations in space group 164 and 166, respectively. Moreover, both of these materials are magnetically ordered in the ground state (more specifically, A-type AFMs). They also possess the effective time reversal symmetry S (= TT₁/₂) as shown in Figs. 3(a) and 3(d), where T and T₁/₂ are the time reversal and half-translation symmetries, respectively. As such, the Z₂ invariant is well-defined."

### 12.3 Example Conclusion

> "In conclusion, we have developed an improved machine learning model (i.e., the residual graph CNN) for classifying and predicting the spin Hall conductivities based on the structural and compositional information of 9249 non-magnetic materials. The developed classification and regression machine learning models enable rapid high-throughput screening of materials obtained from the MP database, resulting in the prediction of several new materials with spin Hall conductivities surpassing 1000 (ℏ/e) (S/cm). These model forecasts have been validated through first-principles calculations. Notably, we predict Ta₃P to exhibit the highest SHC of 1588 (ℏ/e) (S/cm) within the Ta₃X family. Our work demonstrates that the highly descriptive Res-CGCNN framework is capable of capturing the complex nonlinear relationship between SHCs and crystal structure as well as composition, offering a useful tool for the efficient screening of materials possessing high SHCs."

### 12.4 Example Methods (DFT Details)

> "All density functional theory (DFT) calculations were conducted using the Vienna Ab initio Simulation Package (VASP).[ref] We employed the projector augmented-wave method to accurately describe the electron-ion interactions,[ref] in conjunction with the Perdew–Burke–Ernzerhof (PBE) functional for the exchange-correlation energy, providing a robust framework for our simulations.[ref] A moderate plane-wave cutoff energy of 400 eV was chosen to balance computational efficiency and accuracy. For the Brillouin zone integration during structural optimization, a dense 4 × 4 × 3 k-point mesh was utilized to ensure precise results.[ref] The calculations were considered converged when the energy and force thresholds reached stringent limits of 10⁻⁵ eV per atom and 0.01 eV/Å, respectively."

---

## QUICK REFERENCE — TOP 20 SIGNATURE PHRASES

1. `"Here, we [have] developed a [X] to [Y] solely based on [Z]."`
2. `"[X] is computationally intensive and unsuitable for quick high-throughput screening."`
3. `"By contrast, attempts to do the same for [X] have not received much attention thus far."`
4. `"[Model] achieves the highest [accuracy/F1] of [X]% in [task]."`
5. `"It is worth noting that [non-obvious result]."`
6. `"Notably, [surprising/important quantitative finding]."`
7. `"Nevertheless, the results illustrate that [main finding]."`
8. `"This challenge appears to be due to [mechanism], which is rather difficult to predict a priori."`
9. `"The [descriptor] is the most effective [descriptor/feature] and can explain nearly [X]% of the variance in the model."`
10. `"These model forecasts have been validated through first-principles calculations."`
11. `"As a specific example, our investigation focuses on [X] (with [A = ...; B = ...; C = ...])."`
12. `"Following the identification of [N] promising [X] candidates, first-principles DFT calculations are used to determine [property]."`
13. `"More specifically, [model] achieves [metric] in [task]."`
14. `"[Material] emerges as a promising candidate, exhibiting [A], [B], and [C]."`
15. `"Of the [N] [candidates] examined, [N] are found to [have/be/exhibit] [property]."`
16. `"These findings not only advance the understanding of [X] but also provide a validated [framework] to guide [experimental/computational] efforts."`
17. `"In this study, we extend the ML approaches to a comprehensive search of [X] in [material system]."`
18. `"The rest of this paper is organized as follows. Section [X] describes [Y]. Then, [results] are discussed in Section [Z]. Finally, a brief conclusion is provided in Section [N]."`
19. `"The methodologies outlined in this work can be readily expanded to other material systems."`
20. `"One can observe that [visual pattern in figure/data]."`