Reviewing the AI-generated submissions with AI

2026-04-21

I am a reviewer for several conferences, and you won’t be surprised to hear we are getting an increasing number of AI-generated submissions with close to 0 technical value.

I don’t know yet whether those are (1) pure spam, or (2) the work of naive authors who hope to get selected.

Why should I spend human brainpower and time on something that was generated in seconds by AI? I decided to create an agent that would rule those submissions out.

Yes, I am still reviewing submissions with my own brain. I only ask the agent to do some initial triage, and I still have to verify my agent’s classification. But it’s quicker that way (see the Results section).

Confidentiality

Conference submissions need to be treated confidentially.

I use and recommend a local LLM for reviewing, so that the submission does not leave my host. In my case, I used Qwen 3.6 served by LM Studio.
Images in this blog post have been redacted to hide any data that might identify the submission or the conference.
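
To double-check that the model endpoint really stays on the host, a small sanity check can help. This is a minimal sketch, not part of the setup described here: `is_local_endpoint` is a hypothetical helper, and the URL mentioned in the note below is an assumption about LM Studio's default local server address.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(base_url: str) -> bool:
    """Return True only if base_url points at this machine (loopback)."""
    host = urlparse(base_url).hostname
    if host is None:
        # No scheme or no host: cannot be trusted as local.
        return False
    if host == "localhost":
        return True
    try:
        # Accept loopback IP literals such as 127.0.0.1 or ::1.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other DNS name is treated as remote.
        return False
```

For instance, `is_local_endpoint("http://localhost:1234/v1")` accepts the assumed LM Studio address, while any hosted API URL is rejected. The check is deliberately conservative: a name that merely resolves to a loopback address is still treated as remote.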

In some cases, and in accordance with the conference’s policy, submissions do not contain confidential information (e.g. it’s a re-submission, or there’s already a public full paper). Free LLMs, such as MiniMax M2.5 (from OpenCode), can then be used too.

Setup

I have OpenCode + a submission downloader script + a reviewer agent + a review command.

Downloader script

First, I launch my downloader script. It needs to be tailored to each conference and configured with your own credentials. But once you have a script for EasyChair and one for Pretalx, they can be re-used for many conferences. My script downloads each submission separately into ./submissions, as a markdown file:

# Super security system

**Track:** Wonderfull Tools
**Technical level:** general audience
**Presented before:** This talk has been also submitted to XXX

---

Everyday, we have to cope with insecurity and ... blah

Bonus. The downloader script was generated by AI 😉
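
The fetching part depends on the conference system and is not shown here, but the writing part can be sketched. The following is a minimal sketch, assuming each submission has already been fetched into a plain dict; the key names (`title`, `track`, `level`, `presented_before`, `abstract`) and the helper names are hypothetical, not taken from the actual script.

```python
from pathlib import Path


def submission_to_markdown(sub: dict) -> str:
    """Render one submission in the markdown layout shown above."""
    lines = [
        f"# {sub['title']}",
        "",
        f"**Track:** {sub['track']}",
        f"**Technical level:** {sub['level']}",
        f"**Presented before:** {sub['presented_before']}",
        "",
        "---",
        "",
        sub["abstract"],
    ]
    return "\n".join(lines) + "\n"


def safe_filename(title: str) -> str:
    """Replace characters that are awkward in file names with '_'."""
    kept = [c if c.isalnum() or c in " -_" else "_" for c in title]
    return "".join(kept).strip() + ".md"


def save_submission(sub: dict, out_dir: Path = Path("submissions")) -> Path:
    """Write one submission to its own file and return the path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / safe_filename(sub["title"])
    path.write_text(submission_to_markdown(sub), encoding="utf-8")
    return path
```

One file per submission keeps things simple for the agent: it can read, grade and report on submissions one at a time.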

Agent

With OpenCode, put your agents in ~/.config/opencode/agents. This is my reviewer.md.

---
description: Reviewing technical research articles for security & hacking conferences
mode: primary
temperature: 0.1
tools:
  write: true
  edit: true
  bash: false

---

The temperature is intentionally low, because I’m looking for something rational, not creative.

# Reviewing articles

You are a skilled security researcher, and a conference has asked you to review several articles and give your opinion on them.

## Mission

- Identify the most valuable submissions
- Rule out submissions which are obviously bad: too short, AI-generated, off-topic, content errors, marketing oriented.
- Give a score from 0 (bad) to 5 (wonderful).
- Produce a table with title + score, in markdown in `./report/README.md`
- Do not rate more than 15 submissions >=4 .
- Produce a report in `./report` with: a score for each submission + 5 line max explanation, for each submission with a score >=3 .
- Give progress to user regularly, with title + score of submissions you reviewed.

The scoring may differ from one conference to another. For this conference, I just need one grade between 0 and 5.
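
Since the agent's classification still has to be verified, the score table can be checked mechanically against the mission rules. This is a minimal sketch, assuming the agent writes a two-column `| Title | Score |` markdown table; the exact layout is up to the agent, so the regular expression and both function names are assumptions.

```python
import re

# Matches rows such as "| Some title | 3 |"; header and separator rows
# do not match because their second cell is not a single digit 0-5.
ROW = re.compile(r"^\|\s*(?P<title>.+?)\s*\|\s*(?P<score>[0-5])\s*\|\s*$")


def parse_scores(markdown: str) -> dict[str, int]:
    """Extract {title: score} from a two-column markdown table."""
    scores = {}
    for line in markdown.splitlines():
        match = ROW.match(line.strip())
        if match:
            scores[match.group("title")] = int(match.group("score"))
    return scores


def check_top_limit(scores: dict[str, int], max_top: int = 15) -> list[str]:
    """Report a problem if too many submissions are rated 4 or 5."""
    top = [title for title, score in scores.items() if score >= 4]
    if len(top) > max_top:
        return [f"{len(top)} submissions rated >= 4 (limit is {max_top})"]
    return []
```

Running `check_top_limit(parse_scores(text))` on the contents of `./report/README.md` returns an empty list when the limit of 15 is respected.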

## Guidelines

1. Read the abstract:

- If the English is very broken, or with many spelling mistakes, rule out. If there's an occasional typo, disregard the issue.

Initially, this rule was very strict on English. I tried it on a personal submission of mine, and it gave it a very bad score because of a single typo 😭 ! So, I softened the guideline.

- Systematically downgrade any submission which creates attacks without providing a defense.
- If there are several submissions on very similar topics, say so in your report.
- Give a small bonus to submissions which are very different from others, very innovative.
- Any submission that tries to "sell" something, or claims too strongly that its solution is good, should get a bad score.

This is what I personally look for. ⚠️ It’s personal, you’ll probably want to customize this.

2. Search Internet:

- Do you find github repositories on the same topic that support the submission's view?
- Has this talk with the same title (or nearly) already been presented? If so, downgrade.


3. Rate the submission

- Do not hesitate to rate 0 if it falls in the ruled out categories
- A good but uncertain submission will typically receive a rating of 2 or 3.
- 4 and 5 are reserved to submissions which are excellent and will most certainly be selected.
- Update `./report/README.md`

I insisted on the value of grades, because I noticed the LLM was usually too kind 🎓

The last part instructs my agent to produce a report. I don’t want to lose time on bad submissions, so for those, I don’t generate any report apart from their score.

4. Report

- Do not provide any report if score < 3

Create: `./report/<submission>.md`:

  - 5 lines max
  - Explain what you like best about this submission, and what you have doubts about.
  - Summarize the content in 1-2 lines.
  - Do not wait to review all submissions, you must regularly create and edit reports in `./report`.

During review, I found the summary of the submission particularly useful, including the AI’s opinion. Usually, I began by reading the AI’s report, and then I’d move on to my own full review. It helped me read a bit faster.
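
The "5 lines max" rule is easy to verify after a run. This is a minimal sketch, assuming that "lines" means non-blank lines and that every per-submission report sits next to `README.md` in `./report`; both helper names are hypothetical.

```python
from pathlib import Path


def content_lines(text: str) -> int:
    """Count the non-blank lines of a report."""
    return sum(1 for line in text.splitlines() if line.strip())


def overlong_reports(report_dir: Path, max_lines: int = 5) -> list[str]:
    """Names of per-submission reports that exceed max_lines."""
    too_long = []
    for path in sorted(report_dir.glob("*.md")):
        if path.name == "README.md":
            # The score table is not a per-submission report.
            continue
        if content_lines(path.read_text(encoding="utf-8")) > max_lines:
            too_long.append(path.name)
    return too_long
```

Calling `overlong_reports(Path("report"))` lists the reports the agent should be asked to shorten.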

The following guideline for the agent is important when there are many submissions: I want to start working on a few submissions before it finishes processing them all. In that case, it’s important to have intermediate steps.

5. Notify

- Tell user you have processed <submission> and give the score next to it.
- Do not wait till you have reviewed all submissions to give progress!
- Do not repeat the content of the report, user will read it, so don't waste tokens.

Finally, my agent ends with my personal preferences. This helps the AI highlight those I will probably like.

## Preferences

- I am strongly interested in XXX
- I am not interested in YYY
- I like presentations with demos
- I like if abstracts provide proof / references for their claims
- I prefer technical talks over generic ones

Results

Ruling out unworthy submissions

This type of result is very valuable to a reviewer. You still need to check manually, but usually I just opened the submission and read 3 lines, or skimmed it very quickly.

All comments “generic”, “heavy marketing”, “old stuff” were correct. The only thing I sometimes fixed was the grading: sometimes a 0 was too harsh and deserved a 1, and likewise a 1 sometimes deserved a 2.

The good submissions

My agent didn’t rate any submission as 5. This should probably be fixed, because there were a couple of excellent submissions I uprated to 5/5. The issue probably lies in this guideline of the agent: “4 and 5 are reserved to submissions which are excellent and will most certainly be selected.”

The advice of the AI was less accurate (IMHO) for this. Of those 10 submissions, I changed the score of 6: three were substantially over-rated and I downgraded them to 2, one was moved to 3, and two were moved up to 5.

The agent’s rating of “good” submissions is too imprecise to be reliable.
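
This kind of tally can be automated when both sets of scores are at hand. This is a minimal sketch, assuming the agent's scores and the reviewer's final scores are available as dicts keyed by title; `compare_scores` is a hypothetical helper, and submissions missing from the final scores are counted as unchanged.

```python
from collections import Counter


def compare_scores(agent: dict[str, int], final: dict[str, int]) -> Counter:
    """Count how many agent scores were raised, lowered or kept."""
    tally = Counter()
    for title, agent_score in agent.items():
        # A submission absent from `final` keeps the agent's score.
        final_score = final.get(title, agent_score)
        if final_score > agent_score:
            tally["raised"] += 1
        elif final_score < agent_score:
            tally["lowered"] += 1
        else:
            tally["unchanged"] += 1
    return tally
```

Tracking the raised and lowered counts across conferences would show whether a change to the guidelines actually reduces the number of corrections.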

Reports for good submissions

My agent was configured to generate a short report for each submission rated 4 or 5. This very short report was useful for my review because it highlights important points I should keep in mind when I do my own review. It also potentially explains why the agent graded the submission as it did, which can help fix the guidelines.

Reports were useful to highlight good/bad parts

Commands

The agent is meant to be run once on all submissions, and then you work on the various reports. In some cases, you want to review a specific submission again, with more specific guidelines. For this, I created a /review command that I can invoke from OpenCode.

The command is located in ~/.config/opencode/commands.

The permissions are set explicitly so that OpenCode does not prompt for the obvious ones. The agent name ensures that OpenCode automatically picks up my reviewer agent when I select this command.

---
description: Perform quick review on a submission
agent: reviewer
permission:
  read:
    "submissions/*": allow
  write: deny
  edit: deny
  bash:
    "*" : deny
    "grep submissions/*": allow
    "ls submissions/*": allow
  glob:
    "submissions/*": allow
  webfetch: deny
  
---

You are a technical reviewer for leading security conferences.

Your task is to review a submission: $ARGUMENTS.

You are not allowed to go on Internet for information.
I don't want to see your intermediate reasoning.
Tokens have a cost, so make everything compact.

1. Read the article

- Search for it in `./submissions/<TITLE>.md`.
- If a file is supplied as argument, read that file.

2. Create a concise review

Use this exact format:

- Rating: <0-5>
- Summary: <50 words max that explain the submission>
- Reason: <50 words max that justify your rating with a couple of bullet points. Below 4, only negative points. 4 and above: only positive points>

3. Check:

- Check that your review complies exactly with the expected format in (2).
- Adjust.

The final part, “Check”, was important because otherwise the LLM kept giving me longer reports than desired.
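
The same format check can also be run outside the LLM, on the text it returns. This is a minimal sketch, assuming the review comes back as plain text in the three-field format above; `parse_review` and `validate_review` are hypothetical helpers, and bullet points under a field are counted towards that field's word limit.

```python
import re

FIELD = re.compile(r"^- (Rating|Summary|Reason):\s*(.*)$")


def parse_review(text: str) -> dict[str, str]:
    """Collect the three fields; other lines extend the current field."""
    fields: dict[str, str] = {}
    current = None
    for line in text.strip().splitlines():
        match = FIELD.match(line)
        if match:
            current = match.group(1)
            fields[current] = match.group(2).strip()
        elif current is not None and line.strip():
            # Continuation line, e.g. a nested bullet under "Reason".
            fields[current] += " " + line.strip()
    return fields


def validate_review(text: str, max_words: int = 50) -> list[str]:
    """Return a list of format problems; an empty list means compliant."""
    fields = parse_review(text)
    problems = [
        f"missing field: {name}"
        for name in ("Rating", "Summary", "Reason")
        if name not in fields
    ]
    rating = fields.get("Rating")
    if rating is not None and not re.fullmatch(r"[0-5]", rating):
        problems.append(f"rating must be a single digit 0-5, got {rating!r}")
    for name in ("Summary", "Reason"):
        # Bare "-" bullet markers are not counted as words.
        words = [w for w in fields.get(name, "").split() if w != "-"]
        if len(words) > max_words:
            problems.append(f"{name} has {len(words)} words (max {max_words})")
    return problems
```

An empty result means the review follows the format; anything else can be fed back to the agent as a request to adjust.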

I tend to regard the rating as only “informational”, and I frequently change it. But the summary and the reasons indicated by the LLM are useful for my own review.

Conclusion

My agent is efficient at spotting the very weak submissions. I had 0 errors on that account (but I’d still recommend doing a quick review of each weak submission, just to make sure the agent didn’t make a mistake).
My agent wasn’t so good at rating submissions between 2 and 5. I regularly had to fix the grade.
The short reports for each submission were always very useful, as a basis for my own review. I haven’t measured precisely, but I think I went faster because of them.

– Cryptax